Spark Streaming checkpoints serve two purposes: the metadata checkpoint is used to recover from driver failures, while the data (RDD) checkpoint provides fault tolerance for stateful processing.
import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by csw on 2017/7/13.
 */
object CheckPointTest {
  Logger.getLogger("org").setLevel(Level.WARN)
  val conf = new SparkConf().setAppName("Spark shell")
  val sc = new SparkContext(conf)
  // batch interval, in seconds
  val batchDuration = 2
  // checkpoint directory on HDFS for the metadata
  val dir = "hdfs://master:9000/csw/tmp/test3"

  // create a new StreamingContext, or rebuild one from an existing checkpoint
  def functionToCreatContext(): StreamingContext = {
    val ssc = new StreamingContext(sc, Seconds(batchDuration))
    ssc.checkpoint(dir)
    val fileStream: DStream[String] = ssc.textFileStream("hdfs://master:9000/csw/tmp/testStreaming")
    // periodically persist the DStream's RDDs to HDFS at the given interval
    fileStream.checkpoint(Seconds(batchDuration * 5))
    fileStream.foreachRDD(x => {
      val collect: Array[String] = x.collect()
      collect.foreach(x => println(x))
    })
    ssc
  }

  def main(args: Array[String]) {
    val context: StreamingContext = StreamingContext.getOrCreate(dir, functionToCreatContext _)
    context.start()
    context.awaitTermination()
  }
}
On restarting the application and recovering from the checkpoint, the following errors appear:
17/07/13 10:57:10 INFO WriteAheadLogManager for Thread: Reading from the logs:
hdfs://master:9000/csw/tmp/test3/receivedBlockMetadata/log-1499914584482-1499914644482
17/07/13 10:57:10 ERROR streaming.StreamingContext: Error starting the context, marking it as stopped
org.apache.spark.SparkException: org.apache.spark.streaming.dstream.MappedDStream@4735d6e5 has not been initialized
at org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:323)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
17/07/13 11:26:45 ERROR util.Utils: Exception encountered
java.lang.ClassNotFoundException: streaming.CheckPointTest$$anonfun$functionToCreatContext$1
....
17/07/13 11:26:45 WARN streaming.CheckpointReader: Error reading checkpoint from file hdfs://master:9000/csw/tmp/test3/checkpoint-1499916310000
java.io.IOException: java.lang.ClassNotFoundException: streaming.CheckPointTest$$anonfun$functionToCreatContext$1
......
The ClassNotFoundException arises because the checkpoint contains serialized closures whose compiled anonymous-class names no longer match the recompiled jar, so the stale checkpoint files have to be removed:
hadoop fs -rm /csw/tmp/test3/checkpoint*
Starting again after that, everything is OK: the application recovers data from the checkpoint, and killing it and restarting it once more also works normally.
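For comparison, the pattern recommended by the Spark Streaming programming guide builds everything, including the SparkConf, inside the factory function passed to getOrCreate, so that a recovered driver can reconstruct the full DStream graph from the checkpoint instead of reusing objects created at object-initialization time. The sketch below is illustrative only; the object name RecoverableApp is hypothetical, and the HDFS paths are reused from the example above:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RecoverableApp {
  // same checkpoint directory as in the example above
  val checkpointDir = "hdfs://master:9000/csw/tmp/test3"

  def createContext(): StreamingContext = {
    // all setup lives inside the factory function: it only runs when no
    // valid checkpoint exists, and recovery rebuilds the graph from disk
    val conf = new SparkConf().setAppName("RecoverableApp")
    val ssc = new StreamingContext(conf, Seconds(2))
    ssc.checkpoint(checkpointDir)
    val lines = ssc.textFileStream("hdfs://master:9000/csw/tmp/testStreaming")
    // checkpoint the data itself every 5 batches (2s * 5)
    lines.checkpoint(Seconds(10))
    lines.foreachRDD(rdd => rdd.collect().foreach(println))
    ssc
  }

  def main(args: Array[String]): Unit = {
    // rebuilds from checkpointDir if a checkpoint exists, else calls createContext
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

With this shape, nothing in the DStream graph depends on state created outside the factory function, which avoids the "has not been initialized" failure seen in the log above when the outer SparkContext and the recovered graph disagree.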