The metadata checkpoint is used to recover when the driver fails, whereas the data (RDD) checkpoint is mainly used for fault tolerance in stateful stream processing.
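To see why the data checkpoint matters, consider a stateful transformation such as `updateStateByKey`: its state RDD lineage grows with every batch, so Spark Streaming refuses to run it unless a checkpoint directory is set. A minimal sketch (the object name, checkpoint path, and socket source below are placeholders for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(2))
    // Mandatory for stateful operators: the data checkpoint truncates
    // the ever-growing lineage of the state RDDs.
    ssc.checkpoint("hdfs://master:9000/csw/tmp/stateCheckpoint") // placeholder path
    val lines = ssc.socketTextStream("master", 9999) // placeholder source
    val counts = lines.flatMap(_.split(" ")).map((_, 1))
      .updateStateByKey[Int]((values: Seq[Int], state: Option[Int]) =>
        Some(values.sum + state.getOrElse(0)))
    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Without the `ssc.checkpoint(...)` call, this job fails at `start()` with an error stating that checkpointing has not been enabled.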
import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by csw on 2017/7/13.
  */
object CheckPointTest {
  Logger.getLogger("org").setLevel(Level.WARN)
  val conf = new SparkConf().setAppName("Spark shell")
  val sc = new SparkContext(conf)
  // Batch interval in seconds
  val batchDuration = 2
  // Metadata checkpoint directory on HDFS
  val dir = "hdfs://master:9000/csw/tmp/test3"

  // Create a new StreamingContext, or rebuild one from an existing checkpoint
  def functionToCreatContext(): StreamingContext = {
    val ssc = new StreamingContext(sc, Seconds(batchDuration))
    ssc.checkpoint(dir)
    val fileStream: DStream[String] = ssc.textFileStream("hdfs://master:9000/csw/tmp/testStreaming")
    // Periodically persist the DStream's data checkpoint to HDFS
    fileStream.checkpoint(Seconds(batchDuration * 5))
    fileStream.foreachRDD(x => {
      val collect: Array[String] = x.collect()
      collect.foreach(x => println(x))
    })
    ssc
  }

  def main(args: Array[String]) {
    val context: StreamingContext = StreamingContext.getOrCreate(dir, functionToCreatContext _)
    context.start()
    context.awaitTermination()
  }
}
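Metadata checkpointing only pays off if the driver actually gets restarted after a failure. In standalone cluster mode this can be requested with the `--supervise` flag of `spark-submit`; a sketch of such a submission (master URL and jar name are placeholders):

```shell
spark-submit \
  --master spark://master:7077 \
  --deploy-mode cluster \
  --supervise \
  --class CheckPointTest \
  checkpoint-test.jar
```

On restart, `StreamingContext.getOrCreate(dir, ...)` finds the checkpoint in `dir` and rebuilds the context instead of calling the creation function.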
17/07/13 10:57:10 INFO WriteAheadLogManager for Thread: Reading from the logs:
hdfs://master:9000/csw/tmp/test3/receivedBlockMetadata/log-1499914584482-1499914644482
17/07/13 10:57:10 ERROR streaming.StreamingContext: Error starting the context, marking it as stopped
org.apache.spark.SparkException: org.apache.spark.streaming.dstream.MappedDStream@4735d6e5 has not been initialized
at org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:323)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
17/07/13 11:26:45 ERROR util.Utils: Exception encountered
java.lang.ClassNotFoundException: streaming.CheckPointTest$$anonfun$functionToCreatContext$1
....
17/07/13 11:26:45 WARN streaming.CheckpointReader: Error reading checkpoint from file hdfs://master:9000/csw/tmp/test3/checkpoint-1499916310000
java.io.IOException: java.lang.ClassNotFoundException: streaming.CheckPointTest$$anonfun$functionToCreatContext$1
......
The `ClassNotFoundException` occurs because the checkpoint files contain serialized objects (including anonymous-function classes) from a previous build of the job; after recompiling, those class names no longer exist, so the old checkpoint data must be removed:

hadoop fs -rm /csw/tmp/test3/checkpoint*

After starting the job again, everything was fine: it recovered its data from the checkpoint, and after killing and restarting it once more it continued to work normally.