Versions:
spark-streaming-kafka-0-10_2.11, version 2.4.0
kafka-clients, version 0.11.0.0
Problem: we previously saved offsets using the 0.8 integration, but the production Kafka cluster runs 0.11.0.0, and the way offsets are saved has changed significantly.
The approach from the official documentation:
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092,anotherhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "use_a_separate_group_id_for_each_stream",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val topics = Array("topicA", "topicB")
val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { iter =>
    val o: OffsetRange = offsetRanges(TaskContext.get.partitionId)
    println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
  }
  // some time later, after outputs have completed
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
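One caveat with this snippet: the cast to HasOffsetRanges only succeeds on the RDD in the first method called on the stream returned by createDirectStream; any map, filter, or window in between produces a plain RDD that no longer carries offset metadata. A toy sketch (plain Scala, no Spark, with made-up class names) of why the cast stops working:

```scala
// Toy model: only the source "RDD" mixes in the offset-range trait.
trait HasRanges { def ranges: Array[(Int, Long)] }

// Stand-in for the KafkaRDD produced directly by the Kafka source.
class SourceRDD(val data: Seq[String], val ranges: Array[(Int, Long)]) extends HasRanges {
  // Any transformation returns a plain RDD without the trait.
  def map(f: String => String): PlainRDD = new PlainRDD(data.map(f))
}
class PlainRDD(val data: Seq[String])

object CastDemo extends App {
  val source = new SourceRDD(Seq("a", "b"), Array((0, 42L)))
  println(source.isInstanceOf[HasRanges])               // true: cast works on the source
  println(source.map(identity).isInstanceOf[HasRanges]) // false: lost after map
}
```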
My requirement, however, is to apply a window operation plus map and filter transformations before foreachRDD, and after those operations foreachRDD can no longer obtain the offsetRanges (the transformed RDD is no longer the KafkaRDD that implements HasOffsetRanges). The revised code:
// global
var offsetRanges: Array[OffsetRange] = Array[OffsetRange]()

// inside the method body
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> PropertiesUtil.getPropertiesToStr("kafka.hosts"),
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "xxx",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("topicA")
val messages: InputDStream[ConsumerRecord[String, String]] = KafkaOffsetUtil.createMyZookeeperDirectKafkaStream(
  ssc,
  kafkaParams,
  topics,
  PropertiesUtil.getPropertiesToStr("zookeeper.group.name"),
  PropertiesUtil.getPropertiesToStr("zookeeper.path"))

val dataOriginDStream = messages.transform { rdd =>
  // capture the offset ranges while the RDD is still the KafkaRDD
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd
}.filter(kv => {
  ...
})

dataOriginDStream.foreachRDD(rdd => {
  if (!rdd.isEmpty()) {
    ...
  }
  // save the new offsets
  KafkaOffsetUtil.saveOffsets(
    PropertiesUtil.getPropertiesToStr("zookeeper.path"),
    offsetRanges,
    PropertiesUtil.getPropertiesToStr("zookeeper.group.name"))
})
My approach: use transform to save the offsetRanges into a variable first, then read them back inside foreachRDD. (This works because transform's closure runs on the driver for every batch, so the variable is refreshed before the foreachRDD output operation of the same batch executes.)
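KafkaOffsetUtil (createMyZookeeperDirectKafkaStream / saveOffsets) is a custom helper whose source is not shown here. As a minimal, hypothetical sketch, such a helper might serialize the captured ranges into a single ZooKeeper znode payload like "topic:partition:untilOffset" entries joined by commas; the names StoredOffset and OffsetCodec below are my own, not from the real class:

```scala
// Hypothetical znode payload for one consumer group:
// "topicA:0:42,topicA:1:17" — topic, partition, next offset to read.
case class StoredOffset(topic: String, partition: Int, untilOffset: Long)

object OffsetCodec {
  // Serialize the ranges captured in transform into the znode payload.
  def encode(offsets: Seq[StoredOffset]): String =
    offsets.map(o => s"${o.topic}:${o.partition}:${o.untilOffset}").mkString(",")

  // Parse the payload back on restart; the result would be turned into
  // the fromOffsets map that seeds the direct stream.
  def decode(payload: String): List[StoredOffset] =
    payload.split(",").toList.filter(_.nonEmpty).map { entry =>
      val Array(topic, partition, offset) = entry.split(":")
      StoredOffset(topic, partition.toInt, offset.toLong)
    }
}
```

On restart, decode's output can be converted to a Map[TopicPartition, Long] and passed as the offsets argument of ConsumerStrategies.Subscribe, so the stream resumes exactly where saveOffsets left off.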