Spark Streaming x Kafka 0.8 / 0.10 Consumption Guide

Real-time statistics jobs call for Spark Streaming x Kafka. The Spark version needs no elaboration; Kafka currently splits mainly into the 0.8.x.x and 0.10.x.x lines, and consuming through what looks like the same API turns out to differ between the two, so this post records the differences. For creating the stream we pick the commonly used Direct Approach (no receiver), which simplifies parallelism (each Kafka partition maps to one RDD partition) and makes data ingestion more stable.

 

0.8.x.x Maven Dependency and Consumption

When building the streaming job you can also skip creating a SparkContext and pass the SparkConf directly to the StreamingContext; here the sc is created explicitly so it can also be used to read other files.
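A minimal sketch of that alternative, assuming a hypothetical app name (the one-argument constructor creates the SparkContext internally, reachable later as ssc.sparkContext):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // StreamingContext built straight from a SparkConf; the SparkContext is
    // created internally and stays reachable via ssc.sparkContext
    val conf = new SparkConf().setAppName("streaming-example") // hypothetical name
    val ssc = new StreamingContext(conf, Seconds(5))           // 5s batches, for illustration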

Maven

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.8.x.x</version>
        </dependency>
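
Note that kafka-clients alone does not provide KafkaUtils; the direct stream used below comes from Spark's Kafka integration artifact, so a dependency along these lines is also needed (artifact name shown for Spark 2.x with Scala 2.11; match the suffix and version to your own build):

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
            <version>x.x.x</version>
        </dependency>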

Consuming the topic

    import kafka.serializer.StringDecoder
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map(
      "metadata.broker.list" -> KAFKA_BROKERS,
      "group.id" -> KAFKA_GROUP_ID,
      "auto.offset.reset" -> kafka.api.OffsetRequest.LargestTimeString
    )
    val sparkConf = if (local) {
      new SparkConf()
        .setMaster(SPARK_LOCAL_HOST)
        .setAppName(appName)
    } else {
      new SparkConf().setAppName(appName)
    }
    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc,
      Seconds(SPARK_STREAMING_INTERVAL.toInt)
    )

    // topicsSet: Set[String] of topic names to subscribe to
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)

    messages.foreachRDD(rdd => {
      rdd.foreachPartition(partition => {
        partition.foreach(line => {
          // the 0.8 direct stream yields (key, value) tuples; pass the value on
          Execute(line._2)
        })
      })
    })
    ssc.start()
    ssc.awaitTermination()
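
One practical benefit of the direct approach is that each batch RDD knows exactly which Kafka offsets it covers. A minimal sketch using the 0.8 integration's HasOffsetRanges, for logging only:

    import org.apache.spark.streaming.kafka.HasOffsetRanges

    messages.foreachRDD { rdd =>
      // the cast is valid only on RDDs coming straight from the direct stream
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { r =>
        println(s"${r.topic} partition ${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
      }
    }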

 

0.10.x.x Maven Dependency and Consumption

The main differences from consuming 0.8.x.x are the Kafka configuration and the changed API for creating the DStream; the main logic can again live in the Execute function.

Maven

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.10.x.x</version>
        </dependency>
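
As in the 0.8 section, KafkaUtils, LocationStrategies, and ConsumerStrategies live in Spark's integration artifact rather than in kafka-clients, so you will also want something like the following (Scala 2.11 suffix assumed; match your build):

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>x.x.x</version>
        </dependency>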

Consuming the topic

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    val kafkaParameters = Map[String, Object](
      "bootstrap.servers" -> KAFKA_BROKERS,
      "group.id" -> KAFKA_GROUP_ID,
      "enable.auto.commit" -> (true: java.lang.Boolean),
      "auto.offset.reset" -> "latest",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "security.protocol" -> "SASL_PLAINTEXT",
      "fetch.min.bytes" -> "4096",
      "sasl.mechanism" -> "PLAIN"
    )
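
SASL_PLAINTEXT with the PLAIN mechanism also needs JAAS credentials, which the map above does not carry. A hedged sketch: on clients that support it (roughly 0.10.2 and later) the credentials can be passed inline via sasl.jaas.config; older clients take a JAAS file through -Djava.security.auth.login.config. USER and PASS below are placeholders.

    // hypothetical extra entry for clients that accept inline JAAS config;
    // USER/PASS are placeholder credentials
    val saslJaas =
      "org.apache.kafka.common.security.plain.PlainLoginModule required " +
        "username=\"USER\" password=\"PASS\";"
    val securedParameters = kafkaParameters + ("sasl.jaas.config" -> saslJaas)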


    val sparkConf = if (local) {
      new SparkConf()
        .setMaster(SPARK_LOCAL_HOST)
        .setAppName(appName)
    } else {
      new SparkConf().setAppName(appName)
    }

    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc,
      Seconds(SPARK_STREAMING_INTERVAL.toInt)
    )


    val kafkaStream = KafkaUtils.createDirectStream[String, String](ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Array(KAFKA_TOPIC), kafkaParameters))

    kafkaStream.foreachRDD(rdd => {
      rdd.foreachPartition(partition => {
        partition.foreach(line => {
          Execute(line.value())
        })
      })
    })
    ssc.start()
    ssc.awaitTermination()
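
The parameters above auto-commit offsets (enable.auto.commit = true), so records from a batch that dies mid-processing may be skipped on restart. If commits should instead follow batch completion, a sketch using the 0.10 integration's CanCommitOffsets, assuming auto-commit is switched to false:

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    kafkaStream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreachPartition(_.foreach(line => Execute(line.value())))
      // commit only after the batch's work has been dispatched
      kafkaStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }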

 
