WordCount in Spark: Receiving Kafka Messages with the Direct Approach

1. Start the YARN cluster
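
For a typical Hadoop 2.x deployment (assuming HADOOP_HOME points at your install), the cluster comes up with the bundled sbin scripts:

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh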

2. Start the ZooKeeper cluster (required by Kafka)
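
Run this on every node of the ensemble, from the ZooKeeper install directory; status should report one leader and the rest followers:

bin/zkServer.sh start
bin/zkServer.sh status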

3. Start the Kafka broker, a producer, and a consumer (the producer simulates feeding data into Kafka; the consumer prints what it receives)

3.1 Start the Kafka broker
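
From the Kafka install directory, start the broker, then create the topic once if it does not exist yet. The ZooKeeper address 192.168.2.11:2181 is an assumption here; substitute your own:

bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper 192.168.2.11:2181 --replication-factor 1 --partitions 1 --topic topic_name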

3.2 Start a Kafka producer
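
A console producer is enough to simulate input; the broker address matches the one used in the submit script in step 5:

bin/kafka-console-producer.sh --broker-list 192.168.2.11:9092 --topic topic_name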

3.3 Start a Kafka consumer
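
A console consumer just echoes whatever the producer sends. On the 0.8/0.9-era Kafka that this Direct API targets, it reads via ZooKeeper (address again an assumption; newer Kafka versions use --bootstrap-server instead):

bin/kafka-console-consumer.sh --zookeeper 192.168.2.11:2181 --topic topic_name --from-beginning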

4. Tweak the official Spark demo

Find it in your Spark installation directory, under spark-2.0.2-bin-hadoop2.6/examples/src/main/scala/org/apache/spark/examples/streaming

The code is as follows:

package streaming

import kafka.serializer.StringDecoder

import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

/**
  * Consumes messages from one or more topics in Kafka and does wordcount.
  * Usage: DirectKafkaWordCount <brokers> <topics>
  *   <brokers> is a list of one or more Kafka brokers
  *   <topics> is a list of one or more kafka topics to consume from
  *
  * Example:
  *    $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \
  *    topic1,topic2
  */
object DirectKafkaWordCount {
  def main(args: Array[String]): Unit = {
    // Validate arguments up front instead of failing later with a MatchError
    if (args.length < 2) {
      System.err.println(s"""
        |Usage: DirectKafkaWordCount <brokers> <topics>
        |  <brokers> is a list of one or more Kafka brokers
        |  <topics> is a list of one or more kafka topics to consume from
        |
        """.stripMargin)
      System.exit(1)
    }

    val Array(brokers, topics) = args

    // Create context with 2 second batch interval
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_._2)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}

5. Package the jar, upload it to the cluster, write the launch script below, and run it
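
The jar name below suggests a Maven build with the assembly plugin; under that assumption, packaging is a single command, after which you copy the jar from target/ to the node you submit from:

mvn clean package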

/usr/local/src/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
        --class streaming.DirectKafkaWordCount \
        --master yarn-cluster \
        --executor-memory 1G \
        --executor-cores 1 \
        --files $HIVE_HOME/conf/hive-site.xml \
        ./sparkTest-1.0-SNAPSHOT-jar-with-dependencies.jar \
        192.168.2.11:9092 topic_name

6. Check the running application on the YARN cluster
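
Besides the ResourceManager web UI (port 8088 by default), the command line works too:

yarn application -list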

7. Type data at the Kafka producer; it shows up at the Kafka consumer, and the computed word counts appear in the application logs on YARN (in yarn-cluster mode, wordCounts.print() writes to the driver container's log).
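
With YARN log aggregation enabled, the driver's printed batches can be pulled using the application id from step 6 (the id below is a made-up example):

yarn logs -applicationId application_1500000000000_0001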
