1. Start the YARN cluster
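A minimal sketch of this step, assuming a stock Hadoop layout in which HADOOP_HOME points at the installation. The variable and the start_yarn helper name are illustrative and not taken from the original setup.

```shell
# Assumption: HADOOP_HOME is set to the Hadoop installation directory.
# start_yarn is a hypothetical helper name used only for this sketch.
start_yarn() {
  # start-yarn.sh ships with Hadoop and brings up the ResourceManager
  # and the NodeManagers listed in the cluster configuration.
  "${HADOOP_HOME}/sbin/start-yarn.sh"
}
```

Call start_yarn on the ResourceManager node. Afterwards, jps should list a ResourceManager process there and a NodeManager process on each worker.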
2. Start the ZooKeeper cluster (Kafka depends on it)
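One way to do this, assuming a standalone ZooKeeper installation rather than the instance bundled with Kafka. ZOOKEEPER_HOME and the start_zookeeper helper name are assumptions made for this sketch.

```shell
# Assumption: ZOOKEEPER_HOME is set to a standalone ZooKeeper installation.
# start_zookeeper is a hypothetical helper name used only for this sketch.
start_zookeeper() {
  # zkServer.sh is ZooKeeper's own control script; "start" launches the
  # local server process in the background.
  "${ZOOKEEPER_HOME}/bin/zkServer.sh" start
}
```

Run start_zookeeper on every node of the ensemble. Once the nodes are up, "zkServer.sh status" reports whether the local server is the leader or a follower.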
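3. Start the Kafka broker, a producer, and a consumer (the producer simulates feeding data into Kafka; the consumer prints the data)
3.1 Start the Kafka broker
3.2 Start the Kafka producer
3.3 Start the Kafka consumer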
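The three sub-steps above can be sketched with the scripts that ship in Kafka's bin directory. KAFKA_HOME and the helper names are assumptions; the broker address and topic are the ones passed to the job in step 5; the ZooKeeper address is a guess (default port 2181 on the same host). Each function is meant to run in its own terminal.

```shell
# Assumptions: KAFKA_HOME points at the Kafka installation; BROKER and TOPIC
# match the arguments of the launch script in step 5; ZK is a guessed address.
BROKER="192.168.2.11:9092"
ZK="192.168.2.11:2181"
TOPIC="topic_name"

# 3.1: start the broker in the background with its default config file.
start_kafka_broker() {
  "${KAFKA_HOME}/bin/kafka-server-start.sh" -daemon "${KAFKA_HOME}/config/server.properties"
}

# 3.2: console producer; every line typed on stdin becomes one message.
start_kafka_producer() {
  "${KAFKA_HOME}/bin/kafka-console-producer.sh" --broker-list "${BROKER}" --topic "${TOPIC}"
}

# 3.3: console consumer; prints each message it receives.
# --zookeeper is the option used by older Kafka releases; newer releases
# take --bootstrap-server with a broker address instead.
start_kafka_consumer() {
  "${KAFKA_HOME}/bin/kafka-console-consumer.sh" --zookeeper "${ZK}" --topic "${TOPIC}"
}
```

The option names differ between Kafka versions, so check the usage output of each script for the release you have installed.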
4. Adapt the official Spark demo
The example lives under your Spark installation directory: spark-2.0.2-bin-hadoop2.6/examples/src/main/scala/org/apache/spark/examples/streaming
The modified code is as follows:
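// The package must match the --class value (streaming.DirectKafkaWordCount) used in the launch script
package streaming

import kafka.serializer.StringDecoder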
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

/**
 * Consumes messages from one or more topics in Kafka and counts them.
 * Usage: DirectKafkaWordCount <brokers> <topics>
 *   <brokers> is a list of one or more Kafka brokers
 *   <topics> is a list of one or more kafka topics to consume from
 *
 * Example:
 *   $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \
 *     topic1,topic2
 */
object DirectKafkaWordCount {
  def main(args: Array[String]): Unit = {
    // Argument check from the official demo, disabled in this version
    // if (args.length < 2) {
    //   System.err.println(s"""
    //     |Usage: DirectKafkaWordCount <brokers> <topics>
    //     |  <brokers> is a list of one or more Kafka brokers
    //     |  <topics> is a list of one or more kafka topics to consume from
    //     |
    //     """.stripMargin)
    //   System.exit(1)
    // }

    // Expects exactly two arguments; anything else throws a MatchError
    val Array(brokers, topics) = args

    // Create context with 2 second batch interval
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)

    // Take the message values (the keys are dropped) and count how often each
    // distinct message occurs in a batch; the split into words is disabled
    val lines = messages.map(_._2)
    // val words = lines.flatMap(_.split(" "))
    val wordCounts = lines.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
5. Package the job and upload it to the cluster, then write the following launch script and run it
/usr/local/src/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
  --class streaming.DirectKafkaWordCount \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 1G \
  --executor-cores 1 \
  --files $HIVE_HOME/conf/hive-site.xml \
  ./sparkTest-1.0-SNAPSHOT-jar-with-dependencies.jar \
  192.168.2.11:9092 topic_name
6. Check the launched application on the YARN cluster
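Besides the ResourceManager web UI, the application can be checked from the command line. A small sketch, in which list_running_apps is a hypothetical helper name:

```shell
# list_running_apps is a hypothetical helper name used only for this sketch.
list_running_apps() {
  # "yarn application -list" prints one row per application;
  # -appStates RUNNING narrows the listing to applications still running.
  yarn application -list -appStates RUNNING
}
```

The job should appear under the name DirectKafkaWordCount, which is the value passed to setAppName in the code, together with its application ID.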
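7. Type some data into the Kafka producer. The Kafka consumer prints the same data, and the computed results then show up in the application logs on the YARN cluster.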
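A sketch of fetching those logs from the command line, assuming log aggregation is enabled on the cluster. show_app_logs is a hypothetical helper name, and its argument is the application ID that YARN reported for the job.

```shell
# show_app_logs is a hypothetical helper; pass the application ID reported by YARN.
show_app_logs() {
  # Prints the aggregated container logs of one application.
  # In cluster mode the driver runs inside a YARN container, so the output
  # of wordCounts.print() ends up in that container's stdout log.
  yarn logs -applicationId "$1"
}
```

For a streaming job that is still running, the aggregated logs may not be available yet. In that case the stdout log of the driver container can be opened through the application's page in the ResourceManager web UI.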