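Overview of steps
1 Start ZooKeeper
2 Start Kafka
3 Create a Kafka topic
4 Use the console tools to check that messages can be produced to and consumed from this topic
5 Write the Spark Streaming code
6 Start the Spark Streaming program with four arguments: ZooKeeper quorum, consumer group, topic(s), number of threads (here: hadoop000:2181 test kafka_streaming_topic 1)
7 Produce data with kafka-console-producer
8 Check that the output in the IDEA console is correct
/* The Receiver approach is inferior to the Direct approach (for example, it needs a write-ahead log to avoid data loss on failure), so production jobs generally use Direct, which has been available since Spark 1.3. */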
----------------------------------------
1 Start ZooKeeper (run from ZooKeeper's bin directory)
./zkServer.sh start
----------------------------------------
2 Start Kafka
$KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
----------------------------------------
3 Create a Kafka topic
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafka_streaming_topic
----------------------------------------
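4 Use the console tools to check that messages can be produced to and consumed from this topic
// start a Kafka console consumer
$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic kafka_streaming_topic
// start a Kafka console producer
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafka_streaming_topic
// type some text into the producer; if the consumer prints it, the topic works end to end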
----------------------------------------
5 Write the Spark Streaming code
// add the dependency to Maven (pom.xml)
groupId = org.apache.spark
artifactId = spark-streaming-kafka-0-8_2.11
version = 2.2.0
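If the project is built with sbt rather than Maven, the same coordinates could be declared roughly as below. This is a sketch under assumptions: the scalaVersion value is picked only to match the _2.11 suffix, and spark-streaming is listed separately on the assumption that the connector does not pull it in at runtime (the Maven project above presumably already declares it).

```scala
// build.sbt (sketch); scalaVersion is an assumption chosen to match the _2.11 artifact suffix
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // assumed to be needed alongside the Kafka connector
  "org.apache.spark" %% "spark-streaming" % "2.2.0",
  // %% appends the Scala binary version, resolving to spark-streaming-kafka-0-8_2.11
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.2.0"
)
```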
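package com.imooc.spark

import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/* Spark Streaming + Kafka integration, Receiver-based approach */
object KafkaReceiverWordCount {
  def main(args: Array[String]): Unit = {
    if (args.length != 4) {
      System.err.println("Usage: KafkaReceiverWordCount <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1) // stop here; otherwise the destructuring below throws a MatchError
    }
    val Array(zkQuorum, group, topics, numThreads) = args

    // local[2]: the receiver occupies one thread, so at least one more is needed to process the data
    val sparkConf = new SparkConf().setAppName("KafkaReceiverWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(5)) // 5-second batch interval

    // topic -> number of consumer threads for that topic
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // connecting Spark Streaming to Kafka needs the StreamingContext, the ZooKeeper quorum, the consumer group and the topics
    val messages = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)
    // each message is a (key, value) tuple; _._2 keeps the value, i.e. the text that was sent
    // (kafka-console-producer sends a null key by default)
    messages.map(_._2).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}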
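To see why the pipeline starts with map(_._2), the same transformations can be replayed on plain Scala collections without Spark or Kafka. This is a minimal sketch: the object name and the sample batch are hypothetical, each element stands in for one Kafka message as a (key, value) pair with the null key a console producer would send, and groupBy plus sum replaces reduceByKey, which exists only on Spark pair DStreams/RDDs.

```scala
object ReceiverPipelineSketch {
  // Hypothetical stand-in for one micro-batch: (key, value) pairs with null keys,
  // as produced by kafka-console-producer
  val batch: Seq[(String, String)] = Seq(
    (null, "hello spark"),
    (null, "hello kafka")
  )

  // Local equivalent of messages.map(_._2).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  def wordCount(messages: Seq[(String, String)]): Map[String, Int] =
    messages
      .map(_._2)             // keep the message value, drop the (null) key
      .flatMap(_.split(" ")) // split each line into words
      .map((_, 1))           // pair every word with a count of 1
      .groupBy(_._1)         // group identical words, as reduceByKey does per key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    println(wordCount(batch))
}
```

Taking _._1 instead would yield only null keys, which is why the value is the element worth counting.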
----------------------------------------
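6 Start the Spark Streaming program with four arguments: ZooKeeper quorum, consumer group, topic(s), number of threads (here: hadoop000:2181 test kafka_streaming_topic 1)
In IDEA, right-click the program and run it once so that a run configuration is created, then add the program arguments to that configuration and run it again.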
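The way the four arguments are turned into createStream inputs can be checked in isolation. The sketch below copies the argument handling from KafkaReceiverWordCount into a hypothetical ArgsSketch.parse helper that throws an exception instead of printing the usage line and exiting; it also shows that a comma-separated topic list gives every topic the same thread count.

```scala
object ArgsSketch {
  // Hypothetical helper mirroring KafkaReceiverWordCount's argument handling:
  // returns the ZooKeeper quorum, the consumer group and the topic -> threads map
  def parse(args: Array[String]): (String, String, Map[String, Int]) = {
    if (args.length != 4) {
      throw new IllegalArgumentException(
        "Usage: KafkaReceiverWordCount <zkQuorum> <group> <topics> <numThreads>")
    }
    val Array(zkQuorum, group, topics, numThreads) = args
    // every topic in the comma-separated list gets the same number of threads
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    (zkQuorum, group, topicMap)
  }

  def main(args: Array[String]): Unit =
    // the arguments used in this walkthrough
    println(parse(Array("hadoop000:2181", "test", "kafka_streaming_topic", "1")))
}
```

With the walkthrough's arguments this prints (hadoop000:2181,test,Map(kafka_streaming_topic -> 1)).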
----------------------------------------
7 Produce data with kafka-console-producer
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafka_streaming_topic
// after running the command above, type a few words
----------------------------------------
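8 Check that the output in the IDEA console is correct: every 5 seconds (the batch interval) it should print the word counts of the lines typed into the producer as (word,count) tuples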
----------------------------------------