PySpark Streaming: consuming Kafka data for real-time processing (can also be chained as Flume + Kafka + Spark)

You need to download the matching spark-streaming-kafka-0-8-assembly jar (the versions must correspond).

Download link:
https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8-assembly_2.11

Make sure to download the matching assembly version, otherwise the package will not be recognized.

Version naming, using spark-streaming-kafka-0-8-assembly_2.11-2.4.4.jar as an example:
2.11 is the Scala version and 2.4.4 is the Spark version.
To check the Kafka (and Scala) version on the machine, run find / -name \*kafka_\* | head -1 | grep -o '\kafka[^\n]*'. A file name like kafka_2.10-0.8.2-beta.jar means Scala 2.10 and Kafka 0.8.2-beta.
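
As a quick sanity check before picking the jar, the local Spark version can also be printed from Python. This is a minimal sketch, assuming PySpark is installed locally (e.g. via pip):

# Minimal sketch: print the local Spark version so the assembly jar can be matched to it.
# Assumes PySpark is installed locally (e.g. via pip).
import pyspark

print(pyspark.__version__)   # e.g. "2.4.4" -> spark-streaming-kafka-0-8-assembly_2.11-2.4.4.jar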

Code:

Before starting the streaming program, make sure ZooKeeper and Kafka are up and running as well.
Run:

spark-submit --jars /Us****oads/spark-streaming-kafka-0-8-assembly_2.11-2.4.4.jar pyspark01/pyspark_steaming02.py

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import os

# Point PySpark at the desired local Python interpreter
os.environ["PYSPARK_PYTHON"] = "/Users/lonng/opt/anaconda3/python.app/Contents/MacOS/python"

# Create a local StreamingContext with two worker threads and a batch interval of 5 seconds
sc = SparkContext("local[2]", "NetworkWordCount")
sc.setLogLevel("OFF")
ssc = StreamingContext(sc, 5)

# A checkpoint directory must be set, otherwise the job throws an error
ssc.checkpoint("./")

zookeeper = "localhost:2181"   # ZooKeeper quorum the Kafka receiver connects to

topic = {"test1": 1}           # topic name -> number of consumer threads
group_id = "test"              # Kafka consumer group id
# Receiver-based stream; each record is a (key, message) tuple
line1 = KafkaUtils.createStream(ssc, zookeeper, group_id, topic)
print(line1)
# Alternative: the receiver-less "direct" approach (see the sketch after this script)
# lines = KafkaUtils.createDirectStream(ssc, ["hello"], {"metadata.broker.list": "127.0.0.1:9092"})
lines = line1.map(lambda x: x[1])   # drop the key, keep only the message value
counts = lines.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a+b)
counts.pprint()

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate
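
The commented-out createDirectStream line in the script is the receiver-less "direct" approach, which reads straight from the Kafka brokers instead of going through ZooKeeper. A minimal sketch of that variant follows; the broker address 127.0.0.1:9092 and the topic name are taken from the example above, not verified against a running cluster:

# Direct (receiver-less) variant: connect to the Kafka brokers instead of ZooKeeper.
# Broker address and topic name follow the commented-out line in the script above.
direct_lines = KafkaUtils.createDirectStream(
    ssc, ["test1"], {"metadata.broker.list": "127.0.0.1:9092"})
direct_counts = direct_lines.map(lambda x: x[1]) \
    .flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
direct_counts.pprint()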


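To see the word count produce output, a few messages have to be written to the test1 topic. Below is a minimal test-producer sketch; it assumes the kafka-python package is installed (the console producer shipped with Kafka works just as well):

# Minimal test producer (assumption: kafka-python is installed, pip install kafka-python).
# Writes a few space-separated lines to the "test1" topic consumed by the streaming job.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for line in ["hello spark", "hello kafka", "hello streaming"]:
    producer.send("test1", line.encode("utf-8"))
producer.flush()
producer.close()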
