You need to download the matching spark-streaming-kafka-0-8-assembly jar (the versions must correspond to your setup).
Download link:
https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8-assembly_2.11
Be sure to download the matching assembly version, otherwise it will not be recognized.
Version naming example: spark-streaming-kafka-0-8-assembly_2.11-2.4.4.jar
Here 2.11 is the Scala version and 2.4.4 is the Spark version.
To check the Kafka version, run find / -name \*kafka_\* | head -1 | grep -o '\kafka[^\n]*' and you will see a file like kafka_2.10-0.8.2-beta.jar, where 2.10 is the Scala version and 0.8.2-beta is the Kafka version.
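To illustrate the naming scheme, the two versions can be pulled straight out of the jar file name. This is a minimal Python sketch; the variable names are just for illustration:

```python
# Parse the Scala and Spark versions out of the assembly jar name.
jar = "spark-streaming-kafka-0-8-assembly_2.11-2.4.4.jar"

# Everything after the last "_" and before the first "-" that follows it
# is the Scala version; the remainder (minus ".jar") is the Spark version.
name = jar[:-len(".jar")]
_, _, rest = name.rpartition("_")
scala_version, _, spark_version = rest.partition("-")
print(scala_version)  # 2.11
print(spark_version)  # 2.4.4
```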
Code:
When starting the streaming program, ZooKeeper and Kafka must also be up and running.
Run:
spark-submit --jars /Us****oads/spark-streaming-kafka-0-8-assembly_2.11-2.4.4.jar pyspark01/pyspark_steaming02.py
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
import os
import json
from pyspark.streaming.kafka import KafkaUtils
os.environ["PYSPARK_PYTHON"]="/Users/lonng/opt/anaconda3/python.app/Contents/MacOS/python"
# Create a local StreamingContext with two working threads and a batch interval of 5 seconds
sc = SparkContext("local[2]", "NetworkWordCount")
sc.setLogLevel("OFF")
ssc = StreamingContext(sc, 5)
# A checkpoint directory must be set, otherwise an error is raised
ssc.checkpoint("./")
zookeeper = "localhost:2181"
topic = {"test1": 1}
group_id = "test"
line1 = KafkaUtils.createStream(ssc, zookeeper, group_id, topic)
print(line1)
# lines = KafkaUtils.createDirectStream(ssc, ["hello"], {"metadata.broker.list": "127.0.0.1:9092"})
# Extract the message value from each (key, value) pair
lines = line1.map(lambda x: x[1])

# Split each line into words and count each word in each batch
counts = lines.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)

# Print the first ten elements of each RDD generated in this DStream to the console
counts.pprint()
ssc.start() # Start the computation
ssc.awaitTermination() # Wait for the computation to terminate
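The per-batch transformation above (flatMap to split lines, map to (word, 1) pairs, reduceByKey to sum) can be sketched without Spark to show what each 5-second batch computes. batch_word_count is a hypothetical helper for illustration only, not part of the PySpark API:

```python
# Spark-free sketch of the word-count pipeline applied to one batch of lines.
from collections import Counter

def batch_word_count(lines):
    # flatMap(lambda line: line.split(" "))
    words = [word for line in lines for word in line.split(" ")]
    # map(lambda word: (word, 1)) followed by reduceByKey(lambda a, b: a + b)
    return dict(Counter(words))

print(batch_word_count(["hello world", "hello kafka"]))
# {'hello': 2, 'world': 1, 'kafka': 1}
```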