Spark Streaming NetworkWordCount runs but shows no output

Official docs: https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html#points-to-remember-1
Code:

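from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Two local threads: one for the socket receiver, one for processing.
sc = SparkContext("local[2]", "NetworkWordCount")
# Batch interval of 1 second.
ssc = StreamingContext(sc, 1)

# Read lines of text from the nc server on localhost:9999.
lines = ssc.socketTextStream("localhost", 9999)
words = lines.flatMap(lambda line: line.split(" "))
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)

# Print the first ten counts of each batch to the console.
wordCounts.pprint()

ssc.start()             # start receiving and processing
ssc.awaitTermination()  # block until the job is stopped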
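
The script above hard-codes localhost and 9999, so any host and port passed on the spark-submit command line are ignored. A minimal sketch of reading them instead; parse_host_port is a hypothetical helper name, not part of PySpark, and the defaults are assumptions chosen to match this post:

```python
def parse_host_port(argv, default_host="localhost", default_port=9999):
    """Return (host, port) from an argv-style list.

    argv is expected to look like sys.argv after
    `spark-submit script.py <host> <port>`: [script, host, port].
    With no extra arguments the defaults are returned.
    """
    if len(argv) == 1:
        return default_host, default_port
    if len(argv) != 3:
        raise ValueError("usage: <script> [<host> <port>]")
    return argv[1], int(argv[2])


# Hypothetical usage inside the streaming script:
#   import sys
#   host, port = parse_host_port(sys.argv)
#   lines = ssc.socketTextStream(host, port)
```

With this in place the same script can be pointed at a different nc server without editing the code.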

In one terminal, run:

nc -lk 9999

In a second terminal, run:

spark-submit /home/hadoop01/test1.py localhost 9999

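Then type any text in the nc terminal.
The spark-submit terminal receives the data normally but never processes it, and keeps repeating the output shown in the screenshot below.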

The official docs explain:

Points to remember
When running a Spark Streaming program locally, do not use “local” or “local[1]” as the master URL.
Either of these means that only one thread will be used for running tasks locally. If you are using
an input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will
be used to run the receiver, leaving no thread for processing the received data. Hence, when
running locally, always use “local[n]” as the master URL, where n > number of receivers to run
(see Spark Properties for information on how to set
the master).
Extending the logic to running on a cluster, the number of cores allocated to the Spark Streaming
application must be more than the number of receivers. Otherwise the system will receive data, but not be able to process it.
In short: a receiver-based input DStream (sockets, Kafka, Flume, etc.) occupies one thread just to run the receiver. With "local" or "local[1]" nothing is left to process the data, so locally the master URL must be "local[n]" with n greater than the number of receivers.
The same holds on a cluster: the cores allocated to the application must outnumber the receivers, otherwise data is received but never processed.
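
The rule above can be turned into a quick sanity check. The sketch below is an illustration only: local_threads and can_process are hypothetical helpers, not Spark APIs, and they only understand the "local", "local[n]" and "local[*]" forms (the "local[n,f]" variant with a failure count is deliberately not handled). For "local[*]" the thread count is approximated with os.cpu_count().

```python
import os
import re


def local_threads(master):
    """Return the thread count implied by a local master URL.

    Returns None for anything that is not a recognised local form
    (for example spark://host:7077 or yarn).
    """
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        return None
    if match.group(1) == "*":
        # Approximation: one thread per core visible to this process.
        return os.cpu_count() or 1
    return int(match.group(1))


def can_process(master, receivers=1):
    """True if at least one thread is left over after the receivers."""
    threads = local_threads(master)
    if threads is None:
        raise ValueError("not a local master URL handled by this sketch: " + master)
    return threads > receivers
```

For example, can_process("local[1]") is False while can_process("local[2]") is True, and a job with two receivers would need at least "local[3]".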

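But my program uses local[2], so in principle this should not be the problem. I kept searching and found a blog post whose author had changed the virtual machine's processor settings, so I changed my VM's processor settings as well.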

I reran the job and typed some text in the nc terminal.

This time the results appear in the spark-submit terminal.

Problem solved!
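
Since the fix involved the VM's processor settings, it can be useful to check how many cores the guest actually sees before and after the change. The snippet below is only a diagnostic sketch: visible_cores is a hypothetical helper, and the post does not establish exactly why the processor setting mattered when local[2] was already in use.

```python
import os


def visible_cores():
    """Number of CPU cores the operating system reports (at least 1)."""
    return os.cpu_count() or 1


print("cores visible to this machine:", visible_cores())
```

Running it inside the VM gives the number to compare against the receivers-plus-one rule quoted above.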
Things to keep in mind with Spark Streaming:

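After a context is defined, you have to do the following:
(1) Define the input sources by creating input DStreams.
(2) Define the streaming computation by applying transformations and output operations to the DStreams.
(3) Start receiving and processing data with streamingContext.start().
(4) Wait for processing to stop (manually or because of an error) with streamingContext.awaitTermination().
(5) Processing can be stopped manually with streamingContext.stop().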
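
The five steps map onto the word count roughly as follows. This is a sketch under assumptions, not the post's original script: run_word_count, split_line and the timeout_seconds parameter are names introduced here for illustration, and the 30-second timeout is arbitrary. It uses awaitTerminationOrTimeout so the job stops by itself instead of blocking forever.

```python
def split_line(line):
    """Split one line of text into words."""
    return line.split(" ")


def run_word_count(host="localhost", port=9999, batch_seconds=1,
                   timeout_seconds=30):
    """Run the socket word count for at most timeout_seconds, then stop."""
    # Imported here so split_line can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "NetworkWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    # (1) Define the input source.
    lines = ssc.socketTextStream(host, port)

    # (2) Define the computation and an output operation.
    counts = (lines.flatMap(split_line)
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda x, y: x + y))
    counts.pprint()

    # (3) Start receiving and processing.
    ssc.start()
    try:
        # (4) Wait, but return after timeout_seconds at the latest.
        ssc.awaitTerminationOrTimeout(timeout_seconds)
    finally:
        # (5) Stop manually; this also stops the underlying SparkContext.
        ssc.stop(stopSparkContext=True, stopGraceFully=True)
```

Calling run_word_count() while nc -lk 9999 is running in another terminal should print per-batch counts for about 30 seconds and then exit.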
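A Discretized Stream, or DStream, is the basic abstraction provided by Spark Streaming. It represents a continuous stream of data: either the input stream received from a source, or the processed stream produced by transforming the input stream.
Internally, a DStream is represented by a continuous series of RDDs, Spark's abstraction of an immutable, distributed dataset (see the Spark Programming Guide for details). Each RDD in a DStream contains the data from one specific time interval.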
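
To make the "one RDD per interval" idea concrete, here is a plain-Python simulation that does not use Spark at all. The names to_batches and word_counts and the sample records are made up for illustration: timestamped lines are grouped by batch interval, and the word count is then applied to each batch separately, which is how the counts printed by pprint() are scoped.

```python
from collections import Counter, defaultdict


def to_batches(records, batch_seconds=1):
    """Group (timestamp, line) records by the interval they fall into."""
    batches = defaultdict(list)
    for timestamp, line in records:
        batches[int(timestamp // batch_seconds)].append(line)
    return dict(sorted(batches.items()))


def word_counts(lines):
    """Count words across the lines of a single batch."""
    return Counter(word for line in lines for word in line.split(" "))


# Made-up input: (seconds since start, text typed into nc).
records = [(0.2, "a b"), (0.7, "a"), (1.4, "b c")]
per_batch = {index: word_counts(lines)
             for index, lines in to_batches(records).items()}
```

Here the first interval holds two lines and the second holds one, so the word "b" is counted once in each batch rather than twice overall.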
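Spark Streaming provides two categories of built-in streaming sources.
Basic sources: sources directly available in the StreamingContext API. Examples: file systems and socket connections.
Advanced sources: sources such as Kafka, Flume, and Kinesis, available through extra utility classes. These require linking against extra dependencies, as described in the linking section of the guide.
Note:
A Spark Streaming application must be allocated enough cores (or threads, when running locally) to process the received data as well as to run the receivers.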
