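1. Download and install Spark.
2. Download and install Python.
3. Create the environment variable SPARK_HOME with the value D:\Spark\spark-2.0.1-bin-hadoop2.6
4. Add the path D:\Spark\spark-2.0.1-bin-hadoop2.6\python\pyspark to the environment variables.
5. Copy the pyspark folder found under D:\Spark\spark-2.0.1-bin-hadoop2.6 (in its python subdirectory) into the Python installation path: D:\Python\Python35\Lib
6. Test the following code in the interpreter that ships with Python (for example IDLE):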
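from pyspark import SparkContext
# Path to a local text file; backslashes are escaped in the string literal.
logFile = "F:\\testData\\test.txt"
# Run Spark locally with one worker thread; "Simple App" is the application name.
sc = SparkContext("local", "Simple App")
# Cache the RDD because it is scanned twice below.
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()  # lines containing 'a'
numBs = logData.filter(lambda s: 'b' in s).count()  # lines containing 'b'
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
sc.stop()  # release the context so the code can be run again in the same session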
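7. Running the code prints one line of the form "Lines with a: N, lines with b: M", where N and M are the counts for your test.txt.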
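
To confirm the counts from step 7 without Spark, the same filter can be sketched in plain Python. This is only a cross-check, not part of the original setup. The helper name count_lines_containing is hypothetical, and the sample file is a temporary one created for the demonstration, not F:\testData\test.txt.

```python
import os
import tempfile


def count_lines_containing(path, needle):
    """Count the lines of a text file that contain `needle` (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if needle in line)


# Demonstration on a throwaway file; point `sample_path` at your own file instead.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("apple\nbanana\ncherry\n")
    sample_path = tmp.name

num_as = count_lines_containing(sample_path, "a")
num_bs = count_lines_containing(sample_path, "b")
print("Lines with a: %i, lines with b: %i" % (num_as, num_bs))

# Clean up the temporary file created for the demonstration.
os.remove(sample_path)
```

If the Spark job and this check are pointed at the same file, the two printed lines should agree, assuming the file is UTF-8 text.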
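
If the import in step 6 fails, the environment from steps 3 to 5 can be sanity-checked with a short script. The following is a minimal sketch under stated assumptions: the function name check_pyspark_setup is hypothetical, and it only checks that SPARK_HOME points to a directory and that a pyspark package can be located, without starting Spark. On Windows, os.environ looks names up case-insensitively, so a variable created as spark_home is also found as SPARK_HOME.

```python
import importlib.util
import os


def check_pyspark_setup(environ=None):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: `environ` defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    problems = []

    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append("SPARK_HOME does not point to a directory: %s" % spark_home)

    # find_spec only locates the package; it does not import it or start Spark.
    if importlib.util.find_spec("pyspark") is None:
        problems.append("pyspark cannot be found on the Python import path")

    return problems


for problem in check_pyspark_setup():
    print("Problem:", problem)
```

The script prints nothing when both checks pass. That does not guarantee the job in step 6 will run, since it does not verify Java or Spark itself.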