When using pyspark to connect to Spark, the following error appeared. I tried many approaches at the time without success, and finally found the fix. The code is shown below:
from pyspark import SparkContext
from pyspark import SparkConf
import pyspark

string_test = 'pyspark_test'
print(pyspark.__version__)

# Build a configuration pointing at the standalone master and create a context
conf = SparkConf().setAppName(string_test).setMaster('spark://master:7077')
sc = SparkContext(conf=conf)

# Run a small job to verify the connection
list_test = [1, 2, 3]
x = sc.parallelize(list_test)
y = x.map(lambda v: (v, v * 2))
print(x.collect())
print(y.collect())
sc.stop()
The error:
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by <module> at /usr/local/spark/python/pyspark/shell.py:59
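You can confirm that a context already exists before trying to create your own. A minimal check, assuming you are inside the pyspark shell, where the variable sc is predefined by shell.py:

# The shell-created context is already bound to the name sc
print(sc.appName)   # PySparkShell
print(sc.master)    # local[*]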
Solution
This error occurs because a SparkContext is already running: the traceback shows that the pyspark shell created one (app=PySparkShell, master=local[*]) in shell.py when it started. Only one SparkContext can be active at a time, so the existing context has to be stopped before a new one is created, as in the sketch below.
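A minimal sketch of the fix, assuming the code runs inside the pyspark shell (where sc is the shell-created context) and that spark://master:7077 is the address of your standalone master:

from pyspark import SparkContext, SparkConf

# Stop the context the shell created automatically; without this,
# constructing a second SparkContext raises the ValueError above.
sc.stop()

conf = SparkConf().setAppName('pyspark_test').setMaster('spark://master:7077')
sc = SparkContext(conf=conf)   # now the only active context

x = sc.parallelize([1, 2, 3])
print(x.map(lambda v: (v, v * 2)).collect())
sc.stop()

Alternatively, if any working context is acceptable, SparkContext.getOrCreate(conf) should return the already-running context instead of raising; note that in that case the shell's original local[*] master is kept rather than the one set in conf.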