Problems encountered while using Spark

My environment:

Scala 2.10.6
Hadoop 2.6.2
JDK 8u66 (Linux x64)
Spark 1.5.2
One master, two slaves

Problem 1:

scala> val textFile = sc.textFile("README.md")

The error message is: error: not found: value sc

sc is the Spark context; it is normally created for you before you ever build an RDD, so clearly its creation failed.

Possible causes:

HDFS is unavailable because Hadoop was never started; spark-shell itself failed while loading; wrong permissions; environment variables misconfigured, i.e. wrong paths.

Note:

Spark context available as sc.

"sc" is simply the name the shell gives the Spark context.
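If spark-shell itself loads but sc was never created, one workaround while you investigate is to build a SparkContext by hand inside the shell. This is only a minimal sketch under the assumption that the Spark classes loaded correctly; the app name "shell-recovery" and the local[*] master are placeholder choices of mine, not from the original setup:

scala> import org.apache.spark.{SparkConf, SparkContext}
scala> val conf = new SparkConf().setAppName("shell-recovery").setMaster("local[*]")  // placeholder app name and local master
scala> val sc = new SparkContext(conf)          // create the context the shell failed to provide
scala> val textFile = sc.textFile("README.md")
scala> textFile.count()                         // simple action to confirm the context actually works

Of course, this only papers over the symptom; the real fix is still to find out why the shell could not create sc in the first place.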


Problem 2:

spark.SparkContext: error initializing SparkContext

The overall error message is: sparkDriver could not bind on port 0

Relevant excerpts:

starting remoting

         java.net.BindException: Failed to bind to: /10.1.4.221:0: shutting down Netty transport

         Service 'sparkDriver' failed after 16 retries!

The error appears on the master; slave2 does not report it.

Part of the output on slave2:

         Successfully started service 'sparkDriver' on port 42887

         Remoting started listening on addresses: [akka.tcp://sparkDriver@<slave2 IP>:42887]

         * The IP address in the line above is slave2's address.

Both nodes report the error:

         error: not found: value sqlContext

Solution:

export SPARK_LOCAL_IP=127.0.0.1

Note that export only affects the current shell session; it sets the variable temporarily. It is better to add the line to $SPARK_HOME/conf/spark-env.sh, for example as sketched below.
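The entry in $SPARK_HOME/conf/spark-env.sh could look like the sketch below. The value 127.0.0.1 follows the workaround above and only really makes sense on a single machine; on a multi-node cluster the node's real, reachable address is usually the safer choice, otherwise the slaves may not be able to talk to the driver:

# in $SPARK_HOME/conf/spark-env.sh
# pin the address Spark binds to, so an IP change does not break the driver
export SPARK_LOCAL_IP=127.0.0.1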

     Guessed cause of the error: the network cable was moved a few days ago, so the machine's IP changed?

Reference link for solving the above problem:

http://stackoverflow.com/questions/30085779/apache-spark-error-while-start

 

After that, the only remaining error was: error: not found: value sqlContext

Related information found via Google:

Looks like your Spark config may be trying to log to an HDFS path. Can you review your config settings?

While reading a local file which is not in HDFS through spark shell, does the HDFS need to be up and running?

The data may be spilled off to disk, hence HDFS is a necessity for Spark.

You can run Spark on a single machine & not use HDFS, but in distributed mode HDFS will be required.
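As an aside, once sc exists you can read a file that lives only on the local filesystem by giving an explicit file:// URI, without going through HDFS; a minimal sketch (the path is just an example, not the actual one used here):

scala> val localReadme = sc.textFile("file:///usr/local/spark/README.md")  // explicit local path, bypasses HDFS
scala> localReadme.count()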

So the likely cause of the problem is:

Hadoop had not been started.

         ** I had forgotten that the machines had been rebooted after a power outage.

So:

bin/hadoop namenode -format    (note: this formats HDFS and wipes its metadata; it is normally only needed on a brand-new cluster)

sbin/start-dfs.sh

sbin/start-yarn.sh
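Before moving on to Spark, it can help to confirm that the Hadoop daemons actually came up, e.g. with jps (it ships with the JDK); roughly speaking, you should see NameNode, SecondaryNameNode and ResourceManager on the master, and DataNode and NodeManager on the slaves:

jps    # lists the running Java processes, one per daemon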

Then start Spark again:

Go into the Spark directory and run sbin/start-all.sh

Then run bin/spark-shell, and everything works.
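Once the shell starts cleanly, a quick sanity check that both sc and sqlContext really exist (just a sketch; the numbers are arbitrary):

scala> sc.parallelize(1 to 10).sum()   // should print 55.0 if the SparkContext is alive
scala> sqlContext.range(5).count()     // should print 5 if the SQLContext is alive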

 




