When running PySpark on a CDH YARN cluster and invoking a bash script through `pipe()`, you may hit the following error:
File "/usr/lib64/python2.7/subprocess.py", line 1234, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
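The Python side of the trace is `subprocess` failing to exec the piped script. A minimal sketch reproducing the failure and the fix, independent of Spark (the script name `xx.sh` is taken from the text below; written for Python 3, where `PermissionError` is the `OSError` subclass carrying `errno 13`):

```python
import errno
import os
import stat
import subprocess
import tempfile

# A throwaway script with no execute bit, mimicking the shipped xx.sh.
script = os.path.join(tempfile.mkdtemp(), "xx.sh")
with open(script, "w") as f:
    f.write("#!/bin/bash\necho ok\n")

# Without +x, the exec fails exactly like in the executor log above.
try:
    subprocess.check_output([script])
except OSError as e:
    assert e.errno == errno.EACCES  # [Errno 13] Permission denied

# Equivalent of `chmod +x xx.sh`: add the execute bits, then it runs.
mode = os.stat(script).st_mode
os.chmod(script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
print(subprocess.check_output([script]).decode().strip())  # -> ok
```

Note that on YARN this `chmod` has to happen before the script is shipped to the executors, since that is where the exec actually runs.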
Solution:
With [Errno 13], the first thing to suspect is that the bash script lacks execute permission. Add it:
chmod +x xx.sh
Resubmit the Spark job. If the error persists, the script may also need read or write access; in that case open up permissions on the directory src that contains the script:
chmod -R 777 src
(777 is the quick diagnostic fix; a narrower mode such as 755 is usually sufficient and safer in production.)
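The recursive chmod can also be done from the driver before calling `pipe()`. A hedged sketch of the `chmod -R 777 src` equivalent in Python (the directory name `src` and helper `chmod_recursive` are illustrative, not a Spark API):

```python
import os
import stat
import tempfile

def chmod_recursive(root, mode=0o777):
    """Equivalent of `chmod -R 777 root`: apply mode to the directory,
    every subdirectory, and every file beneath it."""
    os.chmod(root, mode)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.chmod(os.path.join(dirpath, name), mode)

# Demo against a scratch directory standing in for `src`:
src = tempfile.mkdtemp()
with open(os.path.join(src, "xx.sh"), "w") as f:
    f.write("#!/bin/bash\necho ok\n")
chmod_recursive(src)
print(oct(stat.S_IMODE(os.stat(os.path.join(src, "xx.sh")).st_mode)))  # -> 0o777
```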