【MapReduce】Streaming Job Failed!

When the error occurs:

I wrote an MR program in Python. It ran correctly when tested locally on Linux,
but failed when run on Hadoop.

My environment:

$ hadoop version
Hadoop 2.5.2
...

Command executed:

hadoop jar $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-*streaming*.jar   \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py -reducer ./reducer.py \
-input /data/poem/data_test \
-output /data/poem/result

Error output:

packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob4957099323859594325.jar tmpDir=null
17/04/13 15:10:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:56 INFO mapred.FileInputFormat: Total input paths to process : 2
17/04/13 15:10:56 INFO mapreduce.JobSubmitter: number of splits:2
17/04/13 15:10:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492067422224_0001
17/04/13 15:10:57 INFO impl.YarnClientImpl: Submitted application application_1492067422224_0001
17/04/13 15:10:57 INFO mapreduce.Job: The url to track the job: http://chinahaoop0:8088/proxy/application_1492067422224_0001/
17/04/13 15:10:57 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
17/04/13 15:10:57 INFO streaming.StreamJob: Running job: job_1492067422224_0001
17/04/13 15:10:57 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:10:59 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:11:56 INFO streaming.StreamJob:  map 50%  reduce 0%
17/04/13 15:11:57 INFO streaming.StreamJob:  map 100%  reduce 0%
17/04/13 15:11:58 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:12:27 INFO streaming.StreamJob:  map 50%  reduce 0%
17/04/13 15:12:31 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:13:08 INFO streaming.StreamJob:  map 100%  reduce 0%
17/04/13 15:13:09 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:13:30 INFO streaming.StreamJob:  map 50%  reduce 0%
17/04/13 15:13:32 INFO streaming.StreamJob:  map 100%  reduce 0%
17/04/13 15:13:33 INFO streaming.StreamJob:  map 100%  reduce 100%
17/04/13 15:13:36 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:13:36 ERROR streaming.StreamJob: Job not Successful!
17/04/13 15:13:36 INFO streaming.StreamJob: killJob...
17/04/13 15:13:36 INFO impl.YarnClientImpl: Killed application application_1492067422224_0001
Streaming Job Failed!

I located the log file and found the detailed error:

Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1937)
        at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1125)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
        ... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
        ... 9 more

The key line in the error is:

java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found

Tracking down the error

1. Bug in the MR scripts:

The scripts ran correctly in the local test, so this is ruled out.
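
For reference, a local test of a streaming job usually means reproducing the map, sort, reduce pipeline with ordinary pipes. The sketch below assumes that is roughly what was done here; run_local and sample.txt are hypothetical names that do not appear in the original setup.

```shell
# Hypothetical helper: emulate Hadoop Streaming locally with plain pipes.
# $1 = input file, $2 = mapper command, $3 = reducer command
run_local() {
    # sort stands in for the shuffle phase between map and reduce
    "$2" < "$1" | sort | "$3"
}

# Example (sample.txt is an assumed local sample of the input):
#   run_local sample.txt ./mapper.py ./reducer.py
```

A run like this only exercises the scripts themselves; it says nothing about the jar used to submit the job.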

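2. Misconfigured environment:

A test job from Hadoop's examples jar ran correctly, so this is ruled out.

3. Problem with the jar:

The error is a ClassNotFoundException, so the jar should have been the first suspect. The jar may not match the installed Hadoop version.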
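
One quick way to probe this suspicion is to list the contents of the jar being submitted and look for the class named in the exception. This is only a sketch: has_pipe_map_runner is a hypothetical helper, and finding the class in the jar does not by itself prove that the jar is compatible with the installed Hadoop version.

```shell
# Hypothetical helper: reads a jar listing on stdin and succeeds if the
# class named in the exception appears in it.
has_pipe_map_runner() {
    grep -q 'org/apache/hadoop/streaming/PipeMapRunner\.class'
}

# Example, assuming unzip is installed (a jar is a zip archive):
#   unzip -l /path/to/hadoop-streaming.jar | has_pipe_map_runner \
#       && echo "class present" || echo "class missing"
```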

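Final fix:

I had downloaded my jar separately from the internet. Most online tutorials give the path $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-streaming.jar.
I could not find that path at first, so I assumed I had to download the jar myself.

Eventually I found that in Hadoop 2.5.2 the jar is located in: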

$HADOOP_INSTALL_HOME/share/hadoop/tools/lib

That is hidden a bit too deep (′д` )…彡…彡! It took me ages to find!
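
Rather than guessing the directory, the install tree can be searched for the jar. A minimal sketch, assuming $HADOOP_INSTALL_HOME points at the Hadoop install root; find_streaming_jar is a hypothetical helper name.

```shell
# Hypothetical helper: list streaming jars under a Hadoop install root.
# $1 = install root, e.g. "$HADOOP_INSTALL_HOME"
find_streaming_jar() {
    find "$1" -type f -name 'hadoop-streaming*.jar' 2>/dev/null
}

# Example:
#   find_streaming_jar "$HADOOP_INSTALL_HOME"
```

The search may print more than one match, in which case pick the one whose version matches the output of hadoop version.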

The rewritten command:

hadoop jar $HADOOP_INSTALL_HOME/share/hadoop/tools/lib/hadoop-*streaming*.jar \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py  -reducer ./reducer.py \
-input /data/poem/data_test -output /data/poem/result

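Lessons learned:

  1. A ClassNotFoundException very likely means the jar does not match the Hadoop environment. My jar was too old. Official base jars such as hadoop-streaming*.jar normally ship with the installation.
  2. Paths often change between versions of the same software, so be ready to adapt. (Even CentOS 7 changed many commands compared with earlier releases.)
  3. The exception printed on screen is often neither detailed nor precise. Besides reading the on-screen message, check the run logs for the full error report.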
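
On a YARN cluster the task logs can usually be pulled from the command line with yarn logs, provided log aggregation is enabled. That is an assumption about the cluster; how the log file was located here is not recorded. The sketch below derives the application ID from the job ID; job_to_app_id is a hypothetical helper.

```shell
# Derive the YARN application ID from a MapReduce job ID
# (job_<cluster-ts>_<seq> -> application_<cluster-ts>_<seq>).
job_to_app_id() {
    printf 'application_%s\n' "${1#job_}"
}

# Example, using the job ID from the console output above:
#   yarn logs -applicationId "$(job_to_app_id job_1492067422224_0001)" \
#       | grep -A 5 'ClassNotFoundException'
```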