When the error occurred:
I wrote a MapReduce (MR) program in Python. It ran fine when tested locally on Linux, but failed as soon as I ran it on Hadoop.
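For context, testing a streaming job locally usually means piping data through the scripts, as in `cat input | ./mapper.py | sort | ./reducer.py`. The sketch below simulates that pipeline in memory with a hypothetical word-count mapper and reducer; the original mapper.py and reducer.py are not shown, so their logic here is an assumption. Passing this kind of test only proves the scripts themselves; it says nothing about the Hadoop-side jar that launches them.

```python
from itertools import groupby


# Hypothetical word-count logic; stands in for the real mapper.py/reducer.py.
def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per key; expects input sorted by key, as Hadoop delivers it."""
    pairs = (line.split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate `cat input | ./mapper.py | sort | ./reducer.py` in memory."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    print(run_locally(["a b a", "b c"]))  # ['a\t2', 'b\t2', 'c\t1']
```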
My environment:
$hadoop version
Hadoop 2.5.2
...
Command I ran:
hadoop jar $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-*streaming*.jar \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py -reducer ./reducer.py \
-input /data/poem/data_test \
-output /data/poem/result
Error output:
packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob4957099323859594325.jar tmpDir=null
17/04/13 15:10:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:56 INFO mapred.FileInputFormat: Total input paths to process : 2
17/04/13 15:10:56 INFO mapreduce.JobSubmitter: number of splits:2
17/04/13 15:10:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492067422224_0001
17/04/13 15:10:57 INFO impl.YarnClientImpl: Submitted application application_1492067422224_0001
17/04/13 15:10:57 INFO mapreduce.Job: The url to track the job: http://chinahaoop0:8088/proxy/application_1492067422224_0001/
17/04/13 15:10:57 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
17/04/13 15:10:57 INFO streaming.StreamJob: Running job: job_1492067422224_0001
17/04/13 15:10:57 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:10:59 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:11:56 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:11:57 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:11:58 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:12:27 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:12:31 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:13:08 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:13:09 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:13:30 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:13:32 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:13:33 INFO streaming.StreamJob: map 100% reduce 100%
17/04/13 15:13:36 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:13:36 ERROR streaming.StreamJob: Job not Successful!
17/04/13 15:13:36 INFO streaming.StreamJob: killJob...
17/04/13 15:13:36 INFO impl.YarnClientImpl: Killed application application_1492067422224_0001
Streaming Job Failed!
Digging into the log files, I found the specific error:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1937)
at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1125)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
... 9 more
The key part of the error is:
java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
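A ClassNotFoundException can be checked directly against a jar: a jar is a zip archive, so you can look for the class file inside it (the same entries that `jar tf` or `unzip -l` would list). A minimal sketch; the function name jar_contains_class is hypothetical:

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) has an entry for the given class.

    class_name is fully qualified, e.g. "org.apache.hadoop.streaming.PipeMapRunner",
    which maps to the entry "org/apache/hadoop/streaming/PipeMapRunner.class".
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

If it returns False for the streaming jar you passed to `hadoop jar`, the jar is clearly the culprit; if it returns True, a version mismatch between the jar and the cluster is still possible.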
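Tracking down the cause
1. A bug in the MR scripts:
The scripts worked fine in local testing, so this was ruled out.
2. A misconfigured environment:
Running Hadoop's bundled examples jar worked fine, so this was ruled out too.
3. A problem with the jar:
Since the exception is a ClassNotFoundException, the jar should have been the first suspect: it might not match the Hadoop version.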
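One quick heuristic for checking whether a streaming jar matches the cluster is to compare the version in the jar's filename with the first line of `hadoop version` output (Hadoop 2.5.2 above). A sketch under that assumption: it only understands the hadoop-streaming-<version>.jar naming used by 2.x release tarballs and returns None for anything else, such as a renamed hadoop-streaming.jar. The helper names are hypothetical.

```python
import os
import re


def hadoop_version(version_output):
    """Extract e.g. '2.5.2' from the 'Hadoop 2.5.2' line of `hadoop version` output."""
    match = re.search(r"^Hadoop (\S+)", version_output, re.MULTILINE)
    return match.group(1) if match else None


def jar_version(jar_path):
    """Extract the version from a filename like 'hadoop-streaming-2.5.2.jar'."""
    match = re.search(r"hadoop-streaming-(.+)\.jar$", os.path.basename(jar_path))
    return match.group(1) if match else None


def versions_match(version_output, jar_path):
    """True only if both versions were recognized and are equal."""
    jv = jar_version(jar_path)
    return jv is not None and jv == hadoop_version(version_output)
```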
The fix:
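I had downloaded my streaming jar separately from the web. Most online tutorials give the path $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-streaming.jar; I couldn't find that path at first, so I assumed I had to download the jar myself.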
In the end I found that in Hadoop 2.5.2 the jar lives in:
$HADOOP_INSTALL_HOME/share/hadoop/tools/lib
That's hidden way too deep (′д` )…彡…彡! It took me forever to find!
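If you don't know where your installation keeps the streaming jar, a small search over both the 2.x location and the older contrib/streaming location used by many tutorials can save the hunt. A sketch, assuming the installation root is passed in ($HADOOP_INSTALL_HOME in my setup); find_streaming_jars is a hypothetical name:

```python
import glob
import os


def find_streaming_jars(hadoop_home):
    """Return streaming jars found in the 2.x location first, then the old contrib/ one."""
    patterns = [
        # Hadoop 2.x layout, e.g. share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar
        os.path.join(hadoop_home, "share", "hadoop", "tools", "lib", "hadoop-*streaming*.jar"),
        # Older layout that many tutorials still point to
        os.path.join(hadoop_home, "contrib", "streaming", "hadoop-*streaming*.jar"),
    ]
    found = []
    for pattern in patterns:
        found.extend(sorted(glob.glob(pattern)))
    return found
```

For example, find_streaming_jars(os.environ["HADOOP_INSTALL_HOME"]) returns an empty list if neither location contains a streaming jar.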
The corrected command:
hadoop jar $HADOOP_INSTALL_HOME/share/hadoop/tools/lib/hadoop-*streaming*.jar \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py -reducer ./reducer.py \
-input /data/poem/data_test -output /data/poem/result
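If the job is launched from a script, building the argument list explicitly sidesteps shell pitfalls such as a missing space before a trailing backslash. A sketch with a hypothetical helper, streaming_command; because no shell is involved, jar must be a concrete path, since the * glob would not be expanded:

```python
def streaming_command(jar, mapper, reducer, input_path, output_path):
    """Build the argv for a Hadoop streaming job, mirroring the command above."""
    return [
        "hadoop", "jar", jar,
        "-file", mapper, "-mapper", mapper,      # ship mapper.py and use it as the mapper
        "-file", reducer, "-reducer", reducer,   # ship reducer.py and use it as the reducer
        "-input", input_path,
        "-output", output_path,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True), which raises CalledProcessError if the command exits with a non-zero status.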
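Lessons learned:
- A ClassNotFoundException very often means the jar doesn't match the Hadoop environment; mine was simply too old. Basic officially released jars such as hadoop-streaming*.jar normally ship with the installation, so look for the bundled one before downloading anything.
- Paths often change between software versions, so be ready to adapt. (Even CentOS 7 changed many commands compared with earlier releases.)
- The exception printed on screen is often neither detailed nor precise. Besides reading the on-screen error, check the job's run logs for the full error report.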
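On YARN, if log aggregation is enabled, the container logs of a finished application can be fetched with `yarn logs -applicationId application_1492067422224_0001`. The root cause is usually the last "Caused by:" line of the stack trace. A minimal sketch that pulls it out of the log text (root_cause is a hypothetical helper; it falls back to the first "Error:" line):

```python
def root_cause(log_text):
    """Return the message of the last 'Caused by:' line, else the first 'Error:' line."""
    lines = [line.strip() for line in log_text.splitlines()]
    caused = [line for line in lines if line.startswith("Caused by:")]
    if caused:
        return caused[-1][len("Caused by:"):].strip()
    for line in lines:
        if line.startswith("Error:"):
            return line[len("Error:"):].strip()
    return None  # nothing recognizable in the log
```

Run on the log above, it returns the ClassNotFoundException line for org.apache.hadoop.streaming.PipeMapRunner.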