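Once I could run the program on the Hadoop cluster from the command line, I set up the matching configuration in Eclipse and clicked Run on Hadoop.
The job ran successfully and the output showed up on HDFS, but it still had not been submitted to the real cluster.
After a long search I tried specifying the remote jobtracker address directly in the code, but that did not work either.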
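So instead I debugged the program in Eclipse and, once it ran correctly, packaged it as a jar and uploaded it to the Hadoop cluster to run there:
Export the jar directly from Eclipse, making sure that META-INF/MANIFEST.MF inside the jar contains the Main-Class mapping: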
Main-Class: WordCount
In practice, clicking Next through Eclipse's export wizard produces a manifest that already contains this entry.
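To confirm the exported jar really carries the entry, the manifest can be read back with the JDK's java.util.jar API. This is a minimal sketch; the class name ManifestCheck and its methods are my own and not part of Hadoop or Eclipse.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestCheck {
    /** Returns the Main-Class attribute of a manifest, or null if the manifest or entry is missing. */
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    /** Opens a jar file and reads Main-Class from its META-INF/MANIFEST.MF. */
    static String mainClassOfJar(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return mainClassOf(jar.getManifest());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ManifestCheck <jar>");
            System.exit(1);
        }
        String mainClass = mainClassOfJar(args[0]);
        System.out.println(mainClass == null ? "No Main-Class entry" : "Main-Class: " + mainClass);
    }
}
```

For example, java ManifestCheck /opt/myWordCount.jar should print Main-Class: WordCount for a correctly exported jar.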
Upload the jar to the server; assuming it is placed in /opt, the command is:
hadoop jar /opt/myWordCount.jar WordCount /test_in /output12
This fails with:
Exception in thread "main" java.lang.UnsupportedClassVersionError: WordCount : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
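Searching online pointed to a Java version mismatch: major.minor version 52.0 is the class-file format produced by Java 8, and Eclipse on my Windows 7 machine was compiling with Java 1.8, while the server runs Java 1.7, whose JVM only accepts class files up to version 51.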
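The version number lives in the header of every .class file (4-byte magic 0xCAFEBABE, then 2-byte minor and 2-byte major version), so a compiled WordCount.class can be checked before uploading. A minimal sketch; ClassVersion is a name I made up, and the release mapping only covers class files from Java 1.2 (major 46) onward.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassVersion {
    /** Reads major_version from the first 8 bytes of a class file (magic u4, minor u2, major u2). */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, matching the class-file format
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Missing 0xCAFEBABE magic number");
        }
        buf.getShort();                  // minor_version, not needed here
        return buf.getShort() & 0xFFFF;  // major_version as an unsigned value
    }

    /** Maps a major version (46 or later) to a Java release name: 51 -> 1.7, 52 -> 1.8, 53 -> 9. */
    static String javaRelease(int major) {
        if (major < 46) {
            throw new IllegalArgumentException("Pre-1.2 class file: " + major);
        }
        int release = major - 44;
        return release <= 8 ? "1." + release : String.valueOf(release);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassVersion WordCount.class
        int major = majorVersion(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(args[0] + ": major version " + major + " (Java " + javaRelease(major) + ")");
    }
}
```

After switching the compliance level to 1.7, the rebuilt class should report major version 51. The JDK's javap -verbose WordCount.class shows the same "major version" field.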
In Eclipse, go to Window--Preferences--Java--Compiler--Compiler compliance level and select 1.7.
Then I re-exported the jar and ran it again,
which produced a different error:
14/11/07 10:33:46 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/11/07 10:33:47 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/11/07 10:33:48 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/11/07 10:33:49 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/11/07 10:33:50 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/11/07 10:33:51 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/11/07 10:33:52 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
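"Retrying connect to server" means the client could not open a TCP connection to the ResourceManager's client port (8032 by default). Before editing configuration files, it helps to check from the submitting machine whether that port is reachable at all. A minimal sketch using only java.net; the class name PortCheck and the 2-second timeout are my own choices, and the default host and port are simply taken from the log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host unknown: all mean the port is not usable from here
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults from the log above: hadoop-05/192.168.0.7, port 8032
        String host = args.length > 0 ? args[0] : "192.168.0.7";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8032;
        boolean ok = canConnect(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

If the check fails, the ResourceManager is either not running or not listening on that address, which points at the YARN configuration rather than at the job code.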
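So the client could not reach the ResourceManager. yarn-site.xml looked correctly configured,
but I noticed the port numbers differed from the defaults, so I changed them.
The configuration now reads: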
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.0.7</value>
</property>
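Which address the ResourceManager actually uses follows a simple rule in YARN's defaults: each of these *.address properties defaults to ${yarn.resourcemanager.hostname} plus a fixed port (8032 client, 8030 scheduler, 8031 resource tracker), and an explicitly set *.address overrides that. A small stdlib sketch that applies this rule to a yarn-site.xml; RmAddresses is a hypothetical helper, not a Hadoop class, and it ignores Hadoop's ${...} variable expansion and *-default.xml layering.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RmAddresses {
    /** Collects <property><name>..</name><value>..</value></property> pairs from a yarn-site.xml string. */
    static Map<String, String> parse(String xml) {
        Map<String, String> props = new HashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0) {
                    props.put(names.item(0).getTextContent().trim(), values.item(0).getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration XML", e);
        }
        return props;
    }

    /** An explicit *.address wins; otherwise fall back to yarn.resourcemanager.hostname (default 0.0.0.0) plus the port. */
    static String effective(Map<String, String> props, String key, String defaultPort) {
        String explicit = props.get(key);
        if (explicit != null) {
            return explicit;
        }
        String host = props.get("yarn.resourcemanager.hostname");
        return (host != null ? host : "0.0.0.0") + ":" + defaultPort;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RmAddresses $HADOOP_CONF_DIR/yarn-site.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Map<String, String> props = parse(xml);
        System.out.println("client:           " + effective(props, "yarn.resourcemanager.address", "8032"));
        System.out.println("scheduler:        " + effective(props, "yarn.resourcemanager.scheduler.address", "8030"));
        System.out.println("resource-tracker: " + effective(props, "yarn.resourcemanager.resource-tracker.address", "8031"));
    }
}
```

Applied to the configuration above, the explicit localhost values take precedence over yarn.resourcemanager.hostname, so all three addresses resolve to localhost. That works for a client on the ResourceManager host itself but is worth keeping in mind when submitting from another machine.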
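Rerunning gave the same error, so I commented out the job tracker address that was set explicitly in the code.
Surprisingly, that led to yet another error: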
Usage: wordcount <in> <out>
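Looking at the code, this message is printed when the program does not receive exactly two arguments. I could not find anything wrong with the command, though, so the only option left was to hard-code the paths in the program and rebuild the jar: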
FileInputFormat.addInputPath(job, new Path("hdfs://192.168.0.7:9000/test_in"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.0.7:9000/out1"));
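After submitting this to the Hadoop cluster, the results came out.
I still haven't figured out why passing the paths on the command line didn't work. Noting it here for now.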
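One plausible explanation, not yet verified on this cluster: when the jar's manifest declares Main-Class, Hadoop's RunJar uses that class and does not consume the next word on the command line as a class name. The command above would then pass WordCount, /test_in and /output12 to the program, three arguments instead of two, which trips the usage check. The sketch below models that argument handling in plain Java. RunJarArgs and its methods are hypothetical names, not Hadoop's API, and the exactly-two-arguments check is assumed to match the one in my WordCount.

```java
import java.util.Arrays;

public class RunJarArgs {
    /**
     * Simplified model of how `hadoop jar <jar> [mainClass] args...` splits its arguments.
     * argsAfterJar are the words typed after the jar path.
     */
    static String[] programArgs(String manifestMainClass, String... argsAfterJar) {
        if (manifestMainClass != null) {
            // Main-Class comes from the manifest, so nothing is consumed as a class name
            return argsAfterJar;
        }
        if (argsAfterJar.length < 1) {
            throw new IllegalArgumentException("RunJar jarFile [mainClass] args...");
        }
        // No Main-Class: the first word is the class name, the rest go to the program
        return Arrays.copyOfRange(argsAfterJar, 1, argsAfterJar.length);
    }

    /** Hypothetical stand-in for WordCount's check, assumed to require exactly two arguments. */
    static boolean wordCountAccepts(String[] programArgs) {
        return programArgs.length == 2;
    }

    public static void main(String[] args) {
        String[] typed = programArgs("WordCount", "WordCount", "/test_in", "/output12");
        String[] withoutClassName = programArgs("WordCount", "/test_in", "/output12");
        System.out.println(Arrays.toString(typed) + " accepted: " + wordCountAccepts(typed));
        System.out.println(Arrays.toString(withoutClassName) + " accepted: " + wordCountAccepts(withoutClassName));
    }
}
```

If this is the cause, hadoop jar /opt/myWordCount.jar /test_in /output12 (without the class name) should work without hard-coding the paths.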