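Couldn't keep my hands off the cluster in the middle of the night and tweaked the configuration a bit. Then I found YARN was down: the NodeManager simply would not start. The log showed a null pointer exception, which told me nothing, so I googled it and landed on these: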
https://issues.apache.org/jira/browse/YARN-2816
https://sskaje.me/2014/11/yarn-nodemanager-failed-start/
So it turns out... the post points at this part of the start-up log:
2014-10-30 21:23:07,141 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
2014-10-30 21:23:08,259 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService: Using state database at /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recover
2014-10-30 21:23:08,291 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Recovering log #432
2014-10-30 21:23:08,309 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Delete type=0 #432
2014-10-30 21:23:08,309 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Delete type=3 #431
2014-10-30 21:23:08,321 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService: Loaded NM state version info 1.0
The solution: stop the instance, delete '/tmp/hadoop-yarn/' from the local filesystem, then start the instance again.
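For anyone who has to repeat this on several machines, here is a minimal sketch of the delete step in Python. It is only my own illustration, not something from the linked post: the function name `clear_nm_state` is made up, and the default path is simply the one that shows up in the log above, so adjust it if your cluster keeps its state somewhere else.

```python
import shutil
from pathlib import Path

# Path taken from the start-up log above (an assumption for other clusters).
DEFAULT_STATE_DIR = Path("/tmp/hadoop-yarn")


def clear_nm_state(state_dir: Path = DEFAULT_STATE_DIR) -> bool:
    """Delete the NodeManager's local state directory.

    Hypothetical helper: returns True if the directory was removed,
    False if it did not exist. Only call it while the NodeManager
    on this host is stopped.
    """
    if not state_dir.exists():
        return False
    # Remove the directory and everything under it, including
    # yarn-nm-recovery/yarn-nm-state.
    shutil.rmtree(state_dir)
    return True
```

Run it on the affected host after stopping the NodeManager and before starting it again. How you stop and start the daemon depends on your setup (on a plain Hadoop 2.x install that is probably `yarn-daemon.sh stop nodemanager` and `yarn-daemon.sh start nodemanager`).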
After deleting this directory on every NodeManager host, the cluster finally came back. Now I can go to sleep...