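Late at night I got careless and tweaked the cluster configuration, and then found that YARN had gone down: the NodeManager would not start. The log showed a NullPointerException, but I could not work out the cause from it, and I ended up Googling my way to these: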
https://issues.apache.org/jira/browse/YARN-2816
https://sskaje.me/2014/11/yarn-nodemanager-failed-start/
It turns out...
And, in the start-up message part,
2014-10-30 21:23:07,141 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
2014-10-30 21:23:08,259 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService: Using state database at /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recover
2014-10-30 21:23:08,291 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Recovering log #432
2014-10-30 21:23:08,309 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Delete type=0 #432
2014-10-30 21:23:08,309 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Delete type=3 #431
2014-10-30 21:23:08,321 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService: Loaded NM state version info 1.0
The solution is: stop the instance, delete ‘/tmp/hadoop-yarn/’ from the local filesystem, then start the instance again.
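
For anyone who has to repeat this on several machines, here is a minimal Python sketch of the delete step only. It assumes the NodeManager has already been stopped by whatever means your cluster uses, and that the state lives under the path shown in the log above. The function name clear_nm_state and the dry_run flag are my own, not part of Hadoop. If your yarn.nodemanager.recovery.dir points somewhere else, pass that path instead.

```python
import shutil
from pathlib import Path

# Path taken from the start-up log above; this is an assumption about your
# setup, so change it if your recovery directory lives elsewhere.
DEFAULT_STATE_ROOT = Path("/tmp/hadoop-yarn")


def clear_nm_state(root=DEFAULT_STATE_ROOT, dry_run=True):
    """Remove the NodeManager recovery state directory.

    Returns True if the directory existed (and was removed, or would have
    been removed in dry-run mode), False if there was nothing to delete.
    Stop the NodeManager before calling this and start it again afterwards.
    """
    root = Path(root)
    if not root.is_dir():
        return False
    if dry_run:
        # Only report what would happen; nothing is deleted.
        print(f"would remove {root}")
        return True
    shutil.rmtree(root)
    return True
```

Calling clear_nm_state() with no arguments only prints what it would remove. Pass dry_run=False once you are sure the path is right, since this throws away whatever state the NodeManager had saved for recovery.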
After deleting this directory on every node, the cluster finally recovered, and I can go to sleep...