HBase troubleshooting: RegionServer crashes suddenly and crashes again after every restart

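Recently I noticed that a RegionServer in our test environment kept crashing without warning, and restarting the node did not help. With no better option, I read through the logs for a long while and found the following: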
2015-02-13 05:40:04,325 WARN  [regionserver60020] zookeeper.RecoverableZooKeeper: Node /hbase/rs/slave2,60020,1423777199540 already deleted, retry=false
2015-02-13 05:40:04,325 WARN [regionserver60020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/rs/slave2,60020,1423777199540
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:179)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1273)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1262)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1342)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1054)
at java.lang.Thread.run(Thread.java:745)
2015-02-13 05:40:04,329 INFO [regionserver60020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-02-13 05:40:04,329 INFO [regionserver60020] zookeeper.ZooKeeper: Session: 0x14b7113ebc50012 closed
2015-02-13 05:40:04,329 INFO [regionserver60020] regionserver.HRegionServer: stopping server null; zookeeper connection closed.
2015-02-13 05:40:04,330 INFO [regionserver60020] regionserver.HRegionServer: regionserver60020 exiting


I searched for a long time and still could not solve the problem. I had no leads at all...

After a cup of tea I kept scrolling further up the log and suddenly found the lifeline:
2015-02-13 05:40:04,294 FATAL [regionserver60020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server slave2,60020,1423777199540 has been rejected; Reported time is too far out of sync with master. Time difference of 71419ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:345)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:238)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1294)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:7910)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
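
Since the decisive line was buried above a pile of less relevant ZooKeeper warnings, a small script can pull the skew figures out of a RegionServer log directly. The following is only a minimal sketch: the function name find_clock_skew is hypothetical, and the pattern assumes the message wording is exactly as in the log above.

```python
import re

# Matches the skew figures in the ClockOutOfSyncException message shown above.
# Assumption: the wording is identical to the log line in this post.
SKEW_PATTERN = re.compile(
    r"Time difference of (\d+)ms > max allowed of (\d+)ms"
)


def find_clock_skew(log_text):
    """Return (skew_ms, max_allowed_ms) for the first match, or None."""
    match = SKEW_PATTERN.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "org.apache.hadoop.hbase.ClockOutOfSyncException: Server "
    "slave2,60020,1423777199540 has been rejected; Reported time is too far "
    "out of sync with master. Time difference of 71419ms > max allowed of 30000ms"
)
print(find_clock_skew(sample))  # (71419, 30000)
```

In practice log_text would be the contents of the RegionServer log file; a None result means no clock rejection with this wording was found.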


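Problem found: the clocks on the Master and the RegionServer were out of sync because no time synchronization service was installed. The RegionServer was off by 71419 ms, well beyond the 30000 ms the Master allows, so the Master rejected it at startup.
I corrected the RegionServer's clock by hand with date -s <time> and restarted the RegionServer, which resolved the problem.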
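
Judging from the exception message, the rejection comes down to comparing the time reported by the RegionServer against the Master's own clock. Below is a simplified illustration of that comparison, not HBase's actual code: ClockOutOfSyncError and check_clock_skew are hypothetical names, and the 30000 ms limit is simply the value taken from the log.

```python
import time

MAX_SKEW_MS = 30000  # limit reported in the log; assumed here, not read from HBase


class ClockOutOfSyncError(Exception):
    """Hypothetical stand-in for HBase's ClockOutOfSyncException."""


def check_clock_skew(server_name, server_time_ms, master_time_ms=None,
                     max_skew_ms=MAX_SKEW_MS):
    """Return the skew in ms, or raise if it exceeds the allowed limit."""
    if master_time_ms is None:
        # Default to the local clock, playing the role of the master.
        master_time_ms = int(time.time() * 1000)
    skew = abs(master_time_ms - server_time_ms)
    if skew > max_skew_ms:
        raise ClockOutOfSyncError(
            f"Server {server_name} has been rejected; "
            f"Time difference of {skew}ms > max allowed of {max_skew_ms}ms"
        )
    return skew


# Reproduce the situation from the log: the server is 71419 ms off.
master_now = 1423777204294
try:
    check_clock_skew("slave2", master_now - 71419, master_time_ms=master_now)
except ClockOutOfSyncError as err:
    print(err)
```

Because the check uses the absolute difference, it does not matter whether the RegionServer clock runs ahead of or behind the Master; either way the only real fix is to bring the clocks back together.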

The next step is to install an NTP service in the test environment as well.