Hadoop HA: standby NameNode cannot switch to active

After killing the active NameNode, the standby NameNode process did not automatically switch to the active state. hadoop-hdfs-zkfc-ha-master01.log showed the following errors:
2020-01-14 01:07:58,346 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2020-01-14 01:07:58,346 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at ha-master02/172.17.0.2:8020
	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:538)
	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:510)
	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:933)
	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:991)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
2020-01-14 01:07:58,346 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2020-01-14 01:07:58,349 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2020-01-14 01:07:58,355 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3000239cac1000c closed
2020-01-14 01:07:59,356 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ha-slave01:2181,ha-slave02:2181,ha-slave03:2181 sessionTimeout=2000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@501129f3
2020-01-14 01:07:59,365 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ha-slave02/172.17.0.4:2181. Will not attempt to authenticate using SASL (unknown error)
2020-01-14 01:07:59,367 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ha-slave02/172.17.0.4:2181, initiating session
2020-01-14 01:07:59,373 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ha-slave02/172.17.0.4:2181, sessionid = 0x2000239ca930009, negotiated timeout = 4000
2020-01-14 01:07:59,374 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x3000239cac1000c
2020-01-14 01:07:59,376 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2020-01-14 01:07:59,381 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2020-01-14 01:07:59,382 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0a746573746c757374657212036e6e321a0b68612d6d6173746572303220d43e28d33e
2020-01-14 01:07:59,382 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at ha-master02/172.17.0.2:8020
2020-01-14 01:08:00,384 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ha-master02/172.17.0.2:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2020-01-14 01:08:00,384 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at ha-master02/172.17.0.2:8020 standby (unable to connect)
java.net.ConnectException: Call From ha-master01/172.17.0.1 to ha-master02:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:754)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
	at org.apache.hadoop.ipc.Client.call(Client.java:1437)
	at org.apache.hadoop.ipc.Client.call(Client.java:1347)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
	at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:520)
	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:510)
	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:933)
	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:991)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
	at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:409)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
	at org.apache.hadoop.ipc.Client.call(Client.java:1383)
	... 15 more

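The cause is that dfs.ha.fencing.methods is set to sshfence. This fencing method logs in to the old active NameNode host over SSH and uses the fuser command to kill the process listening on the NameNode port, so fuser must be available on that host. fuser is provided by the psmisc package, so installing it with the following command resolves the problem. It needs to be installed on both NameNode nodes: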

yum -y install psmisc
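
To confirm the fix took effect, a small check like the one below can be run on each NameNode host (ha-master01 and ha-master02 in the log above). It is only a sketch: the helper name have_cmd is made up for illustration, and the hdfs getconf step assumes the hdfs client is on PATH and is skipped otherwise.

```shell
#!/bin/sh
# have_cmd (hypothetical helper): succeeds if the given command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# sshfence relies on fuser being present on the NameNode host.
if have_cmd fuser; then
    echo "fuser is available"
else
    echo "fuser is missing; install it with: yum -y install psmisc"
fi

# Optionally print the configured fencing method (only if hdfs is installed).
if have_cmd hdfs; then
    hdfs getconf -confKey dfs.ha.fencing.methods
fi
```

If the first check reports that fuser is missing on either host, the ZKFC on the other node will keep failing to fence and the standby will stay in standby.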