HBase RegionServer exits abnormally

2017-09-23 09:20:54,223 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 28836ms
No GCs detected
2017-09-23 09:20:54,250 INFO  [regionserver60020-SendThread(bis-hadoop-datanode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 40327ms for sessionid 0x505e897ce7f900e8, closing socket connection and attempting reconnect
2017-09-23 09:20:54,237 WARN  [regionserver60020] util.Sleeper: We slept 31841ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-09-23 09:20:54,238 INFO  [regionserver60020-SendThread(bis-backup-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 41588ms for sessionid 0x875e9aa7d1a10050, closing socket connection and attempting reconnect
2017-09-23 09:20:54,238 INFO  [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 37261ms for sessionid 0x525cd4f74e5cfcab, closing socket connection and attempting reconnect
2017-09-23 09:20:54,237 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept 29502ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-09-23 09:21:01,815 INFO  [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Opening socket connection to server bis-hadoop-namenode-s-01/10.10.10.82:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-23 09:21:01,816 INFO  [regionserver60020-SendThread(bis-hadoop-datanode-s-01:2181)] zookeeper.ClientCnxn: Opening socket connection to server bis-hadoop-datanode-s-01/10.10.10.80:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-23 09:21:01,816 INFO  [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Socket connection established to bis-hadoop-namenode-s-01/10.10.10.82:2181, initiating session
2017-09-23 09:21:01,816 INFO  [regionserver60020-SendThread(bis-hadoop-datanode-s-01:2181)] zookeeper.ClientCnxn: Socket connection established to bis-hadoop-datanode-s-01/10.10.10.80:2181, initiating session

2017-09-23 09:21:01,966 WARN  [regionserver60020-EventThread] client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:401)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2017-09-23 09:21:02,006 FATAL [regionserver60020-EventThread] regionserver.HRegionServer: ABORTING region server bis-hadoop-datanode-s2d-129,60020,1506121204494: regionserver:60020-0x525cd4f74e5cfcab, quorum=bis-backup-s-01:2181,bis-hadoop-namenode-s-01:2181,bis-hadoop-datanode-s-01:2181, baseZNode=/hbase regionserver:60020-0x525cd4f74e5cfcab received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:401)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)



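Analysis: the pause was too long, about 28836 ms, most likely a long GC (JvmPauseMonitor also printed "No GCs detected", so a host-level stall such as swapping is worth ruling out as well). During the pause all threads were blocked, so the RegionServer could not heartbeat to ZooKeeper, its ZK session expired, and the server aborted.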
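To see how long each stall was without reading the log by eye, the relevant lines can be pulled out with a few regular expressions. This is a minimal sketch, not an official HBase tool: the function name `extract_stalls` and the pattern labels are made up for illustration, and the sample lines are shortened copies of the log above.

```python
import re

# Patterns for the three kinds of stall-related lines seen in the log above.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = {
    "jvm_pause": re.compile(r"pause of approximately (\d+)ms"),
    "sleeper": re.compile(r"We slept (\d+)ms instead of \d+ms"),
    "zk_timeout": re.compile(r"have not heard from server in (\d+)ms"),
}


def extract_stalls(lines):
    """Return (kind, observed_ms) for every line that matches a pattern."""
    stalls = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                stalls.append((kind, int(match.group(1))))
                break  # one label per line is enough
    return stalls


# Shortened copies of lines from the RegionServer log above.
sample = [
    "WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM "
    "or host machine (eg GC): pause of approximately 28836ms",
    "INFO  zookeeper.ClientCnxn: Client session timed out, have not heard "
    "from server in 40327ms for sessionid 0x505e897ce7f900e8",
    "WARN  [regionserver60020] util.Sleeper: We slept 31841ms instead of 3000ms",
    "INFO  zookeeper.ClientCnxn: Socket connection established",
]

for kind, ms in extract_stalls(sample):
    print(f"{kind}: {ms} ms")
```

In practice the same function can be fed the lines of the real RegionServer log file; any stall close to or above the ZooKeeper session timeout is a candidate cause for the abort.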
There are two ways to address this:
1. Increase the ZooKeeper session timeout (in hbase-site.xml):
<property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
</property>
2. Reduce the impact of GC on the ZK session by tuning HBASE_REGIONSERVER_OPTS in hbase-env.sh:
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Xmx6000m -Xms6000m -Xmn2250m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -Xloggc:/home/hadoop/hbase-0.96.2-hadoop2/logs/gc.log"
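To make the heap flags above easier to reason about, here is the arithmetic they imply, as a small sketch. It is simplified: it treats the old generation as `-Xmx` minus `-Xmn` and ignores survivor-space details, and the function name `cms_heap_layout` is hypothetical. Also note that, as far as I know, without `-XX:+UseCMSInitiatingOccupancyOnly` the JVM treats `CMSInitiatingOccupancyFraction` only as a starting hint and may adjust the trigger on its own.

```python
def cms_heap_layout(xmx_mb, xmn_mb, initiating_occupancy_pct):
    """Rough heap breakdown for a ParNew + CMS configuration.

    Simplifying assumption: old generation = total heap - young generation.
    """
    old_gen_mb = xmx_mb - xmn_mb
    # CMS starts a concurrent cycle once the old generation is this full.
    cms_trigger_mb = old_gen_mb * initiating_occupancy_pct / 100
    return {
        "young_mb": xmn_mb,
        "old_mb": old_gen_mb,
        "cms_trigger_mb": cms_trigger_mb,
    }


# Values taken from the flags above: -Xmx6000m -Xmn2250m, fraction 70.
layout = cms_heap_layout(6000, 2250, 70)
print(layout)
```

With these flags the old generation is 3750 MB and CMS is asked to start collecting at about 2625 MB of old-generation occupancy, which leaves headroom so that concurrent collection can finish before the heap fills and forces a long stop-the-world full GC.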

Restart the cluster.
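After the restart it is worth checking that the longer session timeout actually took effect. The ZooKeeper server clamps the timeout a client requests to the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. The sketch below illustrates that clamping; the function name is hypothetical, and tickTime=2000 is only an assumed value (the one in ZooKeeper's sample config), since the zoo.cfg of this cluster is not shown.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Session timeout the ZooKeeper server would grant for a request.

    tick_time_ms=2000 is an assumption, not a value from this cluster.
    When min/max are not set, the server defaults to 2x and 20x tickTime.
    """
    lower = (2 * tick_time_ms if min_session_timeout_ms is None
             else min_session_timeout_ms)
    upper = (20 * tick_time_ms if max_session_timeout_ms is None
             else max_session_timeout_ms)
    return max(lower, min(requested_ms, upper))


# zookeeper.session.timeout=120000 against default server limits.
print(negotiated_session_timeout(120000))
# The same request after raising maxSessionTimeout on the servers.
print(negotiated_session_timeout(120000, max_session_timeout_ms=120000))
```

Under these assumptions the requested 120000 ms would be cut down to 40000 ms unless maxSessionTimeout is raised to at least 120000 in the ZooKeeper servers' configuration as well.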
