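Because a RegionServer's management information is stored mainly in ZooKeeper, HBase decides that a RegionServer is down when its ZooKeeper session expires.
So what causes the session between a RegionServer and ZooKeeper to expire?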
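1. A bad network.
2. A Java full GC, which blocks all threads, including the one that sends ZooKeeper heartbeats. If the pause lasts longer than the session timeout, the session expires.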
Solutions:
1. Increase the ZooKeeper session timeout.
2. Set “hbase.regionserver.restart.on.zk.expire” to true. With this setting, when the ZooKeeper session expires, the RegionServer restarts instead of aborting.
To configure both, add the following to hbase-site.xml:
<property>
<name>zookeeper.session.timeout</name>
<value>90000</value>
<description>ZooKeeper session timeout.
HBase passes this to the zk quorum as suggested maximum time for a
session. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
“The client sends a requested timeout, the server responds with the
timeout that it can give the client. The current implementation
requires that the timeout be a minimum of 2 times the tickTime
(as set in the server configuration) and a maximum of 20 times
the tickTime.” Set the zk ticktime with hbase.zookeeper.property.tickTime.
In milliseconds.
</description>
</property>
<property>
<name>hbase.regionserver.restart.on.zk.expire</name>
<value>true</value>
<description>
An expired ZooKeeper session forces the RegionServer to exit.
Enabling this makes the RegionServer restart instead.
</description>
</property>
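The description of zookeeper.session.timeout implies that raising that value alone may not be enough: the ZooKeeper server clamps the requested timeout to between 2 and 20 times its tickTime. The shell sketch below applies that rule exactly as quoted; the function name negotiated_timeout is hypothetical, and it assumes the server uses the default limits described above rather than any custom server-side minimum or maximum.

```shell
#!/bin/sh
# Hypothetical helper: negotiated ZooKeeper session timeout in ms,
# following the quoted rule: min = 2 * tickTime, max = 20 * tickTime.
# Usage: negotiated_timeout REQUESTED_MS TICK_TIME_MS
negotiated_timeout() {
  requested=$1
  tick=$2
  min=$((2 * tick))
  max=$((20 * tick))
  if [ "$requested" -lt "$min" ]; then
    echo "$min"          # request below the floor is raised to 2 * tickTime
  elif [ "$requested" -gt "$max" ]; then
    echo "$max"          # request above the ceiling is cut to 20 * tickTime
  else
    echo "$requested"    # request within range is granted as-is
  fi
}

negotiated_timeout 90000 2000   # prints 40000
negotiated_timeout 90000 6000   # prints 90000
```

With a tickTime of 2000 ms, the requested 90000 ms is cut to 40000 ms; the tickTime (hbase.zookeeper.property.tickTime) must be at least 4500 ms for 90000 ms to be granted in full.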
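3. To keep full-GC thread suspension from interrupting the ZooKeeper heartbeat, also configure hbase-env.sh.
Switch the JVM to the CMS collector (-XX:+UseConcMarkSweepGC, with -XX:+UseParNewGC for the young generation and -XX:+CMSParallelRemarkEnabled to run the remark pause in parallel), as follows: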
export HBASE_OPTS="-Xms16g -Xmx16g -Xmn2g -Xss200k -XX:MaxNewSize=2g -XX:SurvivorRatio=2 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled -XX:+UseFastAccessorMethods -XX:+UseParNewGC -XX:MaxPermSize=300m -XX:MaxTenuringThreshold=5 -XX:GCTimeRatio=19 -XX:ParallelGCThreads=10 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:-UseGCOverheadLimit "
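The heap layout these flags imply can be worked out by hand. The shell sketch below does the arithmetic in MB. It assumes -Xmn2g fixes the young generation at 2 GB and reads SurvivorRatio as the ratio of eden to one survivor space. It also ignores the alignment and rounding the JVM applies, so real sizes may differ slightly.

```shell
#!/bin/sh
# Rough heap layout implied by the HBASE_OPTS above (all sizes in MB).
# Assumption: no JVM-internal alignment or rounding is modeled.
heap_mb=16384        # -Xms16g -Xmx16g
young_mb=2048        # -Xmn2g (also -XX:MaxNewSize=2g)
survivor_ratio=2     # -XX:SurvivorRatio=2: eden is 2x one survivor space
cms_fraction=70      # -XX:CMSInitiatingOccupancyFraction=70 (% of old gen)

# The old (tenured) generation is whatever the young generation does not take.
old_mb=$((heap_mb - young_mb))
# Young gen = eden + 2 survivors = (survivor_ratio + 2) survivor-sized units.
survivor_mb=$((young_mb / (survivor_ratio + 2)))
eden_mb=$((survivor_mb * survivor_ratio))
# With -XX:+UseCMSInitiatingOccupancyOnly, CMS starts at this old-gen occupancy.
cms_trigger_mb=$((old_mb * cms_fraction / 100))

echo "old=${old_mb} eden=${eden_mb} survivor=${survivor_mb} cms_trigger=${cms_trigger_mb}"
# prints: old=14336 eden=1024 survivor=512 cms_trigger=10035
```

Starting CMS at about 70% of the 14 GB old generation (roughly 9.8 GB) leaves headroom for the concurrent cycle to finish before the old generation fills. If it fills first, the JVM falls back to a stop-the-world full GC, which is the kind of long pause that can expire the ZooKeeper session.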
Finally, start the RegionServer:
hbase-daemon.sh start regionserver
Then turn on the balancer from the HBase shell (hbase shell): balance_switch true