Version: HDP 3.0
When a MapReduce job is submitted and the computation completes, the job has finished but the container does not shut down; it keeps waiting for five minutes, logging:
INFO [Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
INFO [Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
... (the same message repeats for the full five minutes)
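This repeated drain-wait message is easy to spot mechanically. As a hypothetical diagnostic aid (the function and sample names below are my own, not part of Hadoop), a short script can count how often the message appears in an application-master log:

```python
import re

# Matches the AsyncDispatcher drain-wait log message shown above.
DRAIN_RE = re.compile(r"AsyncDispatcher\s*:?\s*Waiting for AsyncDispatcher to drain")

def count_drain_waits(log_text: str) -> int:
    """Count log lines that report the dispatcher drain wait."""
    return sum(1 for line in log_text.splitlines() if DRAIN_RE.search(line))

# Hypothetical log excerpt for illustration:
sample = """\
INFO [Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
INFO [Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
INFO [Thread-100] SomeOtherClass: job committed
"""
print(count_drain_waits(sample))  # → 2
```

A large count over a short time window suggests the AM is stuck waiting for the dispatcher to drain rather than making progress.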
After five minutes, an exception is thrown:
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
...
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
...
Caused by: java.net.SocketTimeoutException: Read timed out
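The five-minute wait plausibly corresponds to YARN's dispatcher drain timeout, which (as an assumption, not confirmed from these logs) defaults to 300000 ms. If needed, it can be inspected or tuned in yarn-site.xml:

```xml
<!-- yarn-site.xml: how long AsyncDispatcher waits for queued events to
     drain on shutdown; assumed default is 300000 ms (5 minutes) -->
<property>
  <name>yarn.dispatcher.drain-events.timeout</name>
  <value>300000</value>
</property>
```

Shortening this timeout only reduces how long the hang lasts; the root cause below still needs to be fixed.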
This happens because the embedded HBase that backs ATSv2 (Timeline Service v2) has crashed.
The fix is to reset the ATSv2 embedded HBase database:
1. Stop the YARN service
Ambari -> YARN -> Actions -> Stop
2. Delete the ATSv2 znode in ZooKeeper
zookeeper-client -server zookeeper-quorum-servers
rmr /atsv2-hbase-unsecure (or rmr /atsv2-hbase-secure on a Kerberized cluster)
3. Move the Timeline Server's embedded HBase data directory aside in HDFS
hdfs dfs -mv /atsv2/hbase /tmp/
4. Start the YARN service
Ambari -> YARN -> Actions -> Start
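The four steps above can be sketched end to end as a shell session. This is a sketch only: the ZooKeeper host names are placeholders, and the Ambari steps are UI actions, so run the commands against your own cluster with the appropriate user:

```
# 1. Stop YARN (in the Ambari UI: Ambari -> YARN -> Actions -> Stop)

# 2. Remove the ATSv2 znode from inside the ZooKeeper CLI
zookeeper-client -server zk1.example.com:2181,zk2.example.com:2181
  rmr /atsv2-hbase-unsecure    # non-Kerberized cluster
  # rmr /atsv2-hbase-secure    # Kerberized cluster

# 3. Move the embedded HBase data directory out of the way in HDFS
hdfs dfs -mv /atsv2/hbase /tmp/

# 4. Start YARN again (in the Ambari UI: Ambari -> YARN -> Actions -> Start)
```

After a restart, the Timeline Service recreates its znode and HBase data, which is why simply moving the old directory aside (rather than deleting it) is enough and keeps the corrupt data available for inspection.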
Resubmitting the job shows it now runs normally; the problem is resolved.