HDFS write failure on a Hadoop cluster -- timeout while waiting for channel to be ready for write

Today a Kylin job writing data into HBase (and therefore into HDFS) failed with:
timeout while waiting for channel to be ready for write
The full exception from the DataNode log is:

2019-07-05 11:18:10,862 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.146.127, datanodeUuid=cbb24ad6-04ff-4cd2-ae11-f4389055bac5, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-7e94c25f-298b-4867-a117-f20f384eaef3;nsid=2061526458;c=0):Got exception while serving BP-535123581-192.168.136.54-1532154474227:blk_1074454358_713534 to /192.168.146.126:54066
java.net.SocketTimeoutException: 1200000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.146.127:50010 remote=/192.168.146.126:54066]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:508)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
        at java.lang.Thread.run(Thread.java:745)
2019-07-05 11:18:10,862 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hn146127.olap-hbase.data.m.com:50010:DataXceiver error processing READ_BLOCK operation  src: /192.168.146.126:54066 dst: /192.168.146.127:50010
java.net.SocketTimeoutException: 1200000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.146.127:50010 remote=/192.168.146.126:54066]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:508)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
        at java.lang.Thread.run(Thread.java:745)

Every DataNode in the cluster was reporting this error, but our timeout is already set to 1200000 ms (20 minutes), so I did not think the parameter itself was the problem. I then ran a plain hadoop fs -put xxx /tmp as a test and got the same exception, which narrowed the problem down to HDFS writes rather than anything Kylin- or HBase-specific. Restarting the DN (DataNode) processes resolved it. The most likely root cause is that I had taken some DataNodes offline directly beforehand.
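For reference, a minimal sketch of the checks and the recovery step described above. The property name dfs.datanode.socket.write.timeout and the hadoop-daemon.sh restart commands are assumptions based on a stock Hadoop 2.x deployment (the 50010/50075 ports in the log suggest 2.x); they are not taken from the original cluster's configuration.

    # Confirm the DataNode write timeout actually in effect (milliseconds; 1200000 = 20 minutes)
    hdfs getconf -confKey dfs.datanode.socket.write.timeout

    # Reproduce with a plain HDFS write, independent of Kylin/HBase (xxx is any local file)
    hadoop fs -put xxx /tmp

    # Restart the affected DataNode (run on that node; Hadoop 2.x daemon scripts assumed)
    hadoop-daemon.sh stop datanode
    hadoop-daemon.sh start datanode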
