Today Kylin hit an error while writing data through HBase to HDFS:
timeout while waiting for channel to be ready for write
The full exception is:
2019-07-05 11:18:10,862 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.146.127, datanodeUuid=cbb24ad6-04ff-4cd2-ae11-f4389055bac5, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-7e94c25f-298b-4867-a117-f20f384eaef3;nsid=2061526458;c=0):Got exception while serving BP-535123581-192.168.136.54-1532154474227:blk_1074454358_713534 to /192.168.146.126:54066
java.net.SocketTimeoutException: 1200000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.146.127:50010 remote=/192.168.146.126:54066]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:508)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
at java.lang.Thread.run(Thread.java:745)
2019-07-05 11:18:10,862 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hn146127.olap-hbase.data.m.com:50010:DataXceiver error processing READ_BLOCK operation src: /192.168.146.126:54066 dst: /192.168.146.127:50010
java.net.SocketTimeoutException: 1200000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.146.127:50010 remote=/192.168.146.126:54066]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:508)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
at java.lang.Thread.run(Thread.java:745)
I found that every DataNode was reporting this error, but our cluster's timeout is already set to 1,200,000 ms (20 minutes), so I didn't think the parameter itself was the problem. I then ran a simple test, `hadoop fs -put xxx /tmp`, and got the same exception, which narrowed the problem down to HDFS writes in general rather than anything Kylin- or HBase-specific. Restarting the DataNodes resolved it. The likely root cause was that I had taken some DataNodes offline directly, without going through decommissioning.
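For reference, the 1,200,000 ms figure in the exception corresponds to the HDFS socket-timeout settings. A sketch of what that looks like in hdfs-site.xml is below; `dfs.client.socket-timeout` and `dfs.datanode.socket.write.timeout` are the standard property names, but verify the exact names and defaults against your Hadoop distribution's hdfs-default.xml before copying:

```xml
<!-- hdfs-site.xml: socket timeouts (values in milliseconds).
     These match the 1200000 millis seen in the exception above;
     defaults are much lower (60s read / 480s write in stock Hadoop). -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>1200000</value>
</property>
```

Note that raising these values only hides the symptom when the real problem is, as in this case, stale pipeline state after DataNodes were pulled without decommissioning; restarting the affected DataNodes is the actual fix.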