Hadoop's Maddening "Be Replicated to 0 nodes, instead of 1" Exception

2018-09-05 03:01

Original article: http://dongyajun.iteye.com/blog/1039836

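After a new project went live, we found that some members were uploading resources to our cluster at 70M+/s, surprisingly close to the throughput of the cluster itself. While data was being put into the cluster, the following exception was thrown: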

 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /xxx/xxx/xx could only be replicated to 0 nodes, instead of 1

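The message says that no node in the cluster is available. Because it appeared during a put, my first instinct was to ask whether every node had already filled up with data.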

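Problems in Hadoop are easier to observe on a cluster with few nodes. The dfshealth.jsp page showed at least three nodes that were still writable, yet dfsClient puts kept failing with the no-node-available exception.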

Digging into the source, the ReplicationTargetChooser#isGoodTarget method on the NameNode side explains it:

    // check the communication traffic of the target machine
    if (considerLoad) {
      double avgLoad = 0;
      int size = clusterMap.getNumOfLeaves();
      if (size != 0) {
        // average load per node across the cluster
        avgLoad = (double)fs.getTotalLoad()/size;
      }
      // reject the node when its connection count exceeds twice the average
      if (node.getXceiverCount() > (2.0 * avgLoad)) {
        logr.debug("Node "+NodeBase.getPath(node)+
                  " is not chosen because the node is too busy");
        return false;
      }
    }

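isGoodTarget passes the final verdict on each candidate datanode. Having usable disk space is not enough: the node must also stay under a bounded load. The measure used here is the number of connections accepted by the datanode's XceiverServer. This value is easy to overlook when running Hadoop, because it is not convenient to collect. The code above requires that a node's connection count be no more than twice the average connection count across all nodes in the cluster. To keep my setup as self-contained as possible, I printed each node's connection count on the dfshealth.jsp page, and the numbers matched the check in the code exactly.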

For example, if ReplicationTargetChooser picks node13, then even though node13 has plenty of free space to write to, the code above still judges it an unsuitable node.

157 > ((27 + 45 + 44 + 54 + 35 + 50 + 104 + 55 + 73 + 69 + 157 + 146)/12 * 2)
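The comparison above can be reproduced outside Hadoop. The following is a minimal sketch, not Hadoop code: the class LoadCheck and its method names are hypothetical, and the only things taken from the article are the twelve connection counts and the strict "greater than 2.0 x average" rule from isGoodTarget.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Hadoop.
public class LoadCheck {
    // Mirrors the 2.0 multiplier hard-coded in the isGoodTarget snippet.
    static final double LOAD_FACTOR = 2.0;

    /** Average connection count across all nodes (0 for an empty cluster). */
    static double averageLoad(int[] xceiverCounts) {
        if (xceiverCounts.length == 0) {
            return 0.0;
        }
        long total = 0;
        for (int count : xceiverCounts) {
            total += count;
        }
        return (double) total / xceiverCounts.length;
    }

    /** Same strict comparison as isGoodTarget: count > 2.0 x average. */
    static boolean isTooBusy(int xceiverCount, double avgLoad) {
        return xceiverCount > LOAD_FACTOR * avgLoad;
    }

    /** Connection counts of the nodes that the check would reject. */
    static List<Integer> rejected(int[] xceiverCounts) {
        double avg = averageLoad(xceiverCounts);
        List<Integer> result = new ArrayList<>();
        for (int count : xceiverCounts) {
            if (isTooBusy(count, avg)) {
                result.add(count);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The twelve per-node connection counts quoted in the article.
        int[] counts = {27, 45, 44, 54, 35, 50, 104, 55, 73, 69, 157, 146};
        double avg = averageLoad(counts);
        System.out.println("average = " + avg + ", limit = " + LOAD_FACTOR * avg);
        System.out.println("rejected counts: " + rejected(counts));
    }
}
```

With these figures the counts sum to 859, so the average is about 71.6 and the limit about 143.2. The node with 157 connections exceeds it, and so does the node with 146, which means two nodes with free space are excluded at the same time.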

The usual fix for this exception is to add nodes or, where the nodes can handle it, to raise the threshold in this check.
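To get a feel for what raising the threshold does, here is a small sketch that treats the multiplier as a parameter. This is an assumption made for illustration: in the snippet quoted above the 2.0 is hard-coded, and the class LoadFactorSketch and its countRejected method are hypothetical names, not Hadoop APIs. The connection counts are the ones from the article.

```java
// Hypothetical stand-alone sketch; not part of Hadoop.
public class LoadFactorSketch {
    /**
     * Counts the nodes whose connection count is strictly greater than
     * factor x the cluster-wide average, i.e. the nodes the check would reject.
     */
    static int countRejected(int[] xceiverCounts, double factor) {
        if (xceiverCounts.length == 0) {
            return 0;
        }
        long total = 0;
        for (int count : xceiverCounts) {
            total += count;
        }
        double limit = factor * ((double) total / xceiverCounts.length);
        int rejected = 0;
        for (int count : xceiverCounts) {
            if (count > limit) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        int[] counts = {27, 45, 44, 54, 35, 50, 104, 55, 73, 69, 157, 146};
        // 2.0 is the value in the quoted source; 2.5 is an arbitrary higher value.
        for (double factor : new double[] {2.0, 2.5}) {
            System.out.println("factor " + factor + " rejects "
                    + countRejected(counts, factor) + " node(s)");
        }
    }
}
```

With the article's numbers, a factor of 2.0 rejects two nodes, while 2.5 moves the limit to roughly 179 and rejects none. The sketch only shows where the cutoff falls. Whether the datanodes can actually sustain the extra connections is a separate question that has to be checked on the cluster itself.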
