A Collection of Problems Encountered with Hadoop and HBase

 

This article collects problems I ran into while using HBase, along with the corresponding solutions. I am summarizing and analyzing them here so they are not forgotten later.

Adding new nodes to Hadoop

The Hadoop cluster needed three more nodes. On each of the three hosts I configured the hosts file, passwordless SSH trust with every machine in the cluster, JDK 1.7, and so on, keeping all configuration and directory layouts identical to the rest of the cluster. After copying the installation files to the three hosts, I distributed the updated configuration files into the corresponding directories on each of them. I then added the three hostnames to the namenode's slaves file and pushed it out to all datanodes.

On each of the three hosts, run:
yarn-daemon.sh start nodemanager
hadoop-daemon.sh start datanode

These two commands start the NodeManager and the DataNode respectively.

Then run the balancer to redistribute data: start-balancer.sh -threshold 5

This keeps HDFS data evenly distributed across all nodes.
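Before and after rebalancing, it can help to confirm that the new datanodes have actually registered with the namenode, for example:

bin/hdfs dfsadmin -report

The report lists every live datanode together with its used and remaining capacity.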

HBase troubleshooting

INFO client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.

Set the region server GC options and timeouts as follows to prevent ZooKeeper session timeouts caused by long full GC pauses.
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70
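These flags normally go into hbase-env.sh on each region server; a minimal sketch using the standard HBASE_REGIONSERVER_OPTS variable:

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70"

The lease and RPC timeouts below then go into hbase-site.xml.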

<property>
<name>hbase.regionserver.lease.period</name>
<value>240000</value>
</property>
<property>
<name>hbase.rpc.timeout</name>
<value>280000</value>
</property>

When the cluster appears to hang and the logs show ZooKeeper communication timeouts, ZooKeeper itself is not always at fault. The two issues below look like ZooKeeper timeouts on the surface but are actually problems in HBase or Hadoop.

1. The heap allocated to the region server is too small for the amount of data HBase is holding.
In the system profile, set this parameter to a larger value:
export HBASE_HEAPSIZE=30720

In HBase's hbase-env.sh (the default is shown commented out):
# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000

2. ZooKeeper's maximum number of client connections is too low.
The default is 300. On clusters with very heavy query traffic and very large record counts, the number of connections between components can exceed this limit, and further connections will be refused.
<property>
<name>hbase.zookeeper.property.maxClientCnxns</name>
<value>15000</value>
</property>
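To see how many connections each client currently holds against a ZooKeeper node, the four-letter commands can be queried directly (the hostname here is only an example):

echo cons | nc slave1 2181
echo stat | nc slave1 2181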

3. HRegionServer fails to start
Running jps on the namenode shows whether HBase started correctly; the processes were:
[root@master bin]# jps
26341 HMaster
26642 Jps
7840 ResourceManager
7524 NameNode
7699 SecondaryNameNode

As shown above, Hadoop started normally, but HBase is missing a process; presumably a region server on one of the nodes failed to start.

Log in to node slave1 and run jps to check the running processes:
[root@master bin]# ssh slave1
Last login: Thu Jul 17 17:29:11 2014 from master
[root@slave1 ~]# jps
4296 DataNode
11261 HRegionServer
11512 Jps
11184 QuorumPeerMain

So slave1 is fine. Log in to node slave2 and run jps:
[root@slave2 ~]# jps
3795 DataNode
11339 Jps
11080 QuorumPeerMain

There it is: HRegionServer did not start. Looking at the HBase log:

2014-07-17 09:28:19,392 INFO  [regionserver60020] regionserver.HRegionServer: STOPPED: Unhandled: org.apache.hadoop.hbase.ClockOutOfSyncException: Server slave2,60020,1405560498057 has been rejected; Reported time is too far out of sync with master.  Time difference of 28804194ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1292)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2185)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1889)

According to the error log, the clocks on slave2 and the master differ by far too much. Checking each system's time confirmed it; synchronizing the clocks fixes the problem. An alternative is to adjust the HBase configuration:

Setting: hbase.master.maxclockskew

<property>
 <name>hbase.master.maxclockskew</name>
 <value>200000</value>
 <description>Time difference of regionserver from master</description>
</property>

(This approach is not recommended.)
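The preferred fix is simply to keep the clocks synchronized. A common sketch, assuming ntpdate is installed and a public NTP server is reachable from the nodes:

ntpdate pool.ntp.org
# or run it periodically from cron on every node:
# */30 * * * * /usr/sbin/ntpdate pool.ntp.org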

4. ZooKeeper fails to start properly
When starting HBase, it kept reporting that ZooKeeper could not be reached. The ZooKeeper log showed:
ClientCnxn$SendThread@966] - Opening socket connection to server slave1. Will not attempt to authenticate using SASL (unable to locate a login configuration).

A web search pointed to the hosts file. Opening /etc/hosts with vi showed that the IP configured for slave1 was wrong. Fortunately both HBase and ZooKeeper keep logs. After fixing the entry and restarting ZooKeeper and HBase, the problem was gone.

5. The list command in the HBase shell fails
Running list in the HBase shell reported an error:
...
(the full stack trace is far too long to include here)

The key error is: client.HConnectionManager$HConnectionImplementation: Can't get connection to ZooKeeper: KeeperErrorCode = ConnectionLoss for /hbase. This indicates that ZooKeeper cannot be reached. jps showed all ZooKeeper processes running, and the ZooKeeper quorum settings in hbase-site.xml were correct. From experience, the likely cause is that the firewall was still enabled and port 2181 was unreachable. So: run service iptables stop to disable the firewall, restart HBase, enter the HBase shell, and run list:
hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/hbase/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2014-07-17 14:06:26,013 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
0 row(s) in 1.0070 seconds

=> []

Everything works; problem solved.
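Instead of disabling the firewall entirely, another option (a sketch for an iptables-based setup such as CentOS 6) is to open only the ZooKeeper port:

iptables -I INPUT -p tcp --dport 2181 -j ACCEPT
service iptables save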

6. Exceptions on writes in the HBase shell
Any write operation (put, delete, update) in the HBase shell threw an exception:
zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect。
This turned out to be an incompatibility between the Hadoop jars bundled with HBase and the Hadoop version actually running. Fix: copy the hadoop-2.2.0 jars from the Hadoop installation into ${HBASE_HOME}/lib, replacing the bundled ones.
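A rough sketch of that replacement, with the installation paths expressed as environment variables since they differ per setup:

cd ${HBASE_HOME}/lib
rm -f hadoop-*.jar
find ${HADOOP_HOME}/share/hadoop -name "hadoop-*-2.2.0.jar" -exec cp {} . \;

Restart HBase afterwards so the new jars are picked up.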

7. All the region servers under HBase drop offline
This turned out not to be a ZooKeeper problem but an HDFS problem.
The datanode logs contained errors such as "... does not have any open files."; the fix is to raise dfs.datanode.max.transfer.threads, the maximum number of concurrent transfer threads on a datanode, as sketched below.
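A sketch of the corresponding hdfs-site.xml change (8192 is only an example value; the Hadoop 2.x default is 4096):

<property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>8192</value>
</property>

The ZooKeeper server logs from the incident looked like this: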

EndOfStreamException: Unable to read additional data from client sessionid 0x4f6ce1baef1cc5, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:662)
2015-09-09 12:00:04,636 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /172.16.0.175:39889 which had sessionid 0x4f6ce1baef1cc5
2015-09-09 12:00:07,570 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x4f6ce1baef1d4f, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:662)
2015-09-09 13:19:20,232 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /172.16.0.161:55772
2015-09-09 13:19:20,274 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /172.16.0.161:55772
2015-09-09 13:19:20,276 INFO  [CommitProcessor:0] server.ZooKeeperServer: Established session 0x4f6ce1baef1f96 with negotiated timeout 90000 for client /172.16.0.161:55772
2015-09-09 13:20:21,000 INFO  [SessionTracker] server.ZooKeeperServer: Expiring session 0x24f6ce1bd0f207c, timeout of 90000ms exceeded
2015-09-09 13:20:21,000 INFO  [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Processed session termination for sessionid: 0x24f6ce1bd0f207c
2015-09-09 13:24:21,037 WARN [QuorumPeer[myid=0]/0.0.0.0:2181] quorum.LearnerHandler: Closing connection to peer due to transaction timeout.
2015-09-09 13:24:21,237 WARN [LearnerHandler-/192.168.40.35:56545] quorum.LearnerHandler: ****** GOODBYE /192.168.40.35:56545 ******
2015-09-09 13:24:21,237 WARN [LearnerHandler-/192.168.40.35:56545] quorum.LearnerHandler: Ignoring unexpected exception

A second possible cause

After the Hadoop namenode was reformatted and HBase was restarted, the HMaster process exited right after starting. After digging through a pile of logs, the following showed up in the ZooKeeper log:
Unable to read additional data from client sessionid 0x14e86607c850007, likely client has closed socket

Fix: delete the directory configured by the following property in HBase's hbase-site.xml, restart the ZooKeeper cluster, then restart HBase so the directory is regenerated.

<property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/freeoa/zookeeper/data</value>
</property>

hbase.zookeeper.property.dataDir corresponds to dataDir in ZooKeeper's zoo.cfg; it is where ZooKeeper stores its database snapshots.
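A sketch of that cleanup, using the dataDir from the configuration above and assuming an HBase-managed ZooKeeper quorum:

bin/stop-hbase.sh
rm -rf /home/freeoa/zookeeper/data/*
bin/start-hbase.sh
# the managed ZooKeeper restarts with HBase and recreates the directory contents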

8. Datanode failures causing HDFS write failures when the replica count is set below 3
With a heavy workload and few cluster nodes (4 datanodes), the cluster kept running into trouble, causing errors in data collection and eventually data loss. When data goes missing, data collection is the first suspect, so let's look at the error log first:

Caused by: java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[10.0.2.163:50010, 10.0.2.164:50010], original=[10.0.2.163:50010, 10.0.2.164:50010])
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:817)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:877)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:983)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:780)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
The error:
Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT

From the log it looks as if adding a datanode failed and the dfs.client.block.write.replace-datanode-on-failure.policy feature should be turned off. But I never added a node, so the problem is clearly not that simple.

The official configuration documentation describes these parameters as follows:

Parameter: dfs.client.block.write.replace-datanode-on-failure.enable
Default: true
Description: If there is a datanode/network failure in the write pipeline, DFSClient will try to remove the failed datanode from the pipeline and then continue writing with the remaining datanodes. As a result, the number of datanodes in the pipeline is decreased. The feature is to add new datanodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures since it is impossible to find new datanodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy

Parameter: dfs.client.block.write.replace-datanode-on-failure.policy
Default: DEFAULT
Description: This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. ALWAYS: always add a new datanode when an existing datanode is removed. NEVER: never add a new datanode. DEFAULT: Let r be the replication number. Let n be the number of existing datanodes. Add a new datanode only if r is greater than or equal to 3 and either (1) floor(r/2) is greater than or equal to n; or (2) r is greater than n and the block is hflushed/appended.

Source: https://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Tracing the source code to DFSClient shows that this happens while the client is writing blocks through the pipeline, and the same two related parameters turn up:
dfs.client.block.write.replace-datanode-on-failure.enable
dfs.client.block.write.replace-datanode-on-failure.policy

The former controls whether the client applies a replacement policy when a write fails; the default of true is fine.
The latter controls the details of that policy and defaults to DEFAULT.

With DEFAULT, when there are 3 or more replicas the client tries to add a replacement datanode; with only two replicas it does not replace the datanode and simply keeps writing. Since my cluster has only 4 datanodes, whenever the load gets high and two or more of them stop responding at the same time, HDFS writes start failing. On a small cluster it makes sense to disable this feature, for example as sketched below.
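A sketch of the hdfs-site.xml change for a small cluster; per the documentation quoted above, NEVER stops the client from trying to find a replacement datanode:

<property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
</property>
<!-- or disable the feature entirely -->
<property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>false</value>
</property>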

9. Troubleshooting the HBase startup process
During a disaster test, one of the machines running both an HDFS datanode and a region server was rebooted outright. How do you bring its previous services back up once the machine is running again?
Start the HDFS datanode:
sbin/hadoop-daemon.sh start datanode

Start the HBase region server:
bin/hbase-daemon.sh  start regionserver

But jps showed no HQuorumPeer process. Online sources suggested the server clock might have drifted significantly; after re-checking, the clock was indeed 30 s fast. Even after re-syncing the time and restarting the HBase services, that process still did not come up.

It seemed the only option was to restart the whole HBase cluster.
hadoop@xdm:/usr/local/hbase$ bin/stop-hbase.sh 
stopping hbase...................
xd1: no zookeeper to stop because no pid file /tmp/hbase-hadoop-zookeeper.pid
xd0: stopping zookeeper.
xd2: stopping zookeeper.

So that process belongs to ZooKeeper.

bash bin/hbase-daemon.sh start zookeeper

The HQuorumPeer process does not start.
Cause: the directory configured by hbase.zookeeper.property.dataDir in /hbase/conf/hbase-site.xml does not exist.
Solution: the value of hbase.zookeeper.property.dataDir is /home/grid/zookeeper, so create the /home/grid/zookeeper directory locally, as shown below.
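A one-liner to create it, with the path taken from the configuration value above:

mkdir -p /home/grid/zookeeper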

Hadoop generally communicates with cluster nodes by hostname, so the IP-to-hostname mapping of every node must be written into /etc/hosts to keep communication working. If things misbehave after using IP addresses in the configuration files, switch them to the corresponding hostnames. There is also the Hadoop/HBase version incompatibility issue: the Hadoop jars under HBase's lib directory were newer than the 0.20.2 version I had installed and had to be replaced with the 0.20.2 jars, as the official documentation explains.

If you are using Cloudera's packaged Hadoop and HBase you can skip the jar replacement, since Cloudera has already resolved the compatibility issues.

10. Issues encountered when adding a new datanode to Hadoop
Add the new hostname to the /etc/hosts file on all machines.

Add the master node's SSH key to the new machine:
ssh-copy-id 192.168.0.83

Add the new node's hostname to the slaves file (a new line):
vim /usr/local/hadoop/etc/hadoop/slaves

Re-sync the configuration files to all nodes:
for i in {80..83};do rsync -av /usr/local/hadoop/etc/ 192.168.0.$i:/usr/local/hadoop/etc/;done

Note: none of the existing nodes need to be restarted. Just start the datanode on the new node, and once it is up, run the balancer script (/sbin/start-balancer.sh) so that data is moved over from the other nodes; the space used on those nodes shrinks accordingly.

Also note that the balancing bandwidth between datanodes defaults to only 1 MB/s, which is far too little for a cluster holding a lot of data. On the new node it can be raised to 10 MB/s:
<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>10485760</value>
</property>
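The bandwidth can also be changed at runtime without editing the configuration, for example:

hdfs dfsadmin -setBalancerBandwidth 10485760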

After starting the datanode, the master's log showed:
2015-10-19 15:26:31,738 WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock request received for blk_1073742014_1202 on node 192.168.0.83:50010 size 60
2015-10-19 15:26:31,738 WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock request received for blk_1073742015_1203 on node 192.168.0.83:50010 size 60
2015-10-19 15:26:31,739 INFO BlockStateChange: BLOCK* processReport: from storage DS-14d45702-1346-4f53-8178-4c9db668729d node DatanodeRegistration(192.168.0.83, datanodeUuid=c82b8d07-5ffb-46d5-81f7-47c06e673384, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-c06612c4-aa83-4145-b2b5-e4faffa08074;nsid=531546365;c=0), blocks: 48, hasStaleStorage: false, processing time: 5 msecs

There were many warnings, and when refreshing the web UI on port 50070, xd0 and xd3 kept appearing alternately. It turned out I had cloned machine xd0 into xd3 without cleaning out the datanode data directory, so I had to stop it, clean the directory, and start it again. The namenode log recorded the following:
2015-10-19 15:38:51,415 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 170 Total time for transactions(ms): 13 Number of transactions batched in Syncs: 2 Number of syncs: 131 SyncTimes(ms): 2621 
2015-10-19 15:38:51,433 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073742974_2164 192.168.0.81:50010 192.168.0.82:50010 192.168.0.80:50010 
2015-10-19 15:38:51,446 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073742975_2165 192.168.0.82:50010 192.168.0.81:50010 192.168.0.80:50010 
2015-10-19 15:38:53,844 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.0.82:50010 to delete [blk_1073742974_2164, blk_1073742975_2165]
2015-10-19 15:38:56,844 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.0.81:50010 to delete [blk_1073742974_2164, blk_1073742975_2165]
2015-10-19 15:38:56,848 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2015-10-19 15:38:56,848 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2015-10-19 15:38:57,220 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(192.168.0.83, datanodeUuid=b99253b7-6535-4d0e-85ad-7ca6332181af, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-c06612c4-aa83-4145-b2b5-e4faffa08074;nsid=531546365;c=0) storage b99253b7-6535-4d0e-85ad-7ca6332181af
2015-10-19 15:38:57,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2015-10-19 15:38:57,220 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.0.83:50010
2015-10-19 15:38:57,273 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2015-10-19 15:38:57,273 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Adding new storage ID DS-40fd35ff-cb26-47e8-aa50-654dbdfbbd4a for DN 192.168.0.83:50010
2015-10-19 15:38:57,291 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Processing first storage report for DS-40fd35ff-cb26-47e8-aa50-654dbdfbbd4a from datanode b99253b7-6535-4d0e-85ad-7ca6332181af
2015-10-19 15:38:57,291 INFO BlockStateChange: BLOCK* processReport: from storage DS-40fd35ff-cb26-47e8-aa50-654dbdfbbd4a node DatanodeRegistration(192.168.0.83, datanodeUuid=b99253b7-6535-4d0e-85ad-7ca6332181af, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-c06612c4-aa83-4145-b2b5-e4faffa08074;nsid=531546365;c=0), blocks: 0, hasStaleStorage: false, processing time: 0 msecs
2015-10-19 15:38:59,844 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.0.80:50010 to delete [blk_1073742974_2164, blk_1073742975_2165]
2015-10-19 15:39:26,848 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2015-10-19 15:39:26,848 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).

There were no errors in the new node's own log either. But this affected the data on the original machine xd0; its data later turned out to be corrupted, and I had to delete the data under its datanode directory and start the datanode again.

Start the balancer script:
hadoop@xd3:/usr/local/hadoop$ sbin/start-balancer.sh

The log recorded:
hadoop@xd3:/usr/local/hadoop$ more logs/hadoop-hadoop-balancer-xd3.log 
2015-10-19 15:43:44,875 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 172 Total time for transactions(ms): 13 Number of transactions batched in Syncs: 2 Number of syncs: 133 SyncTimes(ms): 2650 
2015-10-19 15:43:44,950 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /system/balancer.id. BP-668197744-192.168.0.9-1444641099117 blk_1073742981_2173{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-40fd35ff-cb26-47e8-aa50-654dbdfbbd4a:NORMAL:192.168.0.83:50010|RBW], ReplicaUnderConstruction[[DISK]DS-f7b1834a-9174-445a-8802-8c6f908d0a94:NORMAL:192.168.0.81:50010|RBW], ReplicaUnderConstruction[[DISK]DS-6b35073c-4673-4fce-8d46-fe0377a17739:NORMAL:192.168.0.82:50010|RBW]]}
2015-10-19 15:43:45,226 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /system/balancer.id for DFSClient_NONMAPREDUCE_912969379_1
2015-10-19 15:43:45,285 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.0.82:50010 is added to blk_1073742981_2173{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-40fd35ff-cb26-47e8-aa50-654dbdfbbd4a:NORMAL:192.168.0.83:50010|RBW], ReplicaUnderConstruction[[DISK]DS-f7b1834a-9174-445a-8802-8c6f908d0a94:NORMAL:192.168.0.81:50010|RBW], ReplicaUnderConstruction[[DISK]DS-6b35073c-4673-4fce-8d46-fe0377a17739:NORMAL:192.168.0.82:50010|RBW]]} size 3
2015-10-19 15:43:45,287 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.0.81:50010 is added to blk_1073742981_2173{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-40fd35ff-cb26-47e8-aa50-654dbdfbbd4a:NORMAL:192.168.0.83:50010|RBW], ReplicaUnderConstruction[[DISK]DS-f7b1834a-9174-445a-8802-8c6f908d0a94:NORMAL:192.168.0.81:50010|RBW], ReplicaUnderConstruction[[DISK]DS-6b35073c-4673-4fce-8d46-fe0377a17739:NORMAL:192.168.0.82:50010|RBW]]} size 3
2015-10-19 15:43:45,308 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.0.83:50010 is added to blk_1073742981_2173 size 3
2015-10-19 15:43:45,309 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /system/balancer.id is closed by DFSClient_NONMAPREDUCE_912969379_1
2015-10-19 15:43:45,331 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073742981_2173 192.168.0.81:50010 192.168.0.82:50010 192.168.0.83:50010 
2015-10-19 15:43:47,861 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.0.81:50010 to delete [blk_1073742981_2173]
2015-10-19 15:43:47,862 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.0.82:50010 to delete [blk_1073742981_2173]
2015-10-19 15:43:50,862 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.0.83:50010 to delete [blk_1073742981_2173]

11. A Hadoop decommission stalls because some blocks do not have enough replicas

While decommissioning a datanode, tens of thousands of blocks were copied off it, but two blocks stubbornly stayed behind, so the node could not go offline even after several hours. The Hadoop UI showed two blocks under "Under Replicated Blocks" that never cleared:

Under Replicated Blocks 2
Under Replicated Blocks In Files Under Construction 2

The namenode log kept scrolling messages like this:
2015-01-20 15:04:47,978 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Block: blk_8859027644264991843_26141120, Expected Replicas: 3, live replicas: 2, corrupt replicas: 0, decommissioned replicas: 1, excess replicas: 0, Is Open File: true, Datanodes having this block: 10.11.12.13:50010 10.11.12.14:50010 10.11.12.15:50010 , Current Datanode: 10.11.12.13:50010, Is current datanode decommissioning: true

After a lot of googling, this appears to be a Hadoop bug: https://issues.apache.org/jira/browse/HDFS-5579
The NameNode sees that the block does not have enough replicas (3 expected, only 2 live); apparently it considers the data incomplete and stubbornly refuses to let the datanode retire.

In the end the following workaround was tried: set the replication factor to 2:
hadoop fs -setrep -R 2 /

Shortly after running this, the node finished decommissioning. Replication settings work wonders.

Removing any datanode can leave some files unable to satisfy the minimum replication factor. When you try to remove a datanode in that situation, it stays in the Decommissioning state forever because there is no other machine to migrate its data to. This problem typically shows up on small clusters.

One workaround is to temporarily lower the replication factor of the affected files. Use the following command to inspect the replication factor of all files in HDFS:
hdfs fsck / -files -blocks

In the output, repl=1 means that block of that file has a replication factor of 1; this makes it easy to find the files with higher replication factors.
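For example, to pick out only the entries whose replication factor is 3 (a sketch; the exact fsck output format varies slightly between Hadoop versions):

hdfs fsck / -files -blocks | grep "repl=3"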

Adjusting a file's replication factor

Note that the replication factor is a property of the file, not of the cluster, so files in the same cluster can have different replication factors. That is why it has to be changed per file. The command is:
hdfs dfs -setrep [-R] [-w] <rep><path>

where:
-R means recursive, so the replication factor is applied to a directory and all of its subdirectories
<rep> is the replication factor to set
<path> is the file or directory whose replication factor should be changed
-w waits for the replication to finish, which can take a very long time


An article with a similar collection of problems, found via a search engine:
Summary of common Hadoop errors and their solutions (updated regularly)

Thanks to the original author!


12. HBase fails to list tables because Hadoop is in safe mode

Listing tables fails with an error:
hbase(main):001:0> list
TABLE                                                                                                                                
ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet

The log file contains errors like:

2016-07-13 10:33:14,576 INFO  [master:hadoop1:60000] catalog.CatalogTracker: Failed verification of hbase:meta,,1 at address=hadoop3,60020,1453280257576, exception=org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: 1468377187016

Investigation showed that the Hadoop cluster was in safe mode. Run the following command:
hdfs dfsadmin -safemode leave

Restart HBase and the problem is resolved. The relevant command:
safemode enter|leave|get|wait

This is the safe mode maintenance command. Safe mode is a NameNode state in which the NameNode:
1. accepts no changes to the namespace (read-only)
2. neither replicates nor deletes blocks

The NameNode enters safe mode automatically at startup and leaves it automatically once the configured minimum percentage of blocks satisfies the minimum replication condition. Safe mode can also be entered manually, but then it must be left manually as well.
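Before forcing the namenode out of safe mode, it is worth checking the current state, for example:

hdfs dfsadmin -safemode get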

13. Testing block recovery on a datanode

Test: empty one of the data storage directories on a datanode (the machine has at least two storage directories), then check the filesystem:

hadoop@htcom:/usr/local/hadoop$ bin/hdfs fsck /
17/03/17 15:57:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://htcom:50070
FSCK started by hadoop (auth:SIMPLE) from /192.168.0.9 for path / at Fri Mar 17 15:57:24 CST 2017
.....
/hbase/data/default/wz/7c8b18620febde82c7ec60de1d26e362/.regioninfo:  Under replicated BP-1176962362-192.168.0.9-1489729250270:blk_1073742108_1284. Target Replicas is 3 but found 2 replica(s).
.
/hbase/data/default/wz/7c8b18620febde82c7ec60de1d26e362/cf/efe459c273c24635b4a571c6edbb907d:  Under replicated BP-1176962362-192.168.0.9-1489729250270:blk_1073742112_1288. Target Replicas is 3 but found 2 replica(s).
....
/hbase/data/default/wz/9b3289559bb9e511a6dcafcf95b32419/cf/22b36ccb56124f51a0537ccb6825d965:  Under replicated BP-1176962362-192.168.0.9-1489729250270:blk_1073742056_1232. Target Replicas is 3 but found 2 replica(s).
.....
/hbase/data/default/wz/d6791ab507e752b01de90b3c07efdb72/.regioninfo:  Under replicated BP-1176962362-192.168.0.9-1489729250270:blk_1073742022_1198. Target Replicas is 3 but found 2 replica(s).
.
/hbase/data/default/wz/d6791ab507e752b01de90b3c07efdb72/cf/5267d4e39ac84a14942e26d4f97d3ed2:  Under replicated BP-1176962362-192.168.0.9-1489729250270:blk_1073742026_1202. Target Replicas is 3 but found 2 replica(s).
.....
/hbase/data/default/wz/f639a4fddc8be893b8489f85b9f4bb67/.regioninfo:  Under replicated BP-1176962362-192.168.0.9-1489729250270:blk_1073742136_1312. Target Replicas is 3 but found 2 replica(s).
............Status: HEALTHY
 Total size:    2126046745 B
 Total dirs:    85
 Total files:    57
 Total symlinks:        0 (Files currently being written: 3)
 Total blocks (validated):    58 (avg. block size 36655978 B) (Total open file blocks (not validated): 3)
 Minimally replicated blocks:    58 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    11 (18.965517 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    2
 Average block replication:    2.6724138
 Corrupt blocks:        0
 Missing replicas:        11 (6.626506 %)
 Number of data-nodes:        3
 Number of racks:        1
FSCK ended at Fri Mar 17 15:57:24 CST 2017 in 21 milliseconds


The filesystem under path '/' is HEALTHY

The namenode log then keeps reporting errors like:
2017-03-17 16:02:40,150 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=3, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2017-03-17 16:02:40,150 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All required storage types are unavailable:  unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2017-03-17 16:02:40,150 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2017-03-17 16:02:40,150 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2017-03-17 16:02:40,150 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=3, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2017-03-17 16:02:40,150 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All required storage types are unavailable:  unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

The datanode log reports:
2017-03-17 15:44:16,301 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: xd1:50010:DataXceiver error processing READ_BLOCK operation  src: /192.168.0.81:47294 dst: /192.168.0.81:50010
java.io.IOException: Block BP-1176962362-192.168.0.9-1489729250270:blk_1073742112_1288 is not valid. Expected block file at /opt/edfs/current/BP-1176962362-192.168.0.9-1489729250270/current/finalized/subdir0/subdir1/blk_1073742112 does not exist.
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:585)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:375)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:514)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:237)
    at java.lang.Thread.run(Thread.java:745)


bin/hdfs dfsadmin -report
shows that the datanode is still reported as healthy. Running 'fsck /' produces no errors at first; the errors above appear only gradually, yet the filesystem stays "HEALTHY". The emptied directory also slowly fills with newly allocated blocks. After a restart, the datanode re-syncs the missing blocks quickly. It is not clear whether, given enough time and no restart, the node would eventually finish re-syncing all of its blocks on its own.

I then emptied half of the directories (/opt/edfs) on another datanode, xd0, to see what would happen after a few hours.

A few hours later the logs still showed errors, but by noon the next day the emptied directory had grown back to its previous size, and neither the datanode nor the namenode logs showed any more "block does not exist" warnings. The web UI (port 50070) also showed the Blocks and Block pool used figures fairly evenly distributed, and the row count of the test table matched the previous check.

Restarting the affected datanode
After a successful restart, the datanode quickly pulls the blocks missing from that directory over from the other machines:
2017-03-17 16:02:48,598 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-1176962362-192.168.0.9-1489729250270:blk_1073741876_1052 src: /192.168.0.83:51588 dest: /192.168.0.81:50010 of size 73
2017-03-17 16:02:48,643 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1176962362-192.168.0.9-1489729250270:blk_1073741864_1040 src: /192.168.0.83:51589 dest: /192.168.0.81:50010
2017-03-17 16:02:51,373 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1176962362-192.168.0.9-1489729250270:blk_1073741833_1009 src: /192.168.0.80:52879 dest: /192.168.0.81:50010
2017-03-17 16:02:51,383 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-1176962362-192.168.0.9-1489729250270:blk_1073741833_1009 src: /192.168.0.80:52879 dest: /192.168.0.81:50010 of size 1745
2017-03-17 16:02:51,483 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1176962362-192.168.0.9-1489729250270:blk_1073741846_1022 src: /192.168.0.80:52878 dest: /192.168.0.81:50010
2017-03-17 16:02:51,527 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-1176962362-192.168.0.9-1489729250270:blk_1073741846_1022 src: /192.168.0.80:52878 dest: /192.168.0.81:50010 of size 1045
2017-03-17 16:02:51,728 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1176962362-192.168.0.9-1489729250270:blk_1073741860_1036 src: /192.168.0.83:51590 dest: /192.168.0.81:50010
2017-03-17 16:02:51,751 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-1176962362-192.168.0.9-1489729250270:blk_1073741860_1036 src: /192.168.0.83:51590 dest: /192.168.0.81:50010 of size 54
2017-03-17 16:02:53,841 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-1176962362-192.168.0.9-1489729250270:blk_1073742130_1306 src: /192.168.0.83:51586 dest: /192.168.0.81:50010 of size 79136111
2017-03-17 16:02:55,065 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-1176962362-192.168.0.9-1489729250270:blk_1073741894_1070 src: /192.168.0.80:52875 dest: /192.168.0.81:50010 of size 95193917
2017-03-17 16:02:55,289 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-1176962362-192.168.0.9-1489729250270:blk_1073741880_1056 src: /192.168.0.80:52877 dest: /192.168.0.81:50010 of size 76640469


2017-03-17 16:07:40,683 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1176962362-192.168.0.9-1489729250270:blk_1073742151_1329 src: /192.168.0.81:50162 dest: /192.168.0.81:50010
2017-03-17 16:07:40,717 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.81:50162, dest: /192.168.0.81:50010, bytes: 27729, op: HDFS_WRITE, cliID: DFSClient_hb_rs_xd1,60020,1489730648396_-196772784_32, offset: 0, srvID: 06fc9536-5e47-4c43-884e-7be0d65a6aed, blockid: BP-1176962362-192.168.0.9-1489729250270:blk_1073742151_1329, duration: 17845556
2017-03-17 16:07:40,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1176962362-192.168.0.9-1489729250270:blk_1073742151_1329, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2017-03-17 16:09:04,905 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1176962362-192.168.0.9-1489729250270:blk_1073742152_1330 src: /192.168.0.81:50168 dest: /192.168.0.81:50010
2017-03-17 16:09:16,552 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-1176962362-192.168.0.9-1489729250270:blk_1073741994_1170
2017-03-17 16:10:32,560 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-1176962362-192.168.0.9-1489729250270:blk_1073742130_1306


The log on the namenode:
2017-03-17 16:03:16,065 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2017-03-17 16:03:46,064 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2017-03-17 16:03:46,064 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2017-03-17 16:04:14,715 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 102 
2017-03-17 16:04:14,734 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/WALs/xd1,60020,1489730648396/xd1%2C60020%2C1489730648396.1489737854594.meta. BP-1176962362-192.168.0.9-1489729250270 blk_1073742150_1328{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-032e61b4-2a55-4b0f-b497-0856d1aef4ee:NORMAL:192.168.0.81:50010|RBW], ReplicaUnderConstruction[[DISK]DS-68cfba4b-8112-43b1-864d-fc9622c36ab8:NORMAL:192.168.0.83:50010|RBW], ReplicaUnderConstruction[[DISK]DS-79a15dc0-0c3e-4cf3-8898-7ad1e6f25788:NORMAL:192.168.0.80:50010|RBW]]}
2017-03-17 16:04:14,974 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /hbase/WALs/xd1,60020,1489730648396/xd1%2C60020%2C1489730648396.1489737854594.meta for DFSClient_hb_rs_xd1,60020,1489730648396_-196772784_32
2017-03-17 16:04:16,064 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2017-03-17 16:04:16,065 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).


2017-03-17 16:09:04,998 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 19 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 10 SyncTimes(ms): 199 
2017-03-17 16:09:05,022 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/WALs/xd1,60020,1489730648396/xd1%2C60020%2C1489730648396.1489738144857. BP-1176962362-192.168.0.9-1489729250270 blk_1073742152_1330{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-493b36f7-1dbf-4d2f-83d4-288360686852:NORMAL:192.168.0.81:50010|RBW], ReplicaUnderConstruction[[DISK]DS-79a15dc0-0c3e-4cf3-8898-7ad1e6f25788:NORMAL:192.168.0.80:50010|RBW], ReplicaUnderConstruction[[DISK]DS-60259b9b-c57d-46d8-841e-100fcbf9ff18:NORMAL:192.168.0.83:50010|RBW]]}
2017-03-17 16:09:05,052 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /hbase/WALs/xd1,60020,1489730648396/xd1%2C60020%2C1489730648396.1489738144857 for DFSClient_hb_rs_xd1,60020,1489730648396_-196772784_32

On the web UI's "Datanode Information" page, the "Block pool used" figures again show this node's usage close to that of the other nodes.


If you do not want to see these warnings in the log files, this can be addressed in the configuration. The data directories here are defined as:
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/freeoa/pdb,/freeoa/pdc,/freeoa/pdd</value>
</property>

Each of these directories is the mount point of an independent disk; when one of those disks develops I/O errors, the errors above appear. To stop Hadoop from complaining about them, set "dfs.datanode.failed.volumes.tolerated" to 1 (the default is 0).
The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shutdown.

<property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
</property>

14. Repairing HDFS corruption after a forced datanode shutdown followed by rebalancing

A colleague force-stopped a datanode on Hadoop v2.2.0 (because it would not finish decommissioning), then started balancer.sh on another, newly added datanode, which led to the warnings below.

...................................................................................................
Status: CORRUPT
 Total size:    17877963862236 B (Total open files size: 1601620148224 B)
 Total dirs:    18291
 Total files:    16499
 Total symlinks:        0 (Files currently being written: 9)
 Total blocks (validated):    146222 (avg. block size 122265896 B) (Total open file blocks (not validated): 11939)
  ********************************
  CORRUPT FILES:    2
  MISSING BLOCKS:    70
  MISSING SIZE:        9395240960 B
  CORRUPT BLOCKS:     70
  ********************************
 Minimally replicated blocks:    146152 (99.952126 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    2
 Average block replication:    1.9990425
 Corrupt blocks:        70
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        13
 Number of racks:        1
FSCK ended at Mon Apr 17 10:31:04 CST 2017 in 1357 milliseconds

The filesystem under path '/' is CORRUPT

According to the official FAQ, this roughly means that a few blocks are not stored on any of the existing datanodes, yet still exist in the NameNode's metadata. The HBase master web UI home page also showed a lot of warnings.

How do we fix this? First, a look at Hadoop's health-check command, fsck.

The fsck tool checks whether files in HDFS are healthy and usable. It can detect blocks that have gone missing from the datanodes and files that are under- or over-replicated. Usage is as follows.
Note: it must be run by the account that started HDFS in order to have permission to inspect everything.
Usage: DFSck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]]
 <path>  the directory where the check starts
 -move   move corrupted files to /lost+found
 -delete delete corrupted files
 -files  print every file being checked
 -openforwrite   print files currently open for writing
 -list-corruptfileblocks print out list of missing blocks and files they belong to
 -blocks print the block report
 -locations      print the location of every block
 -racks  print the datanode network topology

By default fsck ignores files that are being written; use -openforwrite to report them as well.

Finally, note that when the namenode holds a lot of file metadata this command can noticeably impact Hadoop performance, so use it sparingly; typically run it once during an idle period to check the overall health of the HDFS replicas.

Restarting HBase or Hadoop does not fix this. The best option is to bring the force-stopped datanode back into the cluster, stop the balancer script, and let Hadoop recover on its own; only remove that datanode again once fsck reports no errors. If that machine cannot be recovered, consider the approach below (data will be lost, but the Hadoop system as a whole will come back up).


Attempts to repair it with the hbase hbck -fix and hbase hbck -repair commands failed.

Deleting the corrupt HBase blocks (the bad region files) outright with hdfs fsck / -delete and then restarting the HBase cluster brought every region back online; problem solved.

To look at the block files again:
hdfs fsck / -files -blocks

hdfs fsck / | egrep -v '^\.+$' | grep -i "corrupt blockpool"| awk '{print $1}' |sort |uniq |sed -e 's/://g' >corrupted.flst
hdfs dfs -rm /path/to/corrupted.flst
hdfs dfs -rm -skipTrash /path/to/corrupted.flst

How would I repair a corrupted file if it was not easy to replace?

This might or might not be possible, but the first step would be to gather information on the file's location, and blocks.

hdfs fsck /path/to/filename/fileextension -locations -blocks -files
hdfs fsck hdfs://ip.or.hostname.of.namenode:50070/path/to/filename/fileextension -locations -blocks -files

Note: deleting corrupt HDFS blocks via hdfs fsck / -delete causes data loss.

15. Handling exceptions when creating HBase snapshots

The following error came up while taking a snapshot in HBase:
> snapshot 'freeoa', 'freeoa-snapshot-20140912'

ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot 
...
Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via timer-java.util.Timer@69db0cb4:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1410453067992, End:1410453127992, diff:60000, max:60000 ms  
    at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)  
    at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)  
    at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)  
    … 10 more  
Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1410453067992, End:1410453127992, diff:60000, max:60000 ms  
    at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:70)  
    at java.util.TimerThread.mainLoop(Timer.java:555)  
    at java.util.TimerThread.run(Timer.java:505)

This problem is caused by communication with the servers timing out, so the default values of the following two parameters need to be increased:
1、hbase.snapshot.region.timeout
2、hbase.snapshot.master.timeoutMillis

Both default to 60000, in milliseconds, i.e. 1 minute. If the operation takes longer than that, the error above is raised.

A snapshot is a collection of metadata info that lets an admin roll a table back to an earlier state.
Operations:
Take a snapshot: create a snapshot of the specified table; it may fail while the table is being balanced, split, or compacted.
Clone a snapshot: create a new table from the snapshot, with the same schema and data as the original; operations on the new table do not affect the original.
Restore a snapshot: roll a table back to the state captured in a snapshot.
Delete a snapshot: remove a snapshot and free its space, without affecting cloned tables or other snapshots.
Export a snapshot: copy a snapshot's metadata and data to another cluster; this works at the HDFS level and does not involve the region servers.

The commands:
hbase> snapshot 'tableName', 'snapshotName'
hbase> clone_snapshot 'snapshotName', 'newTableName'
hbase> delete_snapshot 'snapshotName'
hbase> restore_snapshot 'snapshotName'
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot SnapshotName -copy-to hdfs:///srv2:8082/hbase

Some limitations:
If regions covered by a snapshot are merged, data can be lost in the snapshot and in cloned tables.
When a table that has replication enabled is restored from a snapshot, its replica in the other cluster is not restored.

Below is a home-grown HBase snapshot script. It is invoked from crontab; each run takes new snapshots and deletes the snapshots taken 7 days earlier.

use v5.12;
use utf8;
use Encode;
use Mojo::Log;
use Time::Piece;
use Data::Dumper;
use Time::Seconds;
use HBase::JSONRest;
use Cwd qw(abs_path realpath);
use File::Basename qw(dirname);

binmode(STDIN, ":encoding(utf8)");
binmode(STDOUT, ":encoding(utf8)");

my $mydir=dirname(abs_path($0));
chdir($mydir);

my $cdt=localtime;
my $pdt7=$cdt - 7 * ONE_DAY;

my $log = Mojo::Log->new(path => 'log/hbase.snapshot.log');

$log = $log->format(sub {
  my ($time, $level, @lines) = @_;
  #my $idt=strftime("%Y-%m-%d %H:%M:%S",localtime);
  my $idt=localtime->datetime;
  return qq{[$idt] [$level] @lines \n};
});

#my $ymd=$cdt->ymd; # current date as YYYY-MM-DD
my $ymd=$cdt->strftime("%Y%m%d");# same, but without the '-'
my $ymd7=$pdt7->strftime("%Y%m%d");# same, for 7 days ago

#Hbase
my ($hostname,$hbdir)=('192.168.1.120:8080','/usr/local/hbase');
my $hbase=HBase::JSONRest->new(host=>$hostname);
# get the names of all tables
my $hbtabs=$hbase->list;

foreach my $tab (@$hbtabs){
    #say 'Take SnapShot for table:'.$tab->{name};
    make_snap_shot($tab->{name});
    purge_snap_shot($tab->{name});
}
# take a snapshot
sub make_snap_shot{
    my $tab=shift;
    my $cmd=qq[echo "snapshot '$tab', 'snapshot_$tab\_$ymd'" | $hbdir/bin/hbase shell];
    my ($rs,$henv)=(system($cmd));
    #$henv.="$_:$ENV{$_}\n" foreach (keys %ENV);
    $log->info("Take snapshot on $tab,return code is:$rs.");
}
# delete the snapshot taken 7 days ago
sub purge_snap_shot{
    my $tab=shift;
    my $cmd=qq[echo "delete_snapshot 'snapshot_$tab\_$ymd7'" | $hbdir/bin/hbase shell];
    my $rs=system($cmd);
    $log->info("Delete snapshot for table:$tab by snapshot name:snapshot_$tab\_$ymd7,return code is:$rs.");
}
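A crontab entry to run the script nightly might look like this (the script path is hypothetical):

0 2 * * * perl /usr/local/hbase/scripts/hbase_snapshot.pl >/dev/null 2>&1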


Related reading: the principles behind HBase snapshot operations

16. Snapshot timeout too short: snapshots cannot complete and the logs fill with errors about missing temporary files

hbase(main):004:0> snapshot 'table','snapshot_freeoa'

ERROR: Snapshot 'snapshot_freeoa' wasn't completed in expectedTime:60000 ms

Here is some help for this command:
Take a snapshot of specified table. Examples:

  hbase> snapshot 'sourceTable', 'snapshotName'
  hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => true}

The default of 60 s is not enough for tables with many regions and needs to be increased.

hbase-site.xml

<property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
</property>
<property>
    <name>hbase.snapshot.master.timeoutMillis</name>
    <value>1800000</value>
</property>
<property>
    <name>hbase.snapshot.region.timeout</name>
    <value>1800000</value>
</property>

17. Heavy writes slowing the whole system down

too many store files; delaying flush up to 90000ms

2017-08-22 12:35:21,506 WARN  [B.DefaultRpcServer.handler=2969,queue=469,port=60020] hdfs.DFSClient: Failed to connect to /192.168.20.50:50010 for file /hbase/data/default/wz_content/aa010d3b1f1d325063edb83d9c971057/content/bf6454cd2bcd49c890cea6afaa5fe99b for block BP-1591618693-192.168.20.125-1431793334085:blk_1184188910_110455556:java.io.IOException: Connection reset by peer
2017-08-22 12:35:21,506 WARN  [B.DefaultRpcServer.handler=2969,queue=469,port=60020] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 2665.0594974049063 msec.
2017-08-22 12:35:21,521 WARN  [B.DefaultRpcServer.handler=1790,queue=290,port=60020] hdfs.DFSClient: Failed to connect to /192.168.20.50:50010 for file /hbase/data/default/wz_content/aa010d3b1f1d325063edb83d9c971057/content/bf6454cd2bcd49c890cea6afaa5fe99b for block BP-1591618693-192.168.20.125-1431793334085:blk_1184188990_110455636:java.net.ConnectException: Connection timed out
2017-08-22 12:35:21,981 WARN  [B.DefaultRpcServer.handler=3370,queue=370,port=60020] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 2089.5811774229255 msec.
2017-08-22 12:35:22,091 WARN  [B.DefaultRpcServer.handler=2525,queue=25,port=60020] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1630.3766567661871 msec.
2017-08-22 12:39:59,789 WARN  [regionserver60020.logRoller] regionserver.ReplicationSource: Queue size: 1394 exceeds value of replication.source.log.queue.warn: 2
2017-08-22 12:47:58,802 WARN  [regionserver60020.logRoller] regionserver.ReplicationSource: Queue size: 1395 exceeds value of replication.source.log.queue.warn: 2
2017-08-22 12:48:34,586 WARN  [MemStoreFlusher.1] regionserver.MemStoreFlusher: Region filter_content,c0007ac84,1502808497104.198fe43f10a1c7d980f38a4ff661957c. has too many store files; delaying flush up to 90000ms
2017-08-22 12:48:49,947 WARN  [B.DefaultRpcServer.handler=1031,queue=31,port=60020] hdfs.DFSClient: Failed to connect to /192.168.20.50:50010 for file /hbase/data/default/filter_content/cf69a9be117f48f8c084d61bf9c71290/content/c20b398c292d4adebc410cda249c29c1 for block BP-1591618693-192.168.20.125-1431793334085:blk_1184568976_110835622:java.net.ConnectException: Connection timed out
2017-08-22 12:48:49,947 WARN  [B.DefaultRpcServer.handler=1031,queue=31,port=60020] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 658.8251327374326 msec.
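The "too many store files; delaying flush up to 90000ms" message comes from the store-file blocking check (90000 ms is the default hbase.hstore.blockingWaitTime). One common mitigation, sketched here with an example value only, is to raise the blocking threshold in hbase-site.xml so flushes are not delayed as aggressively:

<property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>30</value>
</property>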

References:

Summary of HBase performance optimization at trillion-record scale

HBase best practices: write performance optimization strategies

HBase read/write optimization for real-time systems: unimpeded heavy writes

Regions growing too large to split under heavy HBase write load

18. Compression cannot be enabled on some region servers

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
[regionserver60020-smallCompactions-1507591166085] compress.CodecPool: Got brand-new decompressor [.gz]
[B.DefaultRpcServer.handler=3900,queue=400,port=60020] compress.CodecPool: Got brand-new decompressor [.gz]

Symptoms:
On the two newly added region servers, 'compress.CodecPool' messages like the ones above scrolled non-stop in the logs. Initial analysis pointed at compression, since quite a few of my tables have compression enabled. Could native compression support be broken on these two region servers? Time to test:
$ ./bin/hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker
2017-10-13 15:11:38,717 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop: false
zlib:   false
snappy: false
lz4:    false
bzip2:  false
2017-10-13 15:11:38,863 INFO  [main] util.ExitUtil: Exiting with status 1

The old region servers support at least zlib and lz4, so something was clearly not installed properly here. Under '/usr/local' there was only an hbase directory, with no sibling hadoop directory, whereas the other three old region servers had one. That was the problem: copying the hadoop directory from an old region server to '/usr/local' and setting up the hadoop path variables (/etc/profile.d/hadoop.sh) resolved it.

$ ./bin/hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker
2017-10-13 15:16:59,739 WARN  [main] bzip2.Bzip2Factory (Bzip2Factory.java:isNativeBzip2Loaded(73)) - Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
2017-10-13 15:16:29,744 INFO  [main] zlib.ZlibFactory (ZlibFactory.java:<clinit>(48)) - Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/local/hadoop/lib/native/libhadoop.so.1.0.0
zlib:   true /lib64/libz.so.1
snappy: false 
lz4:    true revision:43
bzip2:  false
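For reference, a minimal sketch of what the /etc/profile.d/hadoop.sh mentioned above might contain (the exact paths are assumptions based on this setup):

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH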

Root cause: HBase is built on top of Hadoop, and some of its more advanced features still call into Hadoop's libraries, so a compatible Hadoop environment must be available wherever HBase runs.

References:
Appendix A: Compression and Data Block Encoding In HBase
Hadoop Unable to load native-hadoop library for your platform warning
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

19. A datanode exits unexpectedly or refuses to start when a data disk fails

The log:
...
2017-11-02 10:42:03,361 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=1500672018;bpid=BP-1591618693-192.168.20.125-1431793334085;lv=-47;nsInfo=lv=-47;cid=CID-4227a144-0f0a-4034-a71d-e7c7ebcb13bf;nsid=1500672018;c=0;bpid=BP-1591618693-192.168.20.125-1431793334085
2017-11-02 10:42:03,380 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1591618693-192.168.20.125-1431793334085 (storage id DS-1034782909-192.168.20.51-50010-1489925973516) service to nameNode.hadoop1/192.168.20.125:9000
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 5, volumes configured: 6, volumes failed: 1, volume failures tolerated: 0
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:201)
...
2017-11-02 10:42:05,482 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2017-11-02 10:42:05,485 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2017-11-02 10:42:05,488 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at dataNode11.hadoop1/192.168.20.51
************************************************************/

This datanode has six disks and one of them failed, causing the process to abort and refuse to start. After editing hdfs-site.xml and removing that disk's mount point from the configuration, the node could be started again, though with correspondingly less capacity. Why wouldn't it start in the first place?

dfs.datanode.failed.volumes.tolerated - By default this is set to 0 in hdfs-site.xml

Its meaning: "The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shutdown." In other words, it is the number of failed disks a datanode will tolerate, and it defaults to 0.

Disk failures are common in a Hadoop cluster. At startup the datanode uses the directories configured under dfs.datanode.data.dir to store blocks; if the number of unusable directories exceeds the configured tolerance, the DataNode fails to start. Here the log shows "volume failures tolerated: 0" and "volumes failed: 1", so the check in the code kicked in and the datanode refused to start.
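To let a datanode keep running with one failed disk instead of editing the mount list, the same property shown in section 13 can be raised, for example:

<property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
</property>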

 

Reposted from http://www.freeoa.net/osuport/db/my-hbase-usage-problem-sets_2979.html
