HBase troubleshooting notes

1. [hbase] Hadoop exception: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times
#-----------------------------------------------------------------------------------------------------------------------------------
Delete the hadoop-core jar from the hbase/lib directory, copy hadoop-0.20.2-core.jar from the Hadoop directory into hbase/lib, then restart HBase.

####################################################################################################################################
2. [chukwa]Caused by: java.lang.ClassNotFoundException: org.mortbay.jetty.HandlerContainer 
#-----------------------------------------------------------------------------------------------------------------------------------
jetty-6.1.26.jar is missing from the Chukwa lib directory.
 
####################################################################################################################################
3. slf4j-api-1.5.8.jar and slf4j-log4j12-1.5.8.jar version mismatch (Hadoop 1.0.4 and HBase 0.92.1)

Replace slf4j-log4j12-1.4.3.jar and slf4j-api-1.4.3.jar under Hadoop with the 1.5.8 versions.
 
####################################################################################################################################
4. [Hadoop] Could not connect to remote log4j server: this can be ignored.
 
####################################################################################################################################
5. Hadoop 1.0.0 is in use, and HBase 0.90.5 does not support that version, so the related jars must be replaced: find hadoop-core-0.20-append-r1056497.jar under hbase/lib and delete it, then copy hadoop-core-1.0.0.jar and commons-configuration-1.6.jar from Hadoop/lib into hbase/lib.
 
####################################################################################################################################
6. [hbase shell]ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times 

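A: If rootdir is configured to use HDFS, check whether the HDFS daemons are running; if they are not, start Hadoop.

B: If Hadoop is confirmed to be running, check whether the HBase and Hadoop versions are mismatched. Delete the hadoop-core jar from the hbase/lib directory, copy hadoop-0.20.2-core.jar from the Hadoop directory into hbase/lib, then restart HBase.

C: Hadoop may have started in safe mode. From the bin directory, run: hadoop dfsadmin -safemode leave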
 
####################################################################################################################################
7. Starting HRegionServer by itself on a single HBase node fails
Error: starting regionserver, logging to /data/java/hbase-0.90.3/logs/hbase-root-regionserver-SFserver25.localdomain.out
Exception in thread "regionserver60020" java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1417)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:683)
        at java.lang.Thread.run(Thread.java:662)

Possible cause: the clocks of the HBase servers are out of sync, so the regionserver fails to start:
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server hadoop-node6,60020,1337908009841 has been rejected; Reported time is too far out of sync with master. Time difference of 882788ms > max allowed of 30000ms

Option 1
Add the following configuration to hbase-site.xml:
<property>
<name>hbase.master.maxclockskew</name>
<value>180000</value>
<description>Time difference of regionserver from master</description>
</property>

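Option 2

The error states that the clock difference between the node and the master exceeds 30000 ms (30 seconds), so the service cannot start.
Adjust the clock on every node so that the difference stays within 30 s.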
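
The rejection rule quoted in the error message can be sketched as a plain comparison. This is a minimal illustration and not HBase's actual code: the class and method names are hypothetical, and only the limits (30000 ms by default, 180000 ms from option 1) and the reported skew come from these notes.

```java
// Hypothetical helper illustrating the clock-skew rule described above.
final class ClockSkewCheck {
    // Default limit quoted in the error message (30000 ms).
    static final long DEFAULT_MAX_SKEW_MS = 30_000L;

    // Returns true when the skew (in ms) is within the limit. The error
    // message reports a rejection when the difference is greater than the limit.
    static boolean isAccepted(long skewMs, long maxSkewMs) {
        return Math.abs(skewMs) <= maxSkewMs;
    }

    public static void main(String[] args) {
        long observed = 882_788L; // skew reported in the log above
        System.out.println(isAccepted(observed, DEFAULT_MAX_SKEW_MS)); // false
        System.out.println(isAccepted(observed, 180_000L));            // false
    }
}
```

Under this rule, raising the limit to 180000 ms would still not admit a node whose clock is 882788 ms off, so correcting the clocks (option 2) looks like the more robust fix for that particular log.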

####################################################################################################################################
8. unexpected error, closing socket connection and attempting reconnect

Fix: stop the firewall with service iptables stop

####################################################################################################################################
9. ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to
ZooKeeper but the connection closes immediately. This could be a sign that the server has too many
connections (30 is the default).

Edit limits.conf:

vi /etc/security/limits.conf

Append these two lines at the end:

hdfs  -       nofile  32768
hbase  -       nofile  32768

Restart HBase:
[root@master bin]# ./stop-hbase.sh 
stopping hbase.............................
[root@master bin]# ./start-hbase.sh
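(If stop-hbase.sh hangs and keeps printing dots, run start-hbase.sh again. It will report that HBase is still running and must be stopped first, and it prints the PID. Kill that process with kill -9 <pid>, then run start-hbase.sh again.)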

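hbase-site.xml configuration (the property name below is hbase.rootdir; the original notes had its default value, /tmp/hbase-${user.name}, in the name field):
<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>file:///home/bell/software/HBase/hbase-0.90.4/data</value>
        </property>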
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>

####################################################################################################################################
10. Running $ hbase hbck prints the following:
Invalid maximum heap size: -Xmx4096m
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
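Cause: the JVM heap size is set too large (this message typically appears on a 32-bit JVM, which cannot address a 4096 MB heap). Reduce the setting in hbase-env.sh, for example: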
export HBASE_HEAPSIZE=1024

####################################################################################################################################
11. HBase fails to start; the regionserver log contains the error below, and ZooKeeper also reports initialization errors
FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 10.210.70.57,60020,1340088145399: Initialization of RS failed. Hence aborting RS.
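The installation had been working, but the daemons were force-killed at some point, and the error is about initialization, so I suspected stale cached state. I cleaned the data under tmp, but HRegionServer still failed to start, although ZooKeeper now came up. In frustration I also deleted the HBase data in HDFS, cleaned tmp again, checked every node for leftover HBase processes and killed them, then restarted HBase, and everything worked. I do not know which step fixed it, and I do not recommend this brute-force approach, since it destroys the HBase data. If anyone knows the actual cause, please let me know.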

####################################################################################################################################
12. The regionserver daemon fails to start with the following error:
Exception in thread "main" java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer

Caused by: java.net.BindException: Problem binding to /10.210.70.57:60020 : Cannot assign requested address
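As the message suggests, check that the IP address actually belongs to this machine. If the IP is correct, check whether port 60020 is already in use.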
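
Both checks can be run from a small standalone program on the failing node. The sketch below is only a diagnostic aid under stated assumptions: BindCheck and its methods are hypothetical names, not part of HBase, and the default address in main is a placeholder to be replaced with the one from the error message.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.ServerSocket;

// Hypothetical diagnostic helper; not part of HBase.
final class BindCheck {
    // True if the address is assigned to a network interface on this machine.
    static boolean isLocalAddress(String ip) {
        try {
            return NetworkInterface.getByInetAddress(InetAddress.getByName(ip)) != null;
        } catch (IOException e) { // UnknownHostException or SocketException
            return false;
        }
    }

    // True if a server socket can currently be bound to ip:port.
    static boolean canBind(String ip, int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(InetAddress.getByName(ip), port));
            return true;
        } catch (IOException e) { // includes java.net.BindException
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the address and port from the error message.
        String ip = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 60020;
        System.out.println("address is local: " + isLocalAddress(ip));
        System.out.println("port can be bound: " + canBind(ip, port));
    }
}
```

If the first check prints false, the configured address does not belong to this host, which matches "Cannot assign requested address". If only the second prints false, another process is probably holding the port.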

####################################################################################################################################
13. Running an HBase program or shell command prints the following:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-0.92.1/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-1.0.3/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
This happens because both HBase and Hadoop ship this jar; remove one of the two copies.
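
To see which copies are actually on a given classpath, the entries can be scanned by file name. The following is a rough sketch: Slf4jBindingFinder is a hypothetical helper, and it only recognises the slf4j-log4j12 binding named in the warning above, not other SLF4J bindings.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; not part of HBase or SLF4J.
final class Slf4jBindingFinder {
    // Returns the classpath entries whose file name looks like an
    // slf4j-log4j12 binding jar (the only binding this sketch checks for).
    static List<String> findBindings(String classpath, String separator) {
        List<String> found = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String name = entry.substring(entry.lastIndexOf('/') + 1);
            if (name.startsWith("slf4j-log4j12-") && name.endsWith(".jar")) {
                found.add(entry);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Scans the classpath of the JVM running this program.
        String cp = System.getProperty("java.class.path");
        List<String> bindings = findBindings(cp, File.pathSeparator);
        System.out.println(bindings.size() + " slf4j-log4j12 binding(s): " + bindings);
    }
}
```

More than one entry in the result corresponds to the "multiple SLF4J bindings" warning, and the listed paths show which jar to remove.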

####################################################################################################################################
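14. When running an HBase MapReduce job, some nodes run normally without errors, while others keep reporting something like Status : FAILED
java.lang.NullPointerException. The tasktracker log shows the following error:
WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

caused by java.net.ConnectException: Connection refused
The official documentation explains this error:
Errors like this… are either due to ZooKeeper being down, or unreachable due to network issues.
When I first configured ZooKeeper, the only advice was to use an odd number of nodes so that a leader can still be elected when one node goes down. Judging from this problem, it seemed that every node that runs tasks must also be configured with ZooKeeper. A possible alternative explanation (my assumption, not verified here) is that tasks on the failing nodes cannot see the hbase.zookeeper.quorum setting and fall back to connecting to localhost.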
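
Since "Connection refused" means nothing accepted the TCP connection, a quick reachability test from a failing node can narrow things down. This is a minimal sketch: ZkReachability is a hypothetical helper, and the default host and port (localhost, 2181) are assumptions; substitute the ZooKeeper address your cluster actually uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper; not part of HBase or ZooKeeper.
final class ZkReachability {
    // True if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) { // refused, timed out, or unresolved host
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the real ZooKeeper host and client port.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Running it on a failing node against each ZooKeeper host shows whether the problem is the network or firewall, or the address the task is trying to reach.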

####################################################################################################################################
15. A "no such method" exception is reported, but the method named is not one I defined or called. The output looks like this:
java.lang.RuntimeException: java.lang.NoSuchMethodException: com.google.hadoop.examples.Simple$MyMapper.()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:45)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:32)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:53)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1210)
Caused by: java.lang.NoSuchMethodException: com.google.hadoop.examples.Simple$MyMapper.()
at java.lang.Class.getConstructor0(Class.java:2705)
at java.lang.Class.getDeclaredConstructor(Class.java:1984)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:41)
… 4 more
A solution found online:
This is actually the <init>() function. The display on the web page doesn't translate into html, but dumps plain text, so <init> is treated as a (nonexistent) tag by your browser. This function is created as a default initializer for non-static classes. This is most likely caused by having a non-static Mapper or Reducer class. Try adding the static keyword to your class declaration, i.e.:
In other words, the static keyword is missing; adding it fixes the problem:
public static class MyMapper extends MapReduceBase implements Mapper {…}
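
The effect can be reproduced without Hadoop. The stack trace shows the failure coming from Class.getDeclaredConstructor, and the sketch below calls the same reflection method on two nested classes. InnerMapper and StaticMapper are hypothetical stand-ins that do not implement Hadoop's Mapper interface.

```java
// Self-contained illustration; the nested classes are hypothetical stand-ins.
final class NestedClassDemo {
    class InnerMapper { }          // non-static: its constructor takes the outer instance
    static class StaticMapper { }  // static: has a true no-argument constructor

    // Mirrors the lookup in the stack trace: asks for a constructor
    // that takes no parameters.
    static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            cls.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(InnerMapper.class));  // false
        System.out.println(hasNoArgConstructor(StaticMapper.class)); // true
    }
}
```

A non-static nested class has an implicit constructor parameter for its enclosing instance, so a reflective lookup of a no-argument constructor fails, which is why adding static resolves the error.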

####################################################################################################################################
16. When a MapReduce program writes HFiles for HBase, the following error may appear:

java.lang.IllegalArgumentException: Can't read partitions file

Caused by: java.io.IOException: wrong key class: org.apache.hadoop.io.*** is not class org.apache.hadoop.hbase.io.ImmutableBytesWritable
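Note that whether the final output of the job comes from the map or the reduce phase, the output key and value types must be <ImmutableBytesWritable, KeyValue> or <ImmutableBytesWritable, Put>. Changing the output to these types resolves the error.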

####################################################################################################################################
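17. If a regionserver fails to start when the HBase cluster is started and the log shows an error like the one below, the clocks in the cluster are out of sync; synchronizing them resolves the problem.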
FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 10.210.78.22,60020,1344329095415: Unhandled exception: org.apache.hadoop.hbase.ClockOutOfSyncException: Server 10.210.78.22,60020,1344329095415 has been rejected; Reported time is too far out of sync with master. Time difference of 90358ms > max allowed of 30000ms
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server 10.210.78.22,60020,1344329095415 has been rejected; Reported time is too far out of sync with master. Time difference of 90358ms > max allowed of 30000ms
……
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server 10.210.78.22,60020,1344329095415 has been rejected; Reported time is too far out of sync with master. Time difference of 90358ms > max allowed of 30000ms
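Running the following command on each node synchronizes its clock with public NTP servers and writes the result to the hardware clock:
/usr/sbin/ntpdate tick.ucla.edu tock.gpsclock.com ntp.nasa.gov timekeeper.isi.edu usno.pa-x.dec.com;/sbin/hwclock --systohc > /dev/null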

