cloudera-scm-server won't start on CDH: repairing the original CDH cluster by reinstalling it

1 A sudden power failure left the five-server CDH cluster with cloudera-scm-server unable to run, and the CentOS 7 system on one virtual machine would no longer boot. I rebuilt CM and a CDH cluster in a VM on my laptop to export the metadata database script and used it to restore the metadata database scmdb into the original cluster's MySQL, but cloudera-scm-server still died immediately after starting.

1.1 I followed the method from this post: https://stackoverflow.com/questions/41340949/answer/submit
[root@master ~]# sudo /opt/cm-5.11.2/share/cmf/schema/scm_prepare_database.sh mysql scmdb root 'xxx'
JAVA_HOME=/usr/java/jdk1.8.0_171
Verifying that we can write to /opt/cm-5.11.2/etc/cloudera-scm-server
Creating SCM configuration file in /opt/cm-5.11.2/etc/cloudera-scm-server
Executing:  /usr/java/jdk1.8.0_171/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/opt/cm-5.11.2/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /opt/cm-5.11.2/etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
Mon Jul 01 17:28:08 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[                          main] DbCommandExecutor              INFO  Successfully connected to database.
All done, your SCM database is configured correctly!
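For reference, scm_prepare_database.sh writes a plain key/value file that the server reads at startup. A sketch of what /opt/cm-5.11.2/etc/cloudera-scm-server/db.properties should roughly contain after the run above (property names follow CM 5.x conventions; the values here are placeholders, not my real credentials):

```properties
# Sketch of db.properties as written by scm_prepare_database.sh (CM 5.x)
com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=localhost
com.cloudera.cmf.db.name=scmdb
com.cloudera.cmf.db.user=root
com.cloudera.cmf.db.password=xxx
```

If the server still cannot connect, this file is the first thing to compare against the actual MySQL instance.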

1.2 Restart the server side
[root@master ~]# /opt/cm-5.11.2/etc/init.d/cloudera-scm-server start
Starting cloudera-scm-server:                              [  OK  ]
[root@master ~]# /opt/cm-5.11.2/etc/init.d/cloudera-scm-server status
cloudera-scm-server dead but pid file exists
1.3 Check the server-side log:

2019-07-01 17:32:30,405 ERROR main:com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean:
***************************************************************************
* This version of Cloudera Manager does not support upgrade from CM       *
* version 3.x. You need to follow steps outlined in                       *
* http://tiny.cloudera.com/downgrade-cm5 to downgrade CM and then upgrade *
* to CM 4.x. You also need to upgrade to CDH4 (using CM 4.x) before you   *
* can upgrade to the latest CM version.                                   *
***************************************************************************
2019-07-01 17:32:30,406 INFO main:org.springframework.beans.factory.support.DefaultListableBeanFactory: Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@bccb269: defining beans [commandLineConfigurationBean,entityManagerFactoryBean,com.cloudera.server.cmf.TrialState,com.cloudera.server.cmf.TrialManager,com.cloudera.cmf.crypto.LicenseLoader]; root of factory hierarchy
2019-07-01 17:32:30,407 ERROR main:com.cloudera.server.cmf.Main: Server failed.
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.cloudera.server.cmf.TrialState': Cannot resolve reference to bean 'entityManagerFactoryBean' while setting constructor argument; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactoryBean': FactoryBean threw exception on object creation; nested exception is java.lang.RuntimeException: Upgrade not allowed from CM3.x.
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:328)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:106)
        at org.springframework.beans.factory.support.ConstructorResolver.resolveConstructorArguments(ConstructorResolver.java:616)
        at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:148)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1003)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:907)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:485)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:293)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:290)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:192)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:585)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
        at com.cloudera.server.cmf.Main.bootstrapSpringContext(Main.java:387)
        at com.cloudera.server.cmf.Main.<init>(Main.java:242)
        at com.cloudera.server.cmf.Main.main(Main.java:216)
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactoryBean': FactoryBean threw exception on object creation; nested exception is java.lang.RuntimeException: Upgrade not allowed from CM3.x.
        at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:149)
        at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:102)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1440)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:247)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:192)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:322)
        ... 17 more
Caused by: java.lang.RuntimeException: Upgrade not allowed from CM3.x.
        at com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean.CM3Fail(EntityManagerFactoryBean.java:327)
        at com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean.checkVersionDoFail(EntityManagerFactoryBean.java:274)
        at com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean.getObject(EntityManagerFactoryBean.java:127)
        at com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean.getObject(EntityManagerFactoryBean.java:65)
        at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:142)
        ... 22 more
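The check that fails here (checkVersionDoFail) compares the schema version recorded in scmdb against what this CM release expects, so the restored database evidently carries a version CM mistakes for 3.x. It can be inspected directly in MySQL; the table name SCHEMA_VERSION is assumed from CM 5.x schemas, so verify it exists before relying on it, and do not modify it by hand without a backup:

```sql
-- Inspect the schema version cloudera-scm-server reads at startup
-- (table name assumed from CM 5.x; check with SHOW TABLES first)
USE scmdb;
SELECT * FROM SCHEMA_VERSION;
```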

How can this be resolved? And if I reinstall CM, what happens to the HBase warehouse data in my cluster?

------------------------------------------- The long reinstall and repair work begins below -------------------------------------------------------------------

2 Reinstall CM and CDH (installing the ZooKeeper, HDFS, and YARN components). While installing HDFS, I considered that the machine originally hosting the NameNode had a weak configuration, so why not take the chance to move it to another machine (note: this turned out to be a mistake; the NameNode host should not be changed).

2.1 If you changed the NameNode host while reinstalling the HDFS component, the errors raised on HDFS startup and their fixes are covered here:

https://www.cnblogs.com/gaojiang/p/8418780.html

After a great deal of effort, the HDFS cluster was back to normal.

2.2 Next I installed HBase (it later turned out that Phoenix should have been installed first; see the HBase repair process below for why), but HBase had no data, so it had to be repaired.

2.2.1 After installing the HBase component, startup failed as follows:

Checking HBase's data directory on HDFS revealed the situation shown in the figure below.

Note: here you only need to run hdfs dfs -chown -R hbase:hbase /hbase; there is no need to change permissions with chmod. During the steps below, /hbase/.tmp repeatedly reverted to hdfs:hbase ownership; whenever that happens, just run that one command again.
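A quick way to spot the directories that have flipped back out of hbase ownership is to filter the owner column of the HDFS listing. The listing is captured as a string here for illustration; on the cluster, pipe the real hdfs dfs -ls -R /hbase output through the same awk instead:

```shell
# Sample 'hdfs dfs -ls -R /hbase' output captured as a string for
# illustration; field 3 of each line is the owner.
ls_output='drwxr-xr-x   - hbase hbase          0 2019-07-01 18:00 /hbase/data
drwxr-xr-x   - hdfs  hbase          0 2019-07-01 18:05 /hbase/.tmp'
# Print every path whose owner is not hbase, i.e. what needs
# 'hdfs dfs -chown -R hbase:hbase /hbase' again.
echo "$ls_output" | awk '$3 != "hbase" {print $NF}'
```

Here the filter flags /hbase/.tmp, matching the recurring ownership flip described above.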

2.2.2 On HBase read/write timeouts, see:

https://blog.csdn.net/weixin_33797791/article/details/91669906

In my case, though, I extended the timeouts, as follows:

If you do not also raise ZooKeeper's maximum session timeout here (default 40000), an error is reported when you save the change.

The figure below shows the values changed and added under the "Service Monitor client configuration override" item on page 12 of the configuration list:

1) Change zookeeper.session.timeout from the default 30000 to 1200000

2) Change hbase.rpc.timeout from the default 10000 to 600000

3) Add hbase.client.scanner.timeout.period and set it to 600000
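The three overrides above correspond to a standard hbase-site.xml client snippet; a sketch (the property names are stock HBase/ZooKeeper keys, the values are the ones chosen above):

```xml
<!-- Client-side timeout overrides matching the values above -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value>
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value>
</property>
```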

2.2.3 HBase repair: all of the following commands were tried:
sudo -u hbase hbase hbck -repair EXXXXXX1
sudo -u hbase hbase hbck -fixAssignments
sudo -u hbase hbase hbck -fixMeta
sudo -u hbase hbase hbck -repair
sudo -u hbase hbase hbck -repairHoles
sudo -u hbase hbase hbck -fixHdfsHoles -fixMeta

Then something strange happened:

2.2.3.1 The HDFS directory /hbase/data/default/EXXXXXX1 clearly shows that the table has data (figure 1), so why does querying it in hbase shell fail with an error (figure 2), and why does repairing it also fail (figure 3)?

Commenters suggested HBase had hit a RIT (region-in-transition) problem. My HBase version is 1.2.0-cdh5.11.2; if it were a RIT problem, hbase hbck -repair would normally fix it, but in fact it could not.

2.2.3.2 In the end I found that every HBase table that had been mapped by Phoenix (as a view, table, or secondary index) had lost all of its data (evidence in figure 4) and failed to repair with the errors above;

every table not mapped by Phoenix had quietly returned to normal after the repair commands above.

In summary, my situation was: 1) every table mapped by Phoenix had all of its regions not online;
2) every table not mapped by Phoenix had all of its regions online.

2.2.3.3 What puzzled me was why none of the tables once mapped by Phoenix could be repaired. Was it because Phoenix's SYSTEM.CATALOG table had not yet been restored either?

Putting these observations together, I wondered whether the Phoenix mapping was the cause, so I reluctantly reinstalled the original Phoenix. A tutorial for installing Phoenix on CDH is shown in the figure below:

At that very moment the miracle happened: the HBase tables that none of the six hbase hbck repair commands could fix returned to normal.

In summary: after the power failure, I rebuilt CM and CDH on the original cluster. Note that the NameNode role instance must stay on exactly the same host as before; for DataNodes I kept 4 of the original 5 (one was a virtual machine whose CentOS system would not boot; its errors are shown in the figures below). HBase was then rebuilt with its data directory identical to the original. While repairing the HBase data, the tables mapped by Phoenix could not be fixed by the six hbase hbck commands; simply reinstalling the original Phoenix made those tables usable again from both hbase shell and the Phoenix shell, with no further action needed.

3 While testing Kafka, I found that the original topics could only be inspected by listing all topics with the Kafka command

./kafka-topics.sh --list --zookeeper master:2181,worker:2181,worker2:2181

and by reading a consumer group's committed offset for a topic with the ZooKeeper command

./zkCli.sh -server worker2:2181 and then: get /consumers/test_group/offsets/test_topic/0
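As a sanity check, the committed offset read via zkCli.sh can be compared with the partition's log-end offset to compute the consumer lag. A minimal sketch with made-up numbers (in practice, take committed_offset from the zkCli get above and log_end_offset from a Kafka offset-listing tool for your broker version):

```shell
# Hypothetical offsets for illustration only; substitute the real values
# read from ZooKeeper and from the broker.
log_end_offset=125000     # latest offset on partition 0 (assumed)
committed_offset=124500   # value under /consumers/test_group/offsets/test_topic/0 (assumed)
lag=$((log_end_offset - committed_offset))
echo "partition 0 lag: $lag messages"
```

A lag that never shrinks is one symptom of the broker-side problem described next.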

Question: why does counting the consumed messages hang at this point for both the original topics and new ones?

A commenter pointed out the cause: the log.dirs entry was missing from /opt/cloudera/parcels/KAFKA-0.8.2.0-1.kafka1.4.0.p0.56/etc/kafka/conf.dist/server.properties!
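The missing line is a single broker setting; a sketch (the path shown is an example only, so point it at the broker's real data directory):

```properties
# In /opt/cloudera/parcels/KAFKA-0.8.2.0-1.kafka1.4.0.p0.56/etc/kafka/conf.dist/server.properties
# log.dirs tells the broker where partition logs live; the value below is a placeholder.
log.dirs=/var/local/kafka/data
```

The broker must be restarted after adding it for the change to take effect.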

4 The CentOS 7 system on the virtual machine would not boot

The rescue mode offered just three commands. After entering the root password: 1) systemctl reboot only brought back the same screen; 2) journalctl -xb showed the boot log, in which even the timestamps were wrong, as in the figure below

Paging further down did not help; I could not make sense of it, and found only a single piece of information highlighted in red, shown below

I decided to leave this machine alone for now and reinstall the virtual machine later.
