CDH cannot start cloudera-scm-server: repairing the original CDH cluster by reinstalling CDH

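1 A sudden power failure on the CDH cluster left cloudera-scm-server unable to run on the five-server cluster, and the CentOS 7 system on one virtual machine would no longer boot. I rebuilt CM and a CDH cluster in a virtual machine on my laptop, exported its metadata database as a script, and used that script to restore the metadata database scmdb in the original cluster's MySQL. Even so, cloudera-scm-server died immediately after starting.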

1.1 I followed the method in this post: https://stackoverflow.com/questions/41340949/answer/submit
[root@master ~]# sudo /opt/cm-5.11.2/share/cmf/schema/scm_prepare_database.sh mysql scmdb root 'xxx'
JAVA_HOME=/usr/java/jdk1.8.0_171
Verifying that we can write to /opt/cm-5.11.2/etc/cloudera-scm-server
Creating SCM configuration file in /opt/cm-5.11.2/etc/cloudera-scm-server
Executing:  /usr/java/jdk1.8.0_171/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/opt/cm-5.11.2/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /opt/cm-5.11.2/etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
Mon Jul 01 17:28:08 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[                          main] DbCommandExecutor              INFO  Successfully connected to database.
All done, your SCM database is configured correctly!

1.2 Restart the server
[root@master ~]# /opt/cm-5.11.2/etc/init.d/cloudera-scm-server start
Starting cloudera-scm-server:                              [  OK  ]
[root@master ~]# /opt/cm-5.11.2/etc/init.d/cloudera-scm-server status
cloudera-scm-server dead but pid file exists
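
The status reported above (the process is dead but its pid file remains) means the pid file points at a process that no longer exists. A minimal sketch of that check follows. The function name pid_file_status is hypothetical, and because the original does not show where this tarball install keeps its pid file, the path is passed in as an argument rather than assumed.

```shell
# Hypothetical helper: report whether the process recorded in a pid file is alive.
# Usage: pid_file_status /path/to/cloudera-scm-server.pid   (path is an assumption)
pid_file_status() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 0
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    # Run as root, otherwise a live process owned by another user looks stale.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

A "stale" result only confirms that the server exited after the init script reported success; the reason has to come from the server log.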
1.3 Check the server log:

2019-07-01 17:32:30,405 ERROR main:com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean:
***************************************************************************
* This version of Cloudera Manager does not support upgrade from CM       *
* version 3.x. You need to follow steps outlined in                       *
* http://tiny.cloudera.com/downgrade-cm5 to downgrade CM and then upgrade *
* to CM 4.x. You also need to upgrade to CDH4 (using CM 4.x) before you   *
* can upgrade to the latest CM version.                                   *
***************************************************************************
2019-07-01 17:32:30,406 INFO main:org.springframework.beans.factory.support.DefaultListableBeanFactory: Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@bccb269: defining beans [commandLineConfigurationBean,entityManagerFactoryBean,com.cloudera.server.cmf.TrialState,com.cloudera.server.cmf.TrialManager,com.cloudera.cmf.crypto.LicenseLoader]; root of factory hierarchy
2019-07-01 17:32:30,407 ERROR main:com.cloudera.server.cmf.Main: Server failed.
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.cloudera.server.cmf.TrialState': Cannot resolve reference to bean 'entityManagerFactoryBean' while setting constructor argument; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactoryBean': FactoryBean threw exception on object creation; nested exception is java.lang.RuntimeException: Upgrade not allowed from CM3.x.
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:328)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:106)
        at org.springframework.beans.factory.support.ConstructorResolver.resolveConstructorArguments(ConstructorResolver.java:616)
        at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:148)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1003)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:907)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:485)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:293)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:290)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:192)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:585)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
        at com.cloudera.server.cmf.Main.bootstrapSpringContext(Main.java:387)
        at com.cloudera.server.cmf.Main.<init>(Main.java:242)
        at com.cloudera.server.cmf.Main.main(Main.java:216)
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactoryBean': FactoryBean threw exception on object creation; nested exception is java.lang.RuntimeException: Upgrade not allowed from CM3.x.
        at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:149)
        at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:102)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1440)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:247)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:192)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:322)
        ... 17 more
Caused by: java.lang.RuntimeException: Upgrade not allowed from CM3.x.
        at com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean.CM3Fail(EntityManagerFactoryBean.java:327)
        at com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean.checkVersionDoFail(EntityManagerFactoryBean.java:274)
        at com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean.getObject(EntityManagerFactoryBean.java:127)
        at com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean.getObject(EntityManagerFactoryBean.java:65)
        at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:142)
        ... 22 more
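
In a nested Java trace like the one above, the last "Caused by:" line usually names the root cause, which here is "Upgrade not allowed from CM3.x." A small sketch for pulling that line out of a long log follows. The function name root_cause is hypothetical, and the log file location is not given in the original, so it is passed as an argument.

```shell
# Hypothetical helper: print the innermost "Caused by:" line found in a log file.
# Usage: root_cause /path/to/cloudera-scm-server.log   (log path is an assumption)
root_cause() {
    grep '^Caused by:' "$1" | tail -n 1
}
```

If the log holds several startup attempts, this prints the root cause of the most recent one only.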

How can this be solved? If I reinstall CM, what happens to the HBase data warehouse data in this cluster?

-------------------------------------------The long reinstall and repair work begins below-------------------------------------------------------------------

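2 Reinstall CM and CDH (installing the ZooKeeper, HDFS, and YARN components). While installing HDFS, I figured that since the machine hosting the original NameNode was low-spec, I might as well take the chance to move it to another machine. (Note: this turned out to be a mistake. The NameNode should not be moved to a different machine.)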

2.1 If the NameNode host is changed when reinstalling the HDFS component, HDFS fails to start. The error and its fix are described in this article:

https://www.cnblogs.com/gaojiang/p/8418780.html

It took an enormous effort, but the HDFS cluster was finally healthy again.

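2.2 Next I installed HBase (it later turned out that Phoenix should have been installed first; see the HBase repair process below for the reason). HBase showed no data, so I set about repairing it.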

2.2.1 After installing the HBase component, startup failed with the following error:

Inspecting HBase's data directory on HDFS showed the following (see the figure below):

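Note: here it is enough to run hdfs dfs -chown -R hbase:hbase /hbase; there is no need to change permissions with chmod. During the steps below, /hbase/.tmp reverted to hdfs:hbase ownership several times; each time, running that same command again fixed it.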
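
Since the ownership of /hbase/.tmp kept reverting, it can help to check for it before re-running chown. The sketch below is a hypothetical helper (wrong_owner_paths is not an HDFS command) that reads hdfs dfs -ls output on standard input and prints entries not owned by the expected user. It assumes the usual eight-column listing format and paths without spaces.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print every path
# whose owner (3rd column) differs from the expected user (default: hbase).
# Assumes the 8-column listing format and paths that contain no spaces.
wrong_owner_paths() {
    expected=${1:-hbase}
    # The "Found N items" header has fewer than 8 fields and is skipped.
    awk -v want="$expected" 'NF >= 8 && $3 != want { print $8 }'
}

# Example use on a live cluster (not executed here):
#   hdfs dfs -ls /hbase | wrong_owner_paths hbase
#   hdfs dfs -chown -R hbase:hbase /hbase
```

An empty result means every listed entry is already owned by hbase and the chown can be skipped.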

2.2.2 For HBase read/write timeout problems, see this article:

https://blog.csdn.net/weixin_33797791/article/details/91669906

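However, I chose to lengthen the timeouts instead, as follows:

At this step, if ZooKeeper's maximum session timeout (default 40000) is not raised as well, saving the change reports an error.

The figure below shows the changes and additions made to the "Service Monitor Client Config Overrides" setting on page 12:

1) Change zookeeper.session.timeout from the default 30000 to 1200000

2) Change hbase.rpc.timeout from the default 10000 to 600000

3) Add hbase.client.scanner.timeout.period with the value 600000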
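
The save error mentioned above is consistent with how ZooKeeper treats session timeouts: the server caps any requested session timeout at its configured maximum, so a 1200000 ms request cannot be honoured while the maximum is still 40000 ms. The following sketch only illustrates that arithmetic. Both function names are hypothetical and this is not the validation Cloudera Manager itself runs.

```shell
# Hypothetical helper: succeed only if the requested session timeout fits
# within ZooKeeper's maximum session timeout (both values in milliseconds).
session_timeout_ok() {
    requested_ms=$1
    zk_max_ms=$2
    [ "$requested_ms" -le "$zk_max_ms" ]
}

# Hypothetical helper: convert milliseconds to whole minutes for readability.
ms_to_minutes() {
    echo $(( $1 / 60000 ))
}
```

With the values used here, 1200000 ms is 20 minutes and 600000 ms is 10 minutes, so the ZooKeeper maximum has to be raised to at least 1200000 before the new zookeeper.session.timeout can take effect.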

2.2.3 HBase repair. All of the following commands were tried:
sudo -u'hbase' hbase hbck -repair EXXXXXX1
sudo -u'hbase' hbase hbck -fixAssignments
sudo -u'hbase' hbase hbck -fixMeta
sudo -u hbase hbase hbck -repair
sudo -u hbase hbase hbck -repairHoles
sudo -u hbase hbase hbck -fixHdfsHoles -fixMeta
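
Run without any fix options, hbase hbck only reports problems, which makes it a convenient way to see whether a repair attempt changed anything. The sketch below is a hypothetical helper that reads hbck output on standard input and prints the count from its summary line. It assumes the report ends with a line of the form "N inconsistencies detected.", which should be checked against the output of the installed version.

```shell
# Hypothetical helper: read `hbase hbck` output on stdin and print the number
# of inconsistencies from the summary line "N inconsistencies detected."
# Exits non-zero if no such line is found (output format is an assumption).
hbck_inconsistencies() {
    awk '/inconsistencies detected/ { n = $1 }
         END { if (n == "") exit 1; print n }'
}

# Example use on a live cluster (not executed here):
#   sudo -u hbase hbase hbck 2>/dev/null | hbck_inconsistencies
```

Comparing the count before and after each repair command shows which option, if any, actually made progress.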

Then something strange happened:

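2.2.3.1 HBase's data directory on HDFS, /hbase/data/default/EXXXXXX1, clearly shows that the table has data (Figure 1). So why does a query in hbase shell fail with an error (Figure 2), and why does the repair fail as well (Figure 3)?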

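Others online said HBase had hit an RIT (region-in-transition) problem. My HBase version is 1.2.0-cdh5.11.2. If this were an RIT problem, hbase hbck -repair should normally fix it, but in practice it did not.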

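2.2.3.2 Eventually I found that every HBase table mapped by Phoenix (as a view, table, or secondary index) appeared to have lost all of its data (evidence in Figure 4), and repairing it failed with the error above;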

every HBase table not mapped by Phoenix, on the other hand, had quietly returned to normal after the repair commands above.

To summarize, my situation was: 1) for every table mapped by Phoenix, all regions were not online.
2) for every table not mapped by Phoenix, all regions were online.

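2.2.3.3 What puzzled me at this point: why could none of the tables that had once been mapped by Phoenix be repaired? Was it because the SYSTEM.CATALOG table used by Phoenix had not been repaired yet either?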

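Given these symptoms, I wondered whether the problem was related to the Phoenix mapping. So, reluctantly, I installed the original Phoenix as well. The tutorial for installing Phoenix on CDH is shown in the figure below: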

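At that moment a miracle happened: the HBase tables that none of the six hbase hbck repair commands could fix returned to normal. A likely explanation is that tables mapped by Phoenix declare Phoenix coprocessors in their table descriptors, so their regions cannot come online until the Phoenix jars are available to the RegionServers.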

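In summary: after the power failure I rebuilt CM and CDH on the original cluster. Note that the NameNode role instance must be placed exactly as it was before. The DataNodes went from the original five to four, because one of them ran on a virtual machine whose CentOS system would not boot (the errors are shown in the figures below). When HBase is reinstalled, its data directory must be the same as before. During HBase data repair, tables mapped by Phoenix could not be fixed with the six hbase hbck commands; installing the original Phoenix, with no further action, was enough to make those tables usable in both hbase shell and the Phoenix shell.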

3 While testing Kafka, I found that the original topics could only be inspected by listing all topics with the kafka command

./kafka-topics.sh --list --zookeeper master:2181,worker:2181,worker2:2181

and by using the zookeeper command to view the offset of a given consumer group for a given topic

./zkCli.sh -server worker2:2181
then: get /consumers/test_group/offsets/test_topic/0
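
The znode path in the get command above follows the layout used by ZooKeeper-based consumers: /consumers/<group>/offsets/<topic>/<partition>. A small sketch for building that path for other groups, topics, or partitions follows. The function name offset_znode is hypothetical.

```shell
# Hypothetical helper: build the ZooKeeper path that holds the committed offset
# for a consumer group, topic, and partition (ZooKeeper-based consumers only).
offset_znode() {
    printf '/consumers/%s/offsets/%s/%s\n' "$1" "$2" "$3"
}

# Example: offset_znode test_group test_topic 0
# The result can be pasted after "get" at the zkCli prompt.
```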

Question: why does checking the number of consumed messages hang at this point, for both the original topics and newly created ones?

Others online said the cause is that log.dirs is missing from the broker configuration, which can be checked with vi /opt/cloudera/parcels/KAFKA-0.8.2.0-1.kafka1.4.0.p0.56/etc/kafka/conf.dist/server.properties
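
Whether log.dirs is really missing can be checked without opening an editor. The sketch below is a hypothetical helper that prints the log.dirs value from a properties file and fails if the key is absent or commented out. The file to check is passed as an argument, for example the server.properties path quoted above.

```shell
# Hypothetical helper: print the value of log.dirs from a Kafka
# server.properties file; exit non-zero if the key is absent or commented out.
kafka_log_dirs() {
    value=$(sed -n 's/^[[:space:]]*log\.dirs[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1)
    [ -n "$value" ] && echo "$value"
}

# Example use (not executed here):
#   kafka_log_dirs /path/to/server.properties || echo "log.dirs is not set"
```

If the key is present, the listed directories should also exist on the broker host and contain the topic partition folders.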

4 The CentOS 7 system on the virtual machine will not boot

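This rescue mode offers only three commands. After entering the root password: 1) running systemctl reboot brings me back to the same screen. 2) Running journalctl -xb to view the log shows that even the timestamps in the log are wrong, as shown in the figure below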

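I tried paging further down but could not make sense of any of it. The only thing I found was one message highlighted in red, as shown in the figure below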

I decided to give up on this machine and plan to reinstall the virtual machine later.
