[Kettle + CDH 6.1] Hadoop File Output fails when browsing a directory: java.lang.NoClassDefFoundError: com/ctc/wstx/io/SystemId

[Kettle + CDH 6.1] Assorted errors when reading and writing HDFS from an external data source

Preface

I recently tried out Kettle. Setup itself is trivial (download the package and unzip it), but configuring the data sources turned out to be full of pitfalls, so I'm recording them here.

Environment

Here are the versions of the main components:

Kettle: 8.0
CDH: 6.1.0
Hadoop: 3.0.0
MySQL: 5.5.62

The Error

Before hitting this, I had already downloaded core-site.xml, hdfs-site.xml, and the other client config files from the CDH HDFS admin page and placed them in the appropriate plugin directory, and had pulled hadoop-client-3.0.0-cdh6.1.0.jar, hadoop-common-3.0.0-cdh6.1.0.jar, and the other client jars from the Hadoop installation into the lib folder. In other words, I had followed essentially every step the common online tutorials describe. Here is the CDH cluster configuration:
[screenshot: CDH cluster configuration dialog]
I left the hdfs user's password blank because I simply don't know it (this turned out not to matter for the later operations). Clicking Test:
[screenshot: cluster connection test results]
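For reference, the layout I ended up with can be sketched as follows. KETTLE_HOME and the shim folder name here are assumptions based on a default pdi-ce unzip; adjust both to your install, and note that the shim folder must match whatever active.hadoop.configuration names in plugin.properties.

```shell
# Sketch of where the pieces go (paths are assumptions, not gospel).
KETTLE_HOME=${KETTLE_HOME:-./data-integration}
SHIM_DIR="$KETTLE_HOME/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh61"

mkdir -p "$SHIM_DIR" "$KETTLE_HOME/lib"

# Cluster client configs downloaded from the CDH admin page go into the shim:
#   cp core-site.xml hdfs-site.xml "$SHIM_DIR/"
# Hadoop client jars pulled from the cluster go into Kettle's lib:
#   cp hadoop-client-3.0.0-cdh6.1.0.jar hadoop-common-3.0.0-cdh6.1.0.jar "$KETTLE_HOME/lib/"

ls "$KETTLE_HOME/plugins/pentaho-big-data-plugin/hadoop-configurations"
```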
Everything looked fine. But when I wrote MySQL data out to HDFS and tried to browse the HDFS directory, it blew up:
[screenshot: error dialog]
Error details:

Unable to open this step dialog
java.lang.NoClassDefFoundError: com/ctc/wstx/io/SystemId
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2825)
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2814)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2865)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2839)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2716)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1353)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1325)
	at org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem.resolveFile(HdfsFileSystem.java:116)
	at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:84)
	at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:64)
	at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:790)
	at org.pentaho.di.core.vfs.ConcurrentFileSystemManager.resolveFile(ConcurrentFileSystemManager.java:91)
	at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:712)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:152)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:107)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:103)
	at org.pentaho.big.data.kettle.plugins.hdfs.trans.HadoopFileOutputDialog$29.widgetSelected(HadoopFileOutputDialog.java:1207)
	at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
	at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
	at org.pentaho.big.data.kettle.plugins.hdfs.trans.HadoopFileOutputDialog.open(HadoopFileOutputDialog.java:1316)
	at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:127)
	at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8728)
	at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3214)
	at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:780)
	at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
	at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
	at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1366)
	at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7984)
	at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9245)
	at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:692)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)
Caused by: java.lang.ClassNotFoundException: com.ctc.wstx.io.SystemId
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

Problem Analysis

Looking at the error message, it simply could not find a class named com.ctc.wstx.io.SystemId, so the obvious move was to go find it!
A quick look in the Hadoop lib directory on the NameNode host confirmed that woodstox-core-5.0.3.jar really was there:
[screenshot: woodstox-core-5.0.3.jar in the Hadoop lib directory]
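That copy step, sketched as a standalone demo. All paths here are assumptions, and the scp source in particular is hypothetical; the demo fakes the jar locally so the commands actually run end to end.

```shell
# Sketch of the fix attempt: bring the Woodstox jar into Kettle's lib.
KETTLE_HOME=${KETTLE_HOME:-./data-integration}
mkdir -p "$KETTLE_HOME/lib"

# In reality you would pull the jar off a cluster node, e.g.:
#   scp user@namenode:/opt/cloudera/parcels/CDH/lib/hadoop/lib/woodstox-core-5.0.3.jar .
# Simulate it locally so this sketch runs as a self-contained demo:
mkdir -p ./hadoop-lib
touch ./hadoop-lib/woodstox-core-5.0.3.jar

cp ./hadoop-lib/woodstox-core-5.0.3.jar "$KETTLE_HOME/lib/"
ls "$KETTLE_HOME/lib"
```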
Naturally, I copied it into Kettle's lib directory, restarted Kettle, and browsed the HDFS directory again. This time a new error appeared:
[screenshot: new error dialog]
Error details:

Unable to open this step dialog
java.lang.NoSuchMethodError: com.ctc.wstx.io.StreamBootstrapper.getInstance(Ljava/lang/String;Lcom/ctc/wstx/io/SystemId;Ljava/io/InputStream;)Lcom/ctc/wstx/io/StreamBootstrapper;
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2831)
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2814)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2865)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2839)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2716)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1353)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1325)
	at org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem.resolveFile(HdfsFileSystem.java:116)
	at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:84)
	at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:64)
	at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:790)
	at org.pentaho.di.core.vfs.ConcurrentFileSystemManager.resolveFile(ConcurrentFileSystemManager.java:91)
	at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:712)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:152)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:107)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:103)
	at org.pentaho.big.data.kettle.plugins.hdfs.trans.HadoopFileOutputDialog$29.widgetSelected(HadoopFileOutputDialog.java:1207)
	at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
	at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
	at org.pentaho.big.data.kettle.plugins.hdfs.trans.HadoopFileOutputDialog.open(HadoopFileOutputDialog.java:1316)
	at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:127)
	at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8728)
	at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3214)
	at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:780)
	at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
	at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
	at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1366)
	at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7984)
	at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9245)
	at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:692)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)

This time it was no longer a missing class but a missing method: the getInstance overload of com.ctc.wstx.io.StreamBootstrapper could not be resolved. I opened the jar in a decompiler to take a look:
[screenshot: decompiled StreamBootstrapper class]
Unbelievable: every method that should be there is there! So the problem was clearly not a simple missing jar or jar conflict. (A NoSuchMethodError when the decompiled class visibly contains the method usually means a different version of the class is being loaded from somewhere else on the classpath.) After calming down (in truth I was panicking; Chinese New Year was around the corner and I wanted to get out and enjoy the holidays), it looked like a version mismatch between the data source stack and Kettle itself.
With that change of approach and some more research, I finally cracked this stubborn problem by switching to a different Kettle version. The result:
[screenshot: HDFS directory browsing working]

Solution

Whether you look at a Kettle 7.x release or at 8.0, the data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations directory only ships shims for Hadoop 2.x and CDH 5.x. To support CDH 6.1 you either have to hunt down a shim yourself or move to a newer Kettle release. Kettle download link:

https://sourceforge.net/projects/pentaho/files/Pentaho%208.3/client-tools/

Open the URL, scroll to the bottom, and download pdi-ce-8.3.0.0-371.zip:
[screenshot: SourceForge download page]
CDH 6.1 / Hadoop 3.0 shim URL:

https://sourceforge.net/projects/pentaho/files/Pentaho%208.3/shims/

Scroll down, find pentaho-hadoop-shims-cdh61-package-8.3.2019.05.00-371-dist.zip, and download it:
[screenshot: shim download page]
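Installing and activating the shim can be sketched as follows. KETTLE_HOME is an assumption for a fresh pdi-ce-8.3 unzip; active.hadoop.configuration is the standard big-data-plugin property, and its value must match the shim folder name.

```shell
# Sketch: unpack the cdh61 shim next to the bundled ones, then activate it.
KETTLE_HOME=${KETTLE_HOME:-./data-integration}
PLUGIN_DIR="$KETTLE_HOME/plugins/pentaho-big-data-plugin"
mkdir -p "$PLUGIN_DIR/hadoop-configurations"

# 1) Drop the shim next to the bundled configurations:
#    unzip pentaho-hadoop-shims-cdh61-package-8.3.2019.05.00-371-dist.zip \
#          -d "$PLUGIN_DIR/hadoop-configurations"

# 2) Point the plugin at it (normally you edit the existing plugin.properties):
PROPS="$PLUGIN_DIR/plugin.properties"
touch "$PROPS"
if grep -q '^active.hadoop.configuration=' "$PROPS"; then
  sed -i.bak 's/^active.hadoop.configuration=.*/active.hadoop.configuration=cdh61/' "$PROPS"
else
  echo 'active.hadoop.configuration=cdh61' >> "$PROPS"
fi
cat "$PROPS"
```

Restart Kettle after this so the new shim is picked up.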

Postscript

As this episode shows, picking the wrong version brings no end of trouble. When choosing components, keep your eyes open and make sure everything is mutually compatible.
