Notes on Configuring a shell-sqoop Script in a Hue Workflow


The previous posts covered how to avoid plaintext Sqoop passwords and how to import sharded MySQL data (split across databases and tables) with Sqoop. This post walks through the problems encountered when configuring a shell-sqoop script in Hue, using the script from the previous post as the example.

1. Configuring the Hue Workflow

(Screenshot: shell action set up in the Hue workflow editor)
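For context, below is a minimal sketch of the kind of shell-sqoop import script this workflow wraps. It is an illustration rather than the exact script from the previous post; the JDBC URL, table, HDFS paths, and password file are all placeholder values.

#!/bin/bash
# Hypothetical sqoop import wrapped by the Hue shell action;
# replace the connection string, credentials, and paths with real values.
sqoop import \
  --connect "jdbc:mysql://mysql-host:3306/mydb" \
  --username import_user \
  --password-file /user/admin/.mysql.pwd \
  --table orders \
  --target-dir /user/admin/warehouse/orders \
  --delete-target-dir \
  --num-mappers 1

The script file is then attached to the workflow's shell action as both the command to run and a FILE entry, so it ships to whichever node executes it.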

2. Problems Encountered

2.1 Failed to load the MySQL driver

The error:

ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
	at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:874)
	at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:762)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:785)
	at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:288)
	at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:259)
	at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:245)
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:333)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1858)
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1657)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

Solution: because the import runs through a shell script, Sqoop looks for the MySQL driver in the lib directory under SQOOP_HOME. Copying the MySQL driver JAR into Sqoop's lib directory on every node and restarting the service fixes it. The steps:

# Download the driver
wget -O mysql-connector-java-5.1.46.tar.gz http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz/from/http://cdn.mysql.com/
# Unpack the archive and copy the driver JAR into SQOOP_HOME's lib directory
tar -zxvf mysql-connector-java-5.1.46.tar.gz
cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar /data/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/lib/sqoop/lib
# Distribute it to every node; xsync is a custom sync script (see the earlier Hadoop post), plain scp also works
xsync /data/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/lib/sqoop/lib/mysql-connector-java-5.1.46.jar
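If xsync is not available, a plain scp loop does the same distribution; the hostnames below are placeholders for your own cluster nodes:

# Hypothetical node names; substitute the hosts in your cluster
for host in node02 node03 node04; do
  scp /data/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/lib/sqoop/lib/mysql-connector-java-5.1.46.jar \
      ${host}:/data/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/lib/sqoop/lib/
done

After the restart, running any small Sqoop command against the source database (for example sqoop list-tables) is a quick way to confirm the driver now loads.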

2.2 File-does-not-exist error when the job is scheduled

The error:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://nameservice1/user/admin/.staging/job_1590901453442_0029/job.splitmetainfo
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1580)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1444)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1402)
	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1366)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1142)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1573)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1569)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1502)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://nameservice1/user/admin/.staging/job_1590901453442_0029/job.splitmetainfo
	at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1269)
	at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1261)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1261)
	at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1575)
	... 17 more

Solution: Hue dispatches the shell script to a random node in the cluster for execution. When the task runs, job metadata (the .staging files above) is created under the current Hue user, but no user is specified for the node that actually executes the script, so it cannot read that metadata and reports that the file does not exist. The fix is to add the following environment variable to the shell action:

HADOOP_USER_NAME=${wf:user()}

(Screenshot: HADOOP_USER_NAME added under ENVIRONMENT VARIABLES on the shell action)
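For reference, the setting lands in the workflow.xml that Hue generates as an env-var element on the shell action. A trimmed sketch, in which the action name, script name, and paths are illustrative:

<action name="shell-sqoop">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>sqoop_import.sh</exec>
        <!-- run the job as the user who submitted the workflow -->
        <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
        <file>/user/admin/scripts/sqoop_import.sh#sqoop_import.sh</file>
    </shell>
    <ok to="End"/>
    <error to="Kill"/>
</action>

${wf:user()} is an Oozie EL function that resolves to the submitting user at run time, so the staging files created under the Hue user are readable by the executing job.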

2.3 Oozie scheduling error: java.io.IOException: output.properties data exceeds its limit [2048]

The shell script submits too much work in one go, and the data it produces exceeds the maximum the Oozie launcher allows per action, which defaults to 2 KB (2048 bytes).

Solution: on the CDH cluster, add the following to the Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml, then restart Oozie:

<property>
    <name>oozie.action.max.output.data</name>
    <value>204800</value>
</property>

(Screenshot: the safety-valve snippet in Cloudera Manager)
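Raising the limit is the direct fix. If the shell action also has capture-output enabled, a complementary mitigation is to keep the script from writing bulky output to stdout in the first place; a hedged sketch, assuming the verbose Sqoop console logs are what overflow the limit (connection details are the same placeholders as above):

# Redirect Sqoop's chatty console output to a node-local log file
# so the Oozie launcher does not capture it.
LOG=/tmp/sqoop_import_$(date +%F).log
sqoop import \
  --connect "jdbc:mysql://mysql-host:3306/mydb" \
  --username import_user \
  --password-file /user/admin/.mysql.pwd \
  --table orders \
  --target-dir /user/admin/warehouse/orders \
  >> "${LOG}" 2>&1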
