When Sqoop is invoked concurrently through the Java API, the following exception appears:
2020-07-03 15:10:44 [ pool-1-thread-6:350039 ] - [ ERROR ] Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at java.util.Arrays$ArrayList.<init>(Arrays.java:3813)
at java.util.Arrays.asList(Arrays.java:3800)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:76)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:82)
at org.apache.sqoop.util.FileListing.getFileListing(FileListing.java:67)
at com.cloudera.sqoop.util.FileListing.getFileListing(FileListing.java:39)
at org.apache.sqoop.orm.CompilationManager.addClassFilesFromDir(CompilationManager.java:289)
at org.apache.sqoop.orm.CompilationManager.jar(CompilationManager.java:374)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:108)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at cn.xxx.xxx.sync.SqoopSync$sqoopSyncTask.run(SqoopSync.java:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
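Tracing through the source shows the cause: the jarOutputDir directory held by SqoopOptions is missing. jarOutputDir is assigned when SqoopOptions is initialized, through a call to getNonceJarDir. The code is as follows: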
private void initDefaults(Configuration baseConfiguration) {
    ...
    this.jarOutputDir = getNonceJarDir(this.tmpDir + "sqoop-" + localUsername + "/compile");
    ...
}
private static String getNonceJarDir(String tmpBase) {
    // Cap the retries so a permission error cannot loop forever.
    final int MAX_DIR_CREATE_ATTEMPTS = 32;
    // curNonce is static: once set, every later call in this JVM returns it.
    if (null != curNonce) {
        return curNonce;
    }
    File baseDir = new File(tmpBase);
    File hashDir = null;
    for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; ++attempts) {
        // Pick a random directory name that is not already taken.
        hashDir = new File(baseDir, RandomHash.generateMD5String());
        while (hashDir.exists()) {
            hashDir = new File(baseDir, RandomHash.generateMD5String());
        }
        if (hashDir.mkdirs()) {
            hashDir.deleteOnExit();
            break;
        }
    }
    if (hashDir == null || !hashDir.exists()) {
        throw new RuntimeException("Could not create temporary directory: " + hashDir + "; check for a directory permissions issue on /tmp.");
    }
    LOG.debug("Generated nonce dir: " + hashDir.toString());
    curNonce = hashDir.toString();
    return curNonce;
}
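If no output directory is set explicitly, curNonce is used by default. curNonce is a static variable, so every Sqoop job in the same Java process shares one compile directory. When another Sqoop job finishes and deletes jarOutputDir, a job that is still running can no longer list it: as the stack trace shows, Arrays.asList receives null in FileListing.getFileListingNoSort and throws the NullPointerException. Starting Sqoop from the command line does not have this problem, because each sqoop invocation is a separate process.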
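The failure mode can be reproduced without Sqoop. The following is a minimal sketch, assuming the listing code passes the result of File.listFiles() straight to Arrays.asList, which is what the stack trace suggests; the class and method names are hypothetical and not part of Sqoop.

```java
import java.io.File;
import java.util.Arrays;
import java.util.UUID;

public class MissingDirDemo {

    /** Returns true if listing the given directory triggers the NPE from the stack trace. */
    static boolean listingThrowsNpe(File dir) {
        // listFiles() returns null when dir does not exist or is not a directory.
        File[] children = dir.listFiles();
        try {
            // Arrays.asList rejects a null array via Objects.requireNonNull.
            Arrays.asList(children);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        // Hypothetical stand-in for a compile directory that another job already deleted.
        File deleted = new File(tmp, "sqoop-demo-" + UUID.randomUUID());

        System.out.println("existing dir throws NPE: " + listingThrowsNpe(tmp));
        System.out.println("deleted dir throws NPE:  " + listingThrowsNpe(deleted));
    }
}
```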
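According to the official documentation, the following option needs to be set. A UUID can be used directly as the directory name to prevent collisions:
--bindir <dir> Output directory for compiled objects
It is also best to set the following options, so that multiple Sqoop jobs working on the same table do not conflict. The class name can be the table name with a UUID suffix:
--outdir <dir> Output directory for generated code
--class-name <name> Sets the generated class name.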
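The three options above can be added to each job's arguments before it is submitted to the thread pool. The following is a sketch under stated assumptions: SqoopArgsBuilder, withIsolatedCodegen and the directory layout are hypothetical, baseArgs is assumed not to end with a `--` passthrough section, and the table name is assumed to be a valid Java identifier. Hyphens are stripped from the UUID because they are not allowed in Java class names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class SqoopArgsBuilder {

    /**
     * Returns a copy of baseArgs with per-job --bindir, --outdir and --class-name appended.
     * workRoot is a hypothetical parent directory for all per-job working directories.
     */
    static String[] withIsolatedCodegen(String[] baseArgs, String table, File workRoot) {
        // One id per job; hyphens removed so it can also be used in a class name.
        String id = UUID.randomUUID().toString().replace("-", "");
        File jobDir = new File(workRoot, id);
        File binDir = new File(jobDir, "bin");
        File srcDir = new File(jobDir, "src");
        // Created up front as a precaution (an assumption, not a documented requirement).
        binDir.mkdirs();
        srcDir.mkdirs();

        List<String> args = new ArrayList<>(Arrays.asList(baseArgs));
        args.add("--bindir");
        args.add(binDir.getPath());
        args.add("--outdir");
        args.add(srcDir.getPath());
        args.add("--class-name");
        args.add(table + "_" + id);
        return args.toArray(new String[0]);
    }

    public static void main(String[] args) {
        File workRoot = new File(System.getProperty("java.io.tmpdir"), "sqoop-jobs");
        String[] base = {"import", "--table", "orders"};
        System.out.println(String.join(" ", withIsolatedCodegen(base, "orders", workRoot)));
    }
}
```

The resulting array can then be passed to whatever entry point the application already uses (Sqoop.runSqoop in the stack trace above). Since nothing in this sketch removes the per-job directories, cleanup after each job finishes is left to the caller.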