An error occurred while running this SQL:
SQL: INSERT OVERWRITE DIRECTORY 'result/testConsole' select count(1) from nutable;
Error message:
Failed with exception Unable to rename: hdfs://indigo:8020/tmp/hive-root/hive_2013-08-22_17-35-05_006_3570546713731431770/-ext-10000 to: result/testConsole
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Another SQL statement failed the same way; this is from the log:
2013-08-22 17:08:54,411 INFO exec.Task (SessionState.java:printInfo(412)) - Moving data to: result/userName831810250/54cbcd2980a64fe78cf54abb3116d2dc from hdfs://indigo:8020/tmp/hive-hive/hive_2013-08-22_17-08-40_062_3976325306495167351/-ext-10000
2013-08-22 17:08:54,414 ERROR exec.Task (SessionState.java:printError(421)) - Failed with exception Unable to rename: hdfs://indigo:8020/tmp/hive-hive/hive_2013-08-22_17-08-40_062_3976325306495167351/-ext-10000 to: result/userName831810250/54cbcd2980a64fe78cf54abb3116d2dc
Let's look at where the exception comes from.
When a SQL statement executes, the final task is MoveTask. Its job is to move the result files produced by the MapReduce job into the directory the SQL statement designated for the query results, and it does so by renaming the directory.
Below is the code in org.apache.hadoop.hive.ql.exec.MoveTask that renames the result files:
// sourcePath is the directory holding the MapReduce result files, so its value looks like
// hdfs://indigo:8020/tmp/hive-root/hive_2013-08-22_18-42-03_218_2856924886757165243/-ext-10000
if (fs.exists(sourcePath)) {
  Path deletePath = null;
  // If it multiple level of folder are there fs.rename is failing so first
  // create the targetpath.getParent() if it not exist
  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
    deletePath = createTargetPath(targetPath, fs);
  }
  // targetPath is the directory designated for the result files, e.g.
  // result/userName154122639/4e574b5d9f894a70b074ccd3981ca0f1
  if (!fs.rename(sourcePath, targetPath)) { // the exceptions above come from here: rename failed, so we enter the if and throw
    try {
      if (deletePath != null) {
        fs.delete(deletePath, true);
      }
    } catch (IOException e) {
      LOG.info("Unable to delete the path created for facilitating rename"
          + deletePath);
    }
    throw new HiveException("Unable to rename: " + sourcePath
        + " to: " + targetPath);
  }
}
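The rename fails because the parent directory of targetPath does not exist on HDFS. The same behavior can be reproduced on a local filesystem; here is a minimal sketch using java.io.File.renameTo as a stand-in for Hadoop's FileSystem.rename (the directory names are made up for illustration):

```java
import java.io.File;

public class RenameWithoutParentDemo {
    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"),
                "rename-demo-" + System.nanoTime());
        File source = new File(base, "src");
        source.mkdirs(); // the "MapReduce output" directory exists

        // Target whose parent directory ("result/a") does not exist yet
        File target = new File(base, "result/a/dest");
        System.out.println("without parent: " + source.renameTo(target));

        // Create the parent chain first (what createTargetPath() does), then retry
        target.getParentFile().mkdirs();
        System.out.println("with parent: " + source.renameTo(target));
    }
}
```

The first rename returns false because `result/a` is missing; after the parent chain is created, the second rename succeeds.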
For the rename to succeed, targetPath's parent must already exist. And in fact there is code that checks for and creates it beforehand:
private Path createTargetPath(Path targetPath, FileSystem fs) throws IOException {
  Path deletePath = null;
  Path mkDirPath = targetPath.getParent();
  if (mkDirPath != null & !fs.exists(mkDirPath)) {
    Path actualPath = mkDirPath;
    while (actualPath != null && !fs.exists(actualPath)) {
      deletePath = actualPath;
      actualPath = actualPath.getParent();
    }
    fs.mkdirs(mkDirPath);
  }
  return deletePath; // the topmost newly created directory, kept so it can be deleted if the rename fails
}
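To see what createTargetPath actually returns, here is a local-filesystem sketch of the same walk-up logic (java.io.File standing in for Hadoop's FileSystem; all directory names are invented): given a target of result/a/b/dest where only result exists, the whole chain is created and the topmost newly created directory, result/a, is returned for rollback.

```java
import java.io.File;

public class CreateTargetPathDemo {
    // Local-filesystem sketch of MoveTask.createTargetPath(): walk up from
    // targetPath's parent to find the topmost directory that does not yet
    // exist, create the whole chain, and return that topmost directory.
    static File createTargetPath(File targetPath) {
        File deletePath = null;
        File mkDirPath = targetPath.getParentFile();
        if (mkDirPath != null && !mkDirPath.exists()) {
            File actualPath = mkDirPath;
            while (actualPath != null && !actualPath.exists()) {
                deletePath = actualPath;
                actualPath = actualPath.getParentFile();
            }
            mkDirPath.mkdirs();
        }
        return deletePath;
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"),
                "ctp-demo-" + System.nanoTime());
        new File(base, "result").mkdirs(); // only "result" exists so far

        File target = new File(base, "result/a/b/dest");
        File deletePath = createTargetPath(target);
        System.out.println("deletePath = " + deletePath.getName());
        System.out.println("parent exists = " + target.getParentFile().exists());
    }
}
```

deletePath is result/a (not result/a/b), because deleting the topmost new directory recursively is enough to undo everything the method created.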
Apache Hive hit this problem and has since fixed it.
CDH, though, guarded the fix behind a parameter, hive.insert.into.multilevel.dirs, defaulting to false, which effectively says "yes, I still ship this bug."
So when you get bitten and sit down to write a patch, you discover that flipping a config setting is all it takes.
In other words: "I'm keeping the bug, but if it bites you, you can't say I have a bug — just change the config yourself." $+@*^.!"?......
So far I haven't found this parameter used anywhere else. Its only effect here is to limit how deep the non-existent part of the result directory specified in the SQL can be: no more than one level.
I don't see what that buys anyone, though.
After half a day of digging, one config entry fixes it:
<property>
  <name>hive.insert.into.multilevel.dirs</name>
  <value>true</value>
</property>