Hive Source Code Analysis (1): CLI Input Handling

Beijing time: 2020-04-28 10:30

Environment: Hive 3.1.1

1. Main execution flow along the call stack

main:683, CliDriver (org.apache.hadoop.hive.cli)

Program entry point:

public static void main(String[] args) throws Exception {
    int ret = new CliDriver().run(args);
    System.exit(ret);
}

run:759, CliDriver (org.apache.hadoop.hive.cli)

public  int run(String[] args) throws Exception {
    // First stage of argument parsing with OptionsProcessor
    // (the command line is parsed here; -hiveconf and -d values are picked up at this stage)
    OptionsProcessor oproc = new OptionsProcessor();
    if (!oproc.process_stage1(args)) {
        return 1;
    }

    // Load the log4j configuration via initHiveLog4j
    // (this touches HiveConf, which initializes some of its static fields and reads the logging settings)
    boolean logInitFailed = false;
    String logInitDetailMessage;
    try {
        logInitDetailMessage = LogUtils.initHiveLog4j();
    } catch (LogInitializationException e) {
        logInitFailed = true;
        logInitDetailMessage = e.getMessage();
    }
	
    // Create the CliSessionState and the HiveConf
    CliSessionState ss = new CliSessionState(new HiveConf(SessionState.class));
    ss.in = System.in;
    try {
        ss.out = new PrintStream(System.out, true, "UTF-8");
        ss.info = new PrintStream(System.err, true, "UTF-8");
        ss.err = new CachingPrintStream(System.err, true, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        return 3;
    }
	
    // Second stage of argument parsing (-S, -database, -e, -f, -v, -i, etc.); the values are set on CliSessionState ss
    if (!oproc.process_stage2(ss)) {
        return 2;
    }

    // Check whether the current session runs in silent mode.
    // If it does not, info-level messages are also printed to the console through standard error.
    if (!ss.getIsSilent()) {
        if (logInitFailed) {
            System.err.println(logInitDetailMessage);
        } else {
            SessionState.getConsole().printInfo(logInitDetailMessage);
        }
    }
    
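    // Up to this point the code mainly parses the configuration arguments from the command line

    // Apply all properties specified on the command line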
    HiveConf conf = ss.getConf();
    for (Map.Entry<Object, Object> item : ss.cmdProperties.entrySet()) {
        conf.set((String) item.getKey(), (String) item.getValue());
        ss.getOverriddenConfigurations().put((String) item.getKey(), 
                                             (String) item.getValue());
    }

    // Read the prompt setting (hive.cli.prompt) and substitute variables in it
    prompt = conf.getVar(HiveConf.ConfVars.CLIPROMPT);
    prompt = new VariableSubstitution(new HiveVariableSource() {
        @Override
        public Map<String, String> getHiveVariable() {
            return SessionState.get().getHiveVariables();
        }
    }).substitute(conf, prompt);
    prompt2 = spacesForString(prompt);
	
    // Both branches end up in org.apache.hadoop.hive.ql.session.SessionState#start;
    // only the arguments differ
    if (HiveConf.getBoolVar(conf, ConfVars.HIVE_CLI_TEZ_SESSION_ASYNC)) {
        // calls: start(startSs, true, console);
        SessionState.beginStart(ss, console);
    } else {
        // calls: start(startSs, false, null);
        SessionState.start(ss);
    }

    // Update and print the name of the current thread
    ss.updateThreadName();

    // Create and initialize the materialized views registry cache
    HiveMaterializedViewsRegistry.get().init();

    try {
        /****** Core method: execution starts here ******/
        return executeDriver(ss, conf, oproc);
    } finally {
        ss.resetThreadName();
        ss.close();
    }
}

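Summary: run mainly handles the arguments of the hive command

Parse the arguments that follow the hive command
Set them on HiveConf
Execute the command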
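
The "set them on HiveConf" step can be looked at in isolation. Below is a minimal sketch, not Hive's code: OverrideSketch, applyOverrides and the plain Map standing in for HiveConf are hypothetical names used only for illustration. It shows how properties given on the command line overwrite the loaded configuration and are also recorded separately, which is what the loop over ss.cmdProperties in run does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative stand-in for the loop over ss.cmdProperties in run().
// All names are hypothetical; a plain Map plays the role of HiveConf.
final class OverrideSketch {

    // Copies every command-line property into conf and records it as overridden.
    static Map<String, String> applyOverrides(Map<String, String> conf, Properties cmdProperties) {
        Map<String, String> overridden = new LinkedHashMap<>();
        for (Map.Entry<Object, Object> item : cmdProperties.entrySet()) {
            String key = (String) item.getKey();
            String value = (String) item.getValue();
            conf.put(key, value);        // the command line wins over the loaded value
            overridden.put(key, value);  // remembered so the session knows what was overridden
        }
        return overridden;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hive.cli.print.header", "false");
        conf.put("hive.execution.engine", "mr");

        Properties cmd = new Properties();
        cmd.setProperty("hive.cli.print.header", "true"); // e.g. passed with -hiveconf

        Map<String, String> overridden = applyOverrides(conf, cmd);
        System.out.println(conf);
        System.out.println(overridden);
    }
}
```

Keeping a separate map of overridden keys is what lets a session later tell apart values that came from the configuration files and values the user forced on the command line.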

executeDriver:821, CliDriver (org.apache.hadoop.hive.cli)

/**
 * Executes the command
 * @param ss CliSessionState
 * @param conf HiveConf
 * @param oproc options set on the command line
 * @return status the execution status
 * @throws Exception
 */
private int executeDriver(CliSessionState ss, HiveConf conf, OptionsProcessor oproc)
    throws Exception {
    // Create a CliDriver and pass it the variables set on the command line
    CliDriver cli = new CliDriver();
    cli.setHiveVariables(oproc.getHiveVariables());

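    // Switch to the specified database (automatically runs "use hive_temp;", with the actual database name)
    cli.processSelectDatabase(ss);

    // Run the specified initialization SQL script files
    cli.processInitFiles(ss);

    // The -e option was used (run the SQL string directly without entering the interactive shell)
    if (ss.execString != null) {
        int cmdProcessStatus = cli.processLine(ss.execString);
        // Return the result
        return cmdProcessStatus;
    }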
    }

    try {
        // The -f option was used (run the SQL script file directly without entering the interactive shell)
        if (ss.fileName != null) {
            /****** Process the SQL file ******/
            return cli.processFile(ss.fileName);
        }
    } catch (FileNotFoundException e) {
        System.err.println("Could not open input file for reading. (" + e.getMessage() + ")");
        return 3;
    }
    if ("mr".equals(HiveConf.getVar(conf, ConfVars.HIVE_EXECUTION_ENGINE))) {
        console.printInfo(HiveConf.generateMrDeprecationWarning());
    }

    setupConsoleReader();
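    // From here on we are in the interactive shell
    String line;
    int ret = 0;
    // Accumulates multi-line input into one complete SQL statement
    String prefix = "";
    // hive.cli.print.current.db: whether to show the current database in the prompt
    String curDB = getFormattedDb(conf, ss);
    String curPrompt = prompt + curDB;
    String dbSpaces = spacesForString(curDB);
    // Read the console input line by line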
    while ((line = reader.readLine(curPrompt + "> ")) != null) {
        if (!prefix.equals("")) {
            prefix += '\n';
        }
        // Skip comment lines
        if (line.trim().startsWith("--")) {
            continue;
        }
        // This line ends the current statement (it ends with ";")
        // Note: only a trailing ";" is checked here; a ";" in the middle of the line is not split at this point
        if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
            // Prepend the lines accumulated so far
            line = prefix + line;
            /****** Execute the SQL (core method) ******/
            ret = cli.processLine(line, true);
            // Reset the accumulated SQL
            prefix = "";
            curDB = getFormattedDb(conf, ss);
            curPrompt = prompt + curDB;
            dbSpaces = dbSpaces.length() == curDB.length() ? dbSpaces : spacesForString(curDB);
        } else {
            // The statement is not finished yet; keep accumulating
            prefix = prefix + line;
            curPrompt = prompt2 + dbSpaces;
            continue;
        }
    }

    return ret;
}

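Summary: the SQL is handled according to the input type selected by the arguments

With -e, the SQL string is processed directly and the result is returned (processLine)
With -f, the SQL file is processed directly and the result is returned (processFile)
Otherwise the CLI enters the interactive shell and processes the SQL typed in (processLine)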
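
The interactive branch is easier to follow once the console and the prompt are taken out. The sketch below is a simplified model of the read loop in executeDriver, under stated assumptions: StatementAccumulator and accumulate are hypothetical names, the input is a list of strings instead of a console reader, and complete statements are collected instead of being passed to processLine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the interactive read loop in executeDriver().
// Hypothetical helper: it collects complete statements instead of executing them.
final class StatementAccumulator {

    static List<String> accumulate(List<String> inputLines) {
        List<String> statements = new ArrayList<>();
        String prefix = "";
        for (String line : inputLines) {
            // A newline is appended to a non-empty prefix before the comment check,
            // exactly as in the loop shown above.
            if (!prefix.equals("")) {
                prefix += '\n';
            }
            // Comment lines are skipped.
            if (line.trim().startsWith("--")) {
                continue;
            }
            String trimmed = line.trim();
            // A trailing ";" that is not escaped as "\;" closes the statement.
            if (trimmed.endsWith(";") && !trimmed.endsWith("\\;")) {
                statements.add(prefix + line);
                prefix = "";
            } else {
                prefix = prefix + line;
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> typed = Arrays.asList("select *", "-- a comment", "from t;", "show tables;");
        for (String statement : accumulate(typed)) {
            System.out.println("[" + statement + "]");
        }
    }
}
```

One detail this makes visible: because the newline is appended before the comment check, a comment line in the middle of a statement still leaves an empty line in the accumulated text.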

processLine:402, CliDriver (org.apache.hadoop.hive.cli)

public int processLine(String line, boolean allowInterrupting) {
    SignalHandler oldSignal = null;
    Signal interruptSignal = null;
    // Whether interruption is allowed
    if (allowInterrupting) {
        // Install a handler for Ctrl+C (the INT signal)
        interruptSignal = new Signal("INT");
        oldSignal = Signal.handle(interruptSignal, new SignalHandler() {
            private boolean interruptRequested;

            @Override
            public void handle(Signal signal) {
                boolean initialRequest = !interruptRequested;
                interruptRequested = true;

                // Second Ctrl+C: shut down the current Java process
                if (!initialRequest) {
                    console.printInfo("Exiting the JVM");
                    System.exit(127);
                }

                // Tell the user that an interrupt is in progress
                console.printInfo("Interrupting... Be patient, this might take some time.");
                console.printInfo("Press Ctrl+C again to kill JVM");

                // First Ctrl+C: kill all running MR and Tez jobs
                HadoopJobExecHelper.killRunningJobs();
                TezJobExecHelper.killRunningJobs();
                HiveInterruptUtils.interrupt();
            }
        });
    }

    try {
        int lastRet = 0, ret = 0;

        // Split the line on ';' and drop the ';' itself
        // String.split cannot be used directly, because a ';' may be part of the statement (for example inside quotes)
        List<String> commands = splitSemiColon(line);

        String command = "";
        // Iterate over the split commands
        for (String oneCmd : commands) {
            // If the fragment ends with '\', drop the '\' and append ";"
            if (StringUtils.endsWith(oneCmd, "\\")) {
                command += StringUtils.chop(oneCmd) + ";";
                continue;
            } else {
                command += oneCmd;
            }
            // Skip blank commands
            if (StringUtils.isBlank(command)) {
                continue;
            }
            /****** Execute the SQL (core method) ******/
            ret = processCmd(command);
            // Reset the command
            command = "";
            // Remember the result of the last execution
            lastRet = ret;
            // Whether errors should be ignored
            boolean ignoreErrors = HiveConf.getBoolVar(conf, HiveConf.ConfVars.CLIIGNOREERRORS);
            // On failure, and if errors are not ignored, return the error code
            if (ret != 0 && !ignoreErrors) {
                return ret;
            }
        }
        return lastRet;
    } finally {
        // Once we are done processing the line, restore the old handler
        if (oldSignal != null && interruptSignal != null) {
            Signal.handle(interruptSignal, oldSignal);
        }
    }
}

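Summary: set up interrupt handling, then process the SQL string

Check whether interruption is allowed
Split the SQL string into commands
Execute each command in turn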
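
The splitting step deserves a closer look. The following is a simplified sketch and not a copy of Hive's splitSemiColon: LineSplitSketch, splitOutsideQuotes and toCommands are hypothetical names, and the quote handling is an assumption about what such a splitter needs to do. It splits on ';' only outside quotes and then glues a fragment ending in '\' back to the next one with ';', as the loop in processLine does.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the split-and-rejoin step in processLine().
// Not Hive's implementation; names and details are illustrative.
final class LineSplitSketch {

    // Splits on ';' outside single or double quotes; the ';' itself is dropped.
    static List<String> splitOutsideQuotes(String line) {
        List<String> parts = new ArrayList<>();
        boolean inSingle = false;
        boolean inDouble = false;
        boolean escaped = false; // true when the previous character was an unescaped '\'
        int begin = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\'' && !escaped && !inDouble) {
                inSingle = !inSingle;
            } else if (c == '"' && !escaped && !inSingle) {
                inDouble = !inDouble;
            } else if (c == ';' && !inSingle && !inDouble) {
                parts.add(line.substring(begin, i));
                begin = i + 1;
            }
            escaped = !escaped && c == '\\';
        }
        parts.add(line.substring(begin));
        return parts;
    }

    // Mirrors the loop in processLine: a fragment ending in '\' is joined
    // to the following fragment with ';', and blank commands are skipped.
    static List<String> toCommands(String line) {
        List<String> commands = new ArrayList<>();
        String command = "";
        for (String oneCmd : splitOutsideQuotes(line)) {
            if (oneCmd.endsWith("\\")) {
                command += oneCmd.substring(0, oneCmd.length() - 1) + ";";
                continue;
            }
            command += oneCmd;
            if (command.trim().isEmpty()) {
                continue;
            }
            commands.add(command);
            command = "";
        }
        return commands;
    }

    public static void main(String[] args) {
        System.out.println(toCommands("use db; select ';' from t;"));
        System.out.println(toCommands("select a\\;b;"));
    }
}
```

A plain line.split(";") would cut the quoted ';' in the first example in two, which is the reason the CLI uses its own splitter.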

processCmd:127, CliDriver (org.apache.hadoop.hive.cli)

public int processCmd(String cmd) {
    CliSessionState ss = (CliSessionState) SessionState.get();
    ss.setLastCommand(cmd);

    ss.updateThreadName();

    // Flush the print stream so it does not include output from the last command
    ss.err.flush();
    String cmd_trimmed = HiveStringUtils.removeComments(cmd).trim();
    String[] tokens = tokenizeCmd(cmd_trimmed);
    int ret = 0;

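    // Handle quitting the program
    if (cmd_trimmed.toLowerCase().equals("quit") || cmd_trimmed.toLowerCase().equals("exit")) {
        // If we have come this far, either the previous commands all succeeded
        // or this is the command-line exit; either way it counts as a successful run
        ss.close();
        System.exit(0);

    } 
    // Run a SQL script from inside the interactive shell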
    else if (tokens[0].equalsIgnoreCase("source")) {
        String cmd_1 = getFirstCmd(cmd_trimmed, tokens[0].length());
        cmd_1 = new VariableSubstitution(new HiveVariableSource() {
            @Override
            public Map<String, String> getHiveVariable() {
                return SessionState.get().getHiveVariables();
            }
        }).substitute(ss.getConf(), cmd_1);
        // Get the file
        File sourceFile = new File(cmd_1);
        if (! sourceFile.isFile()){
            console.printError("File: "+ cmd_1 + " is not a file.");
            ret = 1;
        } else {
            try {
                /****** Process the SQL file ******/
                ret = processFile(cmd_1);
            } catch (IOException e) {
                console.printError("Failed processing file "+ cmd_1 +" "+ 
                                   e.getLocalizedMessage(), stringifyException(e));
                ret = 1;
            }
        }
    } 
    // Handle shell commands
    else if (cmd_trimmed.startsWith("!")) {
        // for shell commands, use unstripped command
        String shell_cmd = cmd.trim().substring(1);
        shell_cmd = new VariableSubstitution(new HiveVariableSource() {
            @Override
            public Map<String, String> getHiveVariable() {
                return SessionState.get().getHiveVariables();
            }
        }).substitute(ss.getConf(), shell_cmd);

        // shell_cmd = "/bin/bash -c \'" + shell_cmd + "\'";
        try {
            ShellCmdExecutor executor = new ShellCmdExecutor(shell_cmd, ss.out, ss.err);
            ret = executor.execute();
            if (ret != 0) {
                console.printError("Command failed with exit code = " + ret);
            }
        } catch (Exception e) {
            console.printError("Exception raised from Shell command " + 
                               e.getLocalizedMessage(), stringifyException(e));
            ret = 1;
        }
    }  else { 
        try {
            // Local command handling, i.e. SQL entered on the command line in the interactive shell
            try (CommandProcessor proc = CommandProcessorFactory.get(tokens, 
                                                                     (HiveConf) conf)) {
                if (proc instanceof IDriver) {
                    /****** Continue processing ******/
                    ret = processLocalCmd(cmd, proc, ss);
                } else {
                    ret = processLocalCmd(cmd_trimmed, proc, ss);
                }
            }
        } catch (SQLException e) {
            console.printError("Failed processing command " + tokens[0] + " " + e.getLocalizedMessage(), 
                               org.apache.hadoop.util.StringUtils.stringifyException(e));
            ret = 1;
        }
        catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    ss.resetThreadName();
    return ret;
}

Summary: dispatch the command

If the command refers to a SQL file (source), processFile is called
For a local SQL command, processLocalCmd is called
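
The order of the checks in processCmd matters, so here is a minimal sketch of just the dispatch decision. It is an illustration under assumptions: CommandKindSketch, Kind and classify are hypothetical names, comment removal is left out, and the command is tokenized with a plain whitespace split.

```java
import java.util.Locale;

// Sketch of the dispatch order in processCmd(): quit/exit, then source,
// then shell commands, then everything else. Names are hypothetical.
final class CommandKindSketch {

    enum Kind { QUIT, SOURCE, SHELL, OTHER }

    static Kind classify(String cmd) {
        String trimmed = cmd.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.equals("quit") || lower.equals("exit")) {
            return Kind.QUIT;   // processCmd closes the session and exits here
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens[0].equalsIgnoreCase("source")) {
            return Kind.SOURCE; // handed to processFile
        }
        if (trimmed.startsWith("!")) {
            return Kind.SHELL;  // run as a shell command
        }
        return Kind.OTHER;      // handed to a command processor via processLocalCmd
    }

    public static void main(String[] args) {
        String[] samples = {"quit", "source /tmp/init.sql", "!ls -l", "select 1"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + classify(sample));
        }
    }
}
```

Only the last kind reaches the SQL compiler; the first three are handled entirely inside the CLI.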

Next, let's look at how SQL files are handled

org.apache.hadoop.hive.cli.CliDriver#processFile

public int processFile(String fileName) throws IOException {
    Path path = new Path(fileName);
    FileSystem fs;
    if (!path.toUri().isAbsolute()) {
        fs = FileSystem.getLocal(conf);
        path = fs.makeQualified(path);
    } else {
        fs = FileSystem.get(path.toUri(), conf);
    }
    BufferedReader bufferReader = null;
    int rc = 0;
    try {
        bufferReader = new BufferedReader(new InputStreamReader(fs.open(path)));
        // Everything up to this point just opens the SQL file for reading

        /****** Process the SQL file ******/
        rc = processReader(bufferReader);
    } finally {
        IOUtils.closeStream(bufferReader);
    }
    return rc;
}

org.apache.hadoop.hive.cli.CliDriver#processReader

public int processReader(BufferedReader r) throws IOException {
    String line;
    StringBuilder qsb = new StringBuilder();
    // Read every line of the file
    while ((line = r.readLine()) != null) {
        // Skip comment lines in the SQL file
        if (! line.startsWith("--")) {
            // Append to the SQL text
            qsb.append(line + "\n");
        }
    }
    /****** processLine is called here as well ******/
    return (processLine(qsb.toString()));
}

Summary: this path only adds reading the file; in the end org.apache.hadoop.hive.cli.CliDriver#processLine is still called
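
The file path can be tried out without a Hive installation. The sketch below reproduces only the line filter of processReader; ScriptReaderSketch and readScript are hypothetical names, and the text is returned instead of being passed on to processLine.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Sketch of the line filter in processReader(): lines that start with "--"
// are dropped, everything else is joined with newlines. Names are hypothetical.
final class ScriptReaderSketch {

    static String readScript(BufferedReader r) throws IOException {
        StringBuilder qsb = new StringBuilder();
        String line;
        while ((line = r.readLine()) != null) {
            // The check is done on the raw line, without trimming it first.
            if (!line.startsWith("--")) {
                qsb.append(line).append('\n');
            }
        }
        return qsb.toString();
    }

    public static void main(String[] args) throws IOException {
        String script = "-- load step\nuse db;\n  -- indented comment\nselect 1;\n";
        System.out.print(readScript(new BufferedReader(new StringReader(script))));
    }
}
```

Since the check uses startsWith on the untrimmed line, an indented comment is not removed at this stage, unlike in the interactive loop, where the line is trimmed before the check.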

At this point the input handling is complete; the next step is compiling and executing the SQL.
