Flink Job Submission Source Code Reading (3): Job Submission run()


Following on from the previous two articles:
Flink 1.9.0 Job Submission Source Code Reading (1): the flink script
Flink 1.9.0 Job Submission Source Code Reading (2): the entry class CliFrontend

Today we continue reading the Flink 1.9.0 job submission source code, this time the run() method, which contains the core logic of Flink job submission.

The execution logic of run()

Code:

/**
  * Executes the run action.
  *
  * @param args command line arguments for the run action
  */
protected void run(String[] args) throws Exception {
    LOG.info("Running 'run' command.");

    // Gather the configured command line options
    final Options commandOptions = CliFrontendParser.getRunCommandOptions();

    final Options commandLineOptions = CliFrontendParser.mergeOptions(commandOptions, customCommandLineOptions);

    final CommandLine commandLine = CliFrontendParser.parse(commandLineOptions, args, true);

    final RunOptions runOptions = new RunOptions(commandLine);

    // 1. Check whether this is merely a help request
    if (runOptions.isPrintHelp()) {
        CliFrontendParser.printHelpForRun(customCommandLines);
        return;
    }

    if (!runOptions.isPython()) {
        // 2. A Java program must be given as a JAR file; if the JAR path is
        //    missing, throw a CliArgsException
        if (runOptions.getJarFilePath() == null) {
            throw new CliArgsException("Java program should be specified a JAR file.");
        }
    }

    /*
     * 3. Create the PackagedProgram. org.apache.flink.client.program.PackagedProgram
     *    operates on the user-supplied JAR in order to
     *
     *    (1) find the program entry point,
     *
     *    (2) parse the user code to obtain the job topology,
     *
     *    (3) extract nested libraries.
     *
     *    Here it is used to locate the user's program entry point.
     */
    final PackagedProgram program;
    try {
        LOG.info("Building program from JAR file");
        program = buildProgram(runOptions);
    }
    catch (FileNotFoundException e) {
        throw new CliArgsException("Could not build the program from JAR file.", e);
    }

    final CustomCommandLine<?> customCommandLine = getActiveCustomCommandLine(commandLine);

    try {
        // 4. [Key step] Run the program: hand the program entry point and the
        //    options prepared above over to runProgram
        runProgram(customCommandLine, commandLine, runOptions, program);
    } finally {
        // delete the temporary files that were extracted from the JAR
        program.deleteExtractedLibraries();
    }
}
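
Step 3 above delegates to buildProgram. For reference, here is a condensed sketch of CliFrontend#buildProgram based on the Flink 1.9 sources (simplified; the Python branch is omitted):

// Condensed from CliFrontend#buildProgram (Flink 1.9, simplified)
PackagedProgram buildProgram(ProgramOptions options) throws FileNotFoundException, ProgramInvocationException {
    String[] programArgs = options.getProgramArgs();
    String jarFilePath = options.getJarFilePath();
    List<URL> classpaths = options.getClasspaths();
    String entryPointClass = options.getEntryPointClassName();

    if (jarFilePath == null) {
        throw new IllegalArgumentException("Java program should be specified a JAR file.");
    }
    // validates that the path exists and points to a regular file
    File jarFile = getJarFile(jarFilePath);

    // if no entry point class was passed on the command line, PackagedProgram
    // reads it from the JAR manifest (program-class or Main-Class attribute)
    PackagedProgram program = entryPointClass == null
        ? new PackagedProgram(jarFile, classpaths, programArgs)
        : new PackagedProgram(jarFile, classpaths, entryPointClass, programArgs);

    program.setSavepointRestoreSettings(options.getSavepointRestoreSettings());
    return program;
}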

Execution then continues into runProgram(customCommandLine, commandLine, runOptions, program):

/**
 * Runs the program.
 *
 * @param customCommandLine
 * @param commandLine
 * @param runOptions
 * @param program
 * @param <T>
 * @throws ProgramInvocationException
 * @throws FlinkException
 */
private <T> void runProgram(
    CustomCommandLine<T> customCommandLine,
    CommandLine commandLine,
    RunOptions runOptions,
    PackagedProgram program) throws ProgramInvocationException, FlinkException {
    // Create the ClusterDescriptor from the user's command line arguments. A
    // ClusterDescriptor describes the properties of a cluster and is used to
    // deploy one (e.g. on YARN or Mesos), returning a client that talks to it.
    final ClusterDescriptor<T> clusterDescriptor = customCommandLine.createClusterDescriptor(commandLine);

    try {
        final T clusterId = customCommandLine.getClusterId(commandLine);
        // The cluster client; a ClusterClient encapsulates everything needed
        // to submit a program to a remote cluster
        final ClusterClient<T> client;

        // directly deploy the job if the cluster is started in job mode and detached,
        // i.e. no cluster id is given and -d was passed on the command line
        if (clusterId == null && runOptions.getDetachedMode()) {
            // (1) determine the parallelism, falling back to the default
            int parallelism = runOptions.getParallelism() == -1 ? defaultParallelism : runOptions.getParallelism();

            // (2) parse the user program into its DAG and build the JobGraph
            final JobGraph jobGraph = PackagedProgramUtils.createJobGraph(program, configuration, parallelism);

            // (3) read the cluster specification from the command line
            final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);

            // (4) pass the JobGraph, the cluster specification and the run mode
            //     to clusterDescriptor.deployJobCluster, which deploys the job
            //     to the cluster and returns a ClusterClient
            client = clusterDescriptor.deployJobCluster(
                clusterSpecification,
                jobGraph,
                runOptions.getDetachedMode());

            logAndSysout("Job has been submitted with JobID " + jobGraph.getJobID());

            try {
                // (5) shut down the client
                client.shutdown();
            } catch (Exception e) {
                LOG.info("Could not properly shut down the client.", e);
            }
        } else {
            // Normal (session) submission path
            // (1) a shutdown hook used to tear down the cluster; in attached
            //     mode it shuts the cluster down once the client exits
            final Thread shutdownHook;
            // (2) if a cluster id is given, retrieve the existing cluster's
            //     ClusterClient; otherwise deploy a session cluster
            if (clusterId != null) {
                client = clusterDescriptor.retrieve(clusterId);
                shutdownHook = null;
            } else {
                // also in job mode we have to deploy a session cluster because the job
                // might consist of multiple parts (e.g. when using collect)
                final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
                client = clusterDescriptor.deploySessionCluster(clusterSpecification);
                // if not running in detached mode, add a shutdown hook to shut down cluster if client exits
                // there's a race-condition here if cli is killed before shutdown hook is installed
                if (!runOptions.getDetachedMode() && runOptions.isShutdownOnAttachedExit()) {
                    shutdownHook = ShutdownHookUtil.addShutdownHook(client::shutDownCluster, client.getClass().getSimpleName(), LOG);
                } else {
                    shutdownHook = null;
                }
            }

            try {
                client.setPrintStatusDuringExecution(runOptions.getStdoutLogging());
                client.setDetached(runOptions.getDetachedMode());
                LOG.debug("Client slots is set to {}", client.getMaxSlots());

                LOG.debug("{}", runOptions.getSavepointRestoreSettings());

                int userParallelism = runOptions.getParallelism();
                LOG.debug("User parallelism is set to {}", userParallelism);
                if (client.getMaxSlots() != MAX_SLOTS_UNKNOWN && userParallelism == -1) {
                    logAndSysout("Using the parallelism provided by the remote cluster ("
                                 + client.getMaxSlots() + "). "
                                 + "To use another parallelism, set it at the ./bin/flink client.");
                    userParallelism = client.getMaxSlots();
                } else if (ExecutionConfig.PARALLELISM_DEFAULT == userParallelism) {
                    userParallelism = defaultParallelism;
                }

                // Core execution logic: hand the program over to executeProgram
                executeProgram(program, client, userParallelism);
            } finally {
                if (clusterId == null && !client.isDetached()) {
                    // terminate the cluster only if we have started it before and if it's not detached
                    try {
                        client.shutDownCluster();
                    } catch (final Exception e) {
                        LOG.info("Could not properly terminate the Flink cluster.", e);
                    }
                    if (shutdownHook != null) {
                        // we do not need the hook anymore as we have just tried to shutdown the cluster.
                        ShutdownHookUtil.removeShutdownHook(shutdownHook, client.getClass().getSimpleName(), LOG);
                    }
                }
                try {
                    client.shutdown();
                } catch (Exception e) {
                    LOG.info("Could not properly shut down the client.", e);
                }
            }
        }
    } finally {
        try {
            clusterDescriptor.close();
        } catch (Exception e) {
            LOG.info("Could not properly close the cluster descriptor.", e);
        }
    }
}
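
In the detached job-mode branch above, step (2) builds the JobGraph through PackagedProgramUtils.createJobGraph. A condensed sketch of that method, based on the Flink 1.9 sources (simplified):

// Condensed from PackagedProgramUtils#createJobGraph (Flink 1.9, simplified)
public static JobGraph createJobGraph(
        PackagedProgram packagedProgram,
        Configuration configuration,
        int defaultParallelism) throws ProgramInvocationException {
    Thread.currentThread().setContextClassLoader(packagedProgram.getUserCodeClassLoader());
    final Optimizer optimizer = new Optimizer(new DataStatistics(), new DefaultCostEstimator(), configuration);
    final FlinkPlan flinkPlan;

    if (packagedProgram.isUsingProgramEntryPoint()) {
        // entry point given: obtain the Plan from the program and optimize it
        final JobWithJars jobWithJars = packagedProgram.getPlanWithJars();
        final Plan plan = jobWithJars.getPlan();
        if (plan.getDefaultParallelism() <= 0) {
            plan.setDefaultParallelism(defaultParallelism);
        }
        flinkPlan = optimizer.compile(plan);
    } else if (packagedProgram.isUsingInteractiveMode()) {
        // interactive mode: run main() against an OptimizerPlanEnvironment to capture the plan
        final OptimizerPlanEnvironment optimizerPlanEnvironment = new OptimizerPlanEnvironment(optimizer);
        optimizerPlanEnvironment.setParallelism(defaultParallelism);
        flinkPlan = optimizerPlanEnvironment.getOptimizedPlan(packagedProgram);
    } else {
        throw new ProgramInvocationException("PackagedProgram does not have a valid invocation mode.");
    }

    // turn the (streaming or batch) plan into a JobGraph
    final JobGraph jobGraph;
    if (flinkPlan instanceof StreamingPlan) {
        jobGraph = ((StreamingPlan) flinkPlan).getJobGraph(null);
        jobGraph.setSavepointRestoreSettings(packagedProgram.getSavepointSettings());
    } else {
        final JobGraphGenerator jobGraphGenerator = new JobGraphGenerator(configuration);
        jobGraph = jobGraphGenerator.compileJobGraph((OptimizedPlan) flinkPlan);
    }

    // attach the user JARs and classpaths to the JobGraph
    for (URL url : packagedProgram.getAllLibraries()) {
        try {
            jobGraph.addJar(new Path(url.toURI()));
        } catch (URISyntaxException e) {
            throw new RuntimeException("URL is invalid. This should not happen.", e);
        }
    }
    jobGraph.setClasspaths(packagedProgram.getClasspaths());
    return jobGraph;
}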

Next, let's look at the logic of executeProgram(program, client, userParallelism):

protected void executeProgram(PackagedProgram program, ClusterClient<?> client, int parallelism) throws ProgramMissingJobException, ProgramInvocationException {
    logAndSysout("Starting execution of program");
    // run the job
    final JobSubmissionResult result = client.run(program, parallelism);

    if (null == result) {
        throw new ProgramMissingJobException("No JobSubmissionResult returned, please make sure you called " +
            "ExecutionEnvironment.execute()");
    }
    // check whether an execution result was returned, i.e. whether the job
    // actually ran to completion (blocking mode)
    if (result.isJobExecutionResult()) {
        logAndSysout("Program execution finished");
        JobExecutionResult execResult = result.getJobExecutionResult();
        System.out.println("Job with JobID " + execResult.getJobID() + " has finished.");
        System.out.println("Job Runtime: " + execResult.getNetRuntime() + " ms");
        Map<String, Object> accumulatorsResult = execResult.getAllAccumulatorResults();
        if (accumulatorsResult.size() > 0) {
            System.out.println("Accumulator Results: ");
            System.out.println(AccumulatorHelper.getResultsFormatted(accumulatorsResult));
        }
    } else {
        logAndSysout("Job has been submitted with JobID " + result.getJobID());
    }
}

Here the packaged job is run through the ClusterClient, and the JobSubmissionResult is obtained once execution completes.

The ClusterClient runs the job as follows:

/**
 * Runs a program from a user-supplied JAR, as invoked from the CliFrontend.
 * The run mode is either blocking or detached, depending on whether
 * {@code setDetached(true)} or {@code setDetached(false)} was called.
 *
 * @param prog the packaged program
 * @param parallelism the parallelism to run the Flink job with
 * @return the result of the execution
 * @throws ProgramMissingJobException
 * @throws ProgramInvocationException
 */
public JobSubmissionResult run(PackagedProgram prog, int parallelism)
    throws ProgramInvocationException, ProgramMissingJobException {
    Thread.currentThread().setContextClassLoader(prog.getUserCodeClassLoader());

    // 1. If the program specifies an execution entry point
    if (prog.isUsingProgramEntryPoint()) {
        final JobWithJars jobWithJars;
        if (hasUserJarsInClassPath(prog.getAllLibraries())) {
            jobWithJars = prog.getPlanWithoutJars();
        } else {
            jobWithJars = prog.getPlanWithJars();
        }
        // run the main logic
        return run(jobWithJars, parallelism, prog.getSavepointSettings());
    }
    // 2. If no entry point is specified, run the program in interactive mode
    else if (prog.isUsingInteractiveMode()) {
        log.info("Starting program in interactive mode (detached: {})", isDetached());

        final List<URL> libraries;
        if (hasUserJarsInClassPath(prog.getAllLibraries())) {
            libraries = Collections.emptyList();
        } else {
            libraries = prog.getAllLibraries();
        }

        ContextEnvironmentFactory factory = new ContextEnvironmentFactory(this, libraries,
                                                                          prog.getClasspaths(), prog.getUserCodeClassLoader(), parallelism, isDetached(),
                                                                          prog.getSavepointSettings());
        ContextEnvironment.setAsContext(factory);

        try {
            // invoke main method
            prog.invokeInteractiveModeForExecution();
            if (lastJobExecutionResult == null && factory.getLastEnvCreated() == null) {
                throw new ProgramMissingJobException("The program didn't contain a Flink job.");
            }
            if (isDetached()) {
                // in detached mode, we execute the whole user code to extract the Flink job, afterwards we run it here
                return ((DetachedEnvironment) factory.getLastEnvCreated()).finalizeExecute();
            }
            else {
                // in blocking mode, we execute all Flink jobs contained in the user code and then return here
                return this.lastJobExecutionResult;
            }
        }
        finally {
            ContextEnvironment.unsetContext();
        }
    }
    else {
        throw new ProgramInvocationException("PackagedProgram does not have a valid invocation mode.");
    }
}

We do not consider interactive mode here, i.e. we only look at the case where the program's execution entry point is given, so the focus is on run(jobWithJars, parallelism, prog.getSavepointSettings()).
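
For orientation, here is a minimal, hypothetical user program of the kind this code path expects: its main class (the name and data are purely illustrative) is the entry point that PackagedProgram resolves, and triggering execution inside it is what ultimately yields the JobExecutionResult seen above:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

// Hypothetical user job; packaged into a JAR, its main() is the entry point.
public class WordCountJob {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.fromElements("to", "be", "or", "not", "to", "be")
            .map(word -> new Tuple2<>(word, 1))
            // lambdas erase the Tuple2 type parameters, so declare them explicitly
            .returns(Types.TUPLE(Types.STRING, Types.INT))
            .groupBy(0)
            .sum(1)
            .print(); // print() triggers execution and blocks until the job finishes
    }
}

Back on the client side, run(jobWithJars, parallelism, savepointSettings) looks like this: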

/**
 * Runs the program on the Flink cluster via this client. The call blocks
 * until the execution result is returned.
 *
 * @param jobWithJars the job plan together with its JAR files
 * @param parallelism the parallelism to run the job with
 *
 */
public JobSubmissionResult run(JobWithJars jobWithJars, int parallelism, SavepointRestoreSettings savepointSettings)
    throws CompilerException, ProgramInvocationException {
    // get the user-code class loader
    ClassLoader classLoader = jobWithJars.getUserCodeClassLoader();
    if (classLoader == null) {
        throw new IllegalArgumentException("The given JobWithJars does not provide a usercode class loader.");
    }
    // compute the optimized execution plan
    OptimizedPlan optPlan = getOptimizedPlan(compiler, jobWithJars, parallelism);
    // run it
    return run(optPlan, jobWithJars.getJarFiles(), jobWithJars.getClasspaths(), classLoader, savepointSettings);
}

The key point here is how the optimized execution plan is generated.
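A condensed sketch of ClusterClient#getOptimizedPlan, based on the Flink 1.9 sources (simplified): the batch Optimizer ("compiler") turns the user's Plan into an OptimizedPlan.

// Condensed from ClusterClient#getOptimizedPlan (Flink 1.9, simplified)
private static OptimizedPlan getOptimizedPlan(Optimizer compiler, JobWithJars prog, int parallelism)
        throws CompilerException, ProgramInvocationException {
    return getOptimizedPlan(compiler, prog.getPlan(), parallelism);
}

public static OptimizedPlan getOptimizedPlan(Optimizer compiler, Plan p, int parallelism)
        throws CompilerException {
    // apply the requested parallelism if the plan does not define one itself
    if (parallelism > 0 && p.getDefaultParallelism() <= 0) {
        p.setDefaultParallelism(parallelism);
    }
    // run Flink's batch optimizer, which chooses execution strategies
    // (shipping strategies, local strategies, ...) and yields the OptimizedPlan
    return compiler.compile(p);
}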
Next, we follow the flow of run(optPlan, jobWithJars.getJarFiles(), jobWithJars.getClasspaths(), classLoader, savepointSettings):

public JobSubmissionResult run(FlinkPlan compiledPlan,
        List<URL> libraries, List<URL> classpaths, ClassLoader classLoader, SavepointRestoreSettings savepointSettings)
        throws ProgramInvocationException {
    // build the JobGraph
    JobGraph job = getJobGraph(flinkConfig, compiledPlan, libraries, classpaths, savepointSettings);
    // submit the job for execution
    return submitJob(job, classLoader);
}

The submission itself happens in submitJob(job, classLoader), which each concrete ClusterClient implements.
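As an illustration, a condensed sketch of RestClusterClient#submitJob based on the Flink 1.9 sources (simplified and not line-for-line; exception handling abbreviated):

// Condensed from RestClusterClient#submitJob (Flink 1.9, simplified)
@Override
public JobSubmissionResult submitJob(JobGraph jobGraph, ClassLoader classLoader) throws ProgramInvocationException {
    log.info("Submitting job {} (detached: {}).", jobGraph.getJobID(), isDetached());
    // upload the JobGraph (and its JARs) to the cluster via the REST API
    final CompletableFuture<JobSubmissionResult> jobSubmissionFuture = submitJob(jobGraph);

    if (isDetached()) {
        // detached mode: return as soon as the job has been accepted
        try {
            return jobSubmissionFuture.get();
        } catch (Exception e) {
            throw new ProgramInvocationException("Could not submit job", jobGraph.getJobID(), e);
        }
    } else {
        // blocking mode: chain a request for the final JobResult and wait for it
        final CompletableFuture<JobResult> jobResultFuture = jobSubmissionFuture.thenCompose(
            ignored -> requestJobResult(jobGraph.getJobID()));
        try {
            final JobResult jobResult = jobResultFuture.get();
            this.lastJobExecutionResult = jobResult.toJobExecutionResult(classLoader);
            return lastJobExecutionResult;
        } catch (Exception e) {
            throw new ProgramInvocationException("Job failed.", jobGraph.getJobID(), e);
        }
    }
}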


Reference blog: https://blog.csdn.net/hxcaifly/article/details/87864154
