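Flink Study Notes (2): Job Execution Flow

This article covers two topics: (1) how job code is compiled into an ExecutionGraph, and (2) how tasks are scheduled and run. The notes follow the RemoteEnvironment path rather than the Local one.

How a Flink job becomes an ExecutionGraph

Start with a simple Flink job that consumes from Kafka: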

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "localhost:9092");
    properties.setProperty("group.id", "test");
    FlinkKafkaConsumer010<String> consumer = new FlinkKafkaConsumer010<>(
            "pepsi-test",
            new SimpleStringSchema(),
            properties);
    env.addSource(consumer)
            .process(...)
            .keyBy(1)
            .timeWindow(Time.seconds(10))
            .aggregate(....)
            .addSink(...);
    env.execute();
}

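Operator (transformation) registration: As the code shows, we first obtain the execution environment: StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(); The environment holds the parallelism setting, the registered operators, the checkpoint settings and other configuration, and it provides the entry point for running the whole Flink job: the execute method.
Every operation here is an operator (transformation): source, process, sink, keyBy (a virtual node) and so on. Registering an operator means adding it to the transformations collection of the StreamExecutionEnvironment.
The relevant code: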

public <R> SingleOutputStreamOperator<R> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperator<T, R> operator) {

    // read the output type of the input Transform to coax out errors about MissingTypeInfo
    transformation.getOutputType();
    OneInputTransformation<T, R> resultTransform = new OneInputTransformation<>(
            this.transformation,
            operatorName,
            operator,
            outTypeInfo,
            environment.getParallelism());
    @SuppressWarnings({ "unchecked", "rawtypes" })
    SingleOutputStreamOperator<R> returnStream = new SingleOutputStreamOperator(environment, resultTransform);
    getExecutionEnvironment().addOperator(resultTransform);
    return returnStream;
}
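
To make the registration step concrete, here is a minimal sketch in plain Java with no Flink dependency. The names RegistrationSketch, MiniEnv, MiniStream and MiniTransformation are hypothetical and exist only for this illustration. They mimic the idea that each API call wraps its input in a new transformation and appends it to a list held by the environment. For simplicity the sketch registers every step, which may differ from what Flink does for individual operator types.

```java
import java.util.ArrayList;
import java.util.List;

final class RegistrationSketch {

    // Hypothetical stand-in for a transformation: a named step that remembers its input.
    static final class MiniTransformation {
        final String name;
        final MiniTransformation input; // null for a source

        MiniTransformation(String name, MiniTransformation input) {
            this.name = name;
            this.input = input;
        }
    }

    // Hypothetical stand-in for the environment: it only collects registered transformations.
    static final class MiniEnv {
        final List<MiniTransformation> transformations = new ArrayList<>();

        MiniStream source(String name) {
            return register(name, null);
        }

        MiniStream register(String name, MiniTransformation input) {
            MiniTransformation t = new MiniTransformation(name, input);
            transformations.add(t); // the registration step
            return new MiniStream(this, t);
        }
    }

    // Hypothetical stand-in for a stream handle: each call wraps the current transformation.
    static final class MiniStream {
        private final MiniEnv env;
        private final MiniTransformation transformation;

        MiniStream(MiniEnv env, MiniTransformation transformation) {
            this.env = env;
            this.transformation = transformation;
        }

        MiniStream transform(String operatorName) {
            return env.register(operatorName, transformation);
        }
    }

    // Collects the names of all registered transformations in registration order.
    static List<String> registeredNames(MiniEnv env) {
        List<String> names = new ArrayList<>();
        for (MiniTransformation t : env.transformations) {
            names.add(t.name);
        }
        return names;
    }

    public static void main(String[] args) {
        MiniEnv env = new MiniEnv();
        env.source("source").transform("process").transform("sink");
        System.out.println(registeredNames(env)); // prints [source, process, sink]
    }
}
```

The point of the sketch is only that building the pipeline executes nothing: it records a list of linked transformations that a later step can turn into a graph.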

StreamGraph generation: The real entry point of the program is env.execute(), which ends up calling StreamGraphGenerator.generate() to build the StreamGraph. The method looks like this:

public static StreamGraph generate(StreamExecutionEnvironment env, List<StreamTransformation<?>> transformations) {
        return new StreamGraphGenerator(env).generateInternal(transformations);
    }

    // Starting from the sinks, recurse upstream until the first operator (the source) is reached, creating nodes and edges along the way
    private StreamGraph generateInternal(List<StreamTransformation<?>> transformations) {
        for (StreamTransformation<?> transformation: transformations) {
            transform(transformation);
        }
        return streamGraph;
    }
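   // transform() dispatches on the transformation type through a long if/else chain, so it is not listed here.
   // As a concrete example, look at how a one-input operator is handled in transformOneInputTransform.
   private <IN, OUT> Collection<Integer> transformOneInputTransform(OneInputTransformation<IN, OUT> transform) {

        // Recursively transform the input first; the returned IDs are the upstream node IDs
        Collection<Integer> inputIds = transform(transform.getInput());

        // Determine the slot sharing group, which decides whose subtasks may share a slot
        String slotSharingGroup = determineSlotSharingGroup(transform.getSlotSharingGroup(), inputIds);

        // Create the StreamNode (some transformations create a virtual node instead of a StreamNode)
        streamGraph.addOperator(transform.getId(),
                slotSharingGroup,
                transform.getOperator(),
                transform.getInputType(),
                transform.getOutputType(),
                transform.getName());

        // Set the key selector and key serializer for keyed state
        if (transform.getStateKeySelector() != null) {
            TypeSerializer<?> keySerializer = transform.getStateKeyType().createSerializer(env.getConfig());
            streamGraph.setOneInputStateKey(transform.getId(), transform.getStateKeySelector(), keySerializer);
        }

        // Create an edge between the current node and each node it depends on
        for (Integer inputId: inputIds) {
            streamGraph.addEdge(inputId, transform.getId(), 0);
        }

        return Collections.singleton(transform.getId());
    }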

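This method consists of two main steps:
1. generateInternal loops over the transformations and creates a StreamNode and a StreamEdge for each.
2. The nodes and edges are connected. An edge acts as a channel that links two nodes.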
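
The two steps above can be sketched in plain Java without Flink. StreamGraphSketch and its Transformation class are hypothetical names invented for this illustration, and the sketch handles only one-input transformations. The source is deliberately left out of the input list to show that the upstream recursion still reaches it; that choice is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StreamGraphSketch {

    // Hypothetical transformation: an ID, a name and the upstream transformation (null for a source).
    static final class Transformation {
        final int id;
        final String name;
        final Transformation input;

        Transformation(int id, String name, Transformation input) {
            this.id = id;
            this.name = name;
            this.input = input;
        }
    }

    final Map<Integer, String> nodes = new LinkedHashMap<>(); // node ID -> operator name
    final List<String> edges = new ArrayList<>();             // "upstreamId->downstreamId"
    private final Set<Integer> alreadyTransformed = new HashSet<>();

    // Step 1: walk upstream first, so every input becomes a node before its consumer does.
    private int transform(Transformation t) {
        if (alreadyTransformed.contains(t.id)) {
            return t.id; // reached earlier through another path, nothing to add
        }
        Integer inputId = null;
        if (t.input != null) {
            inputId = transform(t.input);
        }
        alreadyTransformed.add(t.id);
        nodes.put(t.id, t.name);
        // Step 2: connect the new node to its input with an edge.
        if (inputId != null) {
            edges.add(inputId + "->" + t.id);
        }
        return t.id;
    }

    static StreamGraphSketch generate(List<Transformation> transformations) {
        StreamGraphSketch graph = new StreamGraphSketch();
        for (Transformation t : transformations) {
            graph.transform(t);
        }
        return graph;
    }

    public static void main(String[] args) {
        Transformation source = new Transformation(1, "source", null);
        Transformation process = new Transformation(2, "process", source);
        Transformation sink = new Transformation(3, "sink", process);
        StreamGraphSketch graph = generate(Arrays.asList(process, sink));
        System.out.println(graph.nodes + " " + graph.edges);
    }
}
```

The memo set plays the same role as the "already transformed" check in a recursive generator: a transformation reached twice produces only one node.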
JobGraph generation: The conversion from StreamGraph to JobGraph happens in StreamingJobGraphGenerator.createJobGraph. In outline:

private JobGraph createJobGraph() {
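    // ... (code omitted) Mainly traverses the StreamGraph and generates a byte[] hash for each StreamNode.
    // When a failed operator is restored, its state is looked up by the JobVertexID derived from this hash.

    // Set up chaining: traverse from the source nodes, put operators that can be chained into one JobVertex,
    // and create a separate JobVertex for each operator that cannot. Chaining requires several conditions (listed below).
    setChaining(hashes, legacyHashes, chainedOperatorHashes);
    // Set the input edges: serialize each JobVertex's in-edges (StreamEdge) into its StreamConfig
    setPhysicalEdges();
    // Assign a SlotSharingGroup to each JobVertex based on its group name
    setSlotSharing();
    // Configure checkpointing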
    configureCheckpointing();
    return jobGraph;
}

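The chain concept

To execute more efficiently in a distributed setting, Flink chains operator subtasks together into tasks wherever it can. Each task runs in a single thread. Chaining operators into tasks is an effective optimization: it reduces handover between threads, message serialization and deserialization, and data exchange through buffers, which lowers latency while raising overall throughput.

In the figure above, the keyed aggregation and sink operators are chained together. Quite a few conditions must hold before two operators can be chained, as the following code shows: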

public static boolean isChainable(StreamEdge edge, StreamGraph streamGraph) {
    StreamNode upStreamVertex = edge.getSourceVertex();
    StreamNode downStreamVertex = edge.getTargetVertex();

    StreamOperator<?> headOperator = upStreamVertex.getOperator();
    StreamOperator<?> outOperator = downStreamVertex.getOperator();

    return  // the downstream node has exactly one input
            downStreamVertex.getInEdges().size() == 1
            // neither the upstream nor the downstream operator is null
            && outOperator != null
            && headOperator != null
            // both nodes are in the same slot sharing group
            && upStreamVertex.isSameSlotSharingGroup(downStreamVertex)
            // the downstream chaining strategy is ALWAYS, and the upstream one is HEAD or ALWAYS
            && outOperator.getChainingStrategy() == ChainingStrategy.ALWAYS
            && (headOperator.getChainingStrategy() == ChainingStrategy.HEAD ||
                headOperator.getChainingStrategy() == ChainingStrategy.ALWAYS)
            // the edge between the two nodes uses forward partitioning
            && (edge.getPartitioner() instanceof ForwardPartitioner)
            // upstream and downstream parallelism are equal
            && upStreamVertex.getParallelism() == downStreamVertex.getParallelism()
            // operator chaining is enabled for the job
            && streamGraph.isChainingEnabled();
}

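Generating JobVertex nodes
Traversal starts from the source nodes. A node that cannot be chained gets its own JobVertex. For nodes that can be chained together, a JobVertex is created for the first node of the chain, and the remaining chained nodes are written into its StreamConfig.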
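
As a rough illustration of how the chaining conditions group operators into JobVertex candidates, here is a plain-Java sketch for a linear pipeline. ChainingSketch, Node and buildChains are hypothetical names, and the predicate checks only a subset of the real conditions (chaining strategy, forward partitioning, equal parallelism); slot sharing groups and multiple inputs are left out as a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChainingSketch {

    enum Strategy { ALWAYS, HEAD, NEVER }

    // Hypothetical node of a linear pipeline.
    static final class Node {
        final String name;
        final int parallelism;
        final Strategy strategy;
        final boolean forwardInput; // true if the edge from the previous node is a forward edge

        Node(String name, int parallelism, Strategy strategy, boolean forwardInput) {
            this.name = name;
            this.parallelism = parallelism;
            this.strategy = strategy;
            this.forwardInput = forwardInput;
        }
    }

    // Simplified version of the chaining check: only a subset of the real conditions.
    static boolean isChainable(Node up, Node down) {
        return down.strategy == Strategy.ALWAYS
                && (up.strategy == Strategy.HEAD || up.strategy == Strategy.ALWAYS)
                && down.forwardInput
                && up.parallelism == down.parallelism;
    }

    // Walks the pipeline from the source; each inner list is one chain (one JobVertex candidate).
    static List<List<String>> buildChains(List<Node> pipeline) {
        List<List<String>> chains = new ArrayList<>();
        List<String> current = new ArrayList<>();
        Node previous = null;
        for (Node node : pipeline) {
            if (previous != null && !isChainable(previous, node)) {
                chains.add(current); // close the current chain and start a new one
                current = new ArrayList<>();
            }
            current.add(node.name);
            previous = node;
        }
        if (!current.isEmpty()) {
            chains.add(current);
        }
        return chains;
    }

    public static void main(String[] args) {
        List<Node> pipeline = Arrays.asList(
                new Node("source", 1, Strategy.HEAD, false),
                new Node("process", 1, Strategy.ALWAYS, true),
                new Node("window-agg", 1, Strategy.ALWAYS, false), // follows keyBy, so not a forward edge
                new Node("sink", 1, Strategy.ALWAYS, true));
        System.out.println(buildChains(pipeline));
    }
}
```

With these inputs the keyBy boundary splits the pipeline into two chains, so the aggregation and the sink end up together, which is the situation the text describes.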

Looking at JobGraph generation, we see that the JobGraph is sent to the JobManager (JM) as soon as it is built, so the ExecutionGraph is not generated on the client side:

public JobSubmissionResult run(FlinkPlan compiledPlan,
        List<URL> libraries, List<URL> classpaths, ClassLoader classLoader, SavepointRestoreSettings savepointSettings)
        throws ProgramInvocationException {
    JobGraph job = getJobGraph(flinkConfig, compiledPlan, libraries, classpaths, savepointSettings);
    return submitJob(job, classLoader);
}

ExecutionGraph generation: The ExecutionGraph is built in the ExecutionGraph#attachJobGraph method, which is called from ExecutionGraphBuilder.buildGraph. The code is as follows:

public void attachJobGraph(List<JobVertex> topologiallySorted) throws JobException {
    // Loop over the topologically sorted JobVertex list
    for (JobVertex jobVertex : topologiallySorted) {
        // Create the ExecutionJobVertex; its constructor builds the IntermediateResult,
        // ExecutionVertex and IntermediateResultPartition objects
        ExecutionJobVertex ejv = new ExecutionJobVertex(
            this,
            jobVertex,
            1,
            rpcTimeout,
            globalModVersion,
            createTimestamp);
        // Build the ExecutionEdges and connect them to the IntermediateResultPartitions created earlier
        ejv.connectToPredecessors(this.intermediateResults);
        // Register the produced data sets and reject duplicate IDs
        for (IntermediateResult res : ejv.getProducedDataSets()) {
            IntermediateResult previousDataSet = this.intermediateResults.putIfAbsent(res.getId(), res);
            if (previousDataSet != null) {
                throw new JobException(String.format("Encountered two intermediate data set with ID %s : previous=[%s] / new=[%s]",
                        res.getId(), res, previousDataSet));
            }
        }
        // ... (remaining code omitted)
    }
}

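Building the ExecutionGraph can also be understood in two parts:
1. Create the ExecutionJobVertex instances. Loop over the sorted JobVertex list and, for each JobVertex, build its IntermediateResult objects (how many depends on the number of downstream nodes).
2. Associate each ExecutionVertex with its corresponding IntermediateResult.
At that point the ExecutionGraph is complete.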
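
The expansion from JobVertex to parallel execution vertices can be sketched as follows. This is a plain-Java illustration with hypothetical names (ExecutionGraphSketch, JobVertexSpec, expand); it assumes that each JobVertex simply yields one execution vertex per unit of parallelism, and it reuses the putIfAbsent pattern from the code above to reject duplicates. Intermediate results and edges are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ExecutionGraphSketch {

    // Hypothetical description of a JobVertex: a name and its parallelism.
    static final class JobVertexSpec {
        final String name;
        final int parallelism;

        JobVertexSpec(String name, int parallelism) {
            this.name = name;
            this.parallelism = parallelism;
        }
    }

    // Expands each job vertex into one entry per parallel subtask, in topological order.
    static List<String> expand(List<JobVertexSpec> topologicallySorted) {
        Map<String, JobVertexSpec> seen = new HashMap<>();
        List<String> executionVertices = new ArrayList<>();
        for (JobVertexSpec jobVertex : topologicallySorted) {
            // Same idea as the putIfAbsent check above: a second vertex with the same key is an error.
            if (seen.putIfAbsent(jobVertex.name, jobVertex) != null) {
                throw new IllegalStateException("Duplicate job vertex: " + jobVertex.name);
            }
            for (int subtask = 0; subtask < jobVertex.parallelism; subtask++) {
                executionVertices.add(jobVertex.name + " (" + (subtask + 1) + "/" + jobVertex.parallelism + ")");
            }
        }
        return executionVertices;
    }

    public static void main(String[] args) {
        List<JobVertexSpec> sorted = Arrays.asList(
                new JobVertexSpec("source->process", 2),
                new JobVertexSpec("window-agg->sink", 1));
        for (String vertex : expand(sorted)) {
            System.out.println(vertex);
        }
    }
}
```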
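The official site provides a diagram of the three-layer graph transformation described above (StreamGraph, JobGraph, ExecutionGraph).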

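How the ExecutionGraph is scheduled and executed

Flink runtime structure

The main architecture of the Flink runtime layer is shown in the figure above. The key components are the Dispatcher, the ResourceManager and the JobManager.

The Dispatcher receives the jobs that users submit and starts a new JobManager for each newly submitted job.

The ResourceManager manages resources. There is only one ResourceManager in a Flink cluster.

The JobManager manages the execution of a job. Several jobs may run in a Flink cluster at the same time, and each job has its own JobManager.

Task scheduling
The previous section described Flink's runtime structure. Here we skip the Dispatcher for now (whether a Dispatcher is present changes what the job is submitted to: without one, the job is submitted to the YarnResourceManager). The submitted job eventually reaches the JobManager, which builds the ExecutionGraph and sets up a number of JobManager and TaskManager actors. The relevant code is: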

private def submitJob(jobGraph: JobGraph, jobInfo: JobInfo, isRecovery: Boolean = false): Unit = {
    
    executionGraph = ExecutionGraphBuilder.buildGraph(
          executionGraph,
          jobGraph,
          flinkConfiguration,
          futureExecutor,
          ioExecutor,
          scheduler,
          userCodeLoader,
          checkpointRecoveryFactory,
          Time.of(timeout.length, timeout.unit),
          restartStrategy,
          jobMetrics,
          numSlots,
          blobServer,
          log.logger)
          
    ...
    executionGraph.scheduleForExecution()
    ...
}

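After executionGraph.scheduleForExecution() runs, the ExecutionVertex instances are handed to the TaskManagers for execution. When a TaskExecutor receives a Task submitted by the JobManager, it starts a new thread to run that Task. This involves a good deal of resource acquisition, such as the JobManager requesting resources from the ResourceManager (RM) to start tasks. I have not yet worked out those steps and will go through resource management and scheduling in detail in a later post.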
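
The "one new thread per received Task" behaviour can be illustrated with a small plain-Java sketch. TaskThreadSketch and runTasks are hypothetical names, the task body is a placeholder that only records its own name, and nothing here models slot allocation or the RPC between JobManager and TaskExecutor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TaskThreadSketch {

    // Starts one dedicated thread per task name, waits for all of them, and returns the finished names.
    static List<String> runTasks(List<String> taskNames) {
        List<String> finished = Collections.synchronizedList(new ArrayList<String>());
        List<Thread> threads = new ArrayList<>();
        for (String name : taskNames) {
            // Placeholder task body: a real task would run its operator chain here.
            Thread thread = new Thread(() -> finished.add(name), name);
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for " + thread.getName(), e);
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        List<String> done = runTasks(Arrays.asList(
                "source->process (1/1)", "window-agg->sink (1/1)"));
        System.out.println(done.size() + " tasks finished");
    }
}
```

The completion order is not deterministic, which is why the sketch reports only how many tasks finished.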
