[Flink Blog Reading] A Deep Dive into Flink Job Execution (WordCount): Hands-On Notes

Flink Job Execution Walkthrough

Every introduction to Flink job execution covers the pipeline below; in this post we work through, hands-on, how each of these translations actually happens:

Code → StreamGraph → JobGraph → ExecutionGraph → physical execution plan
  • StreamGraph: "Class representing the streaming topology. It contains all the information necessary to build the jobgraph for the execution." In other words, it represents the streaming topology and carries everything needed to construct the JobGraph for execution.
  • JobGraph: the JobGraph represents a Flink dataflow program at the lowest level accepted by the JobManager. Programs from all higher-level APIs are translated into JobGraphs. Up to this point everything still runs inside the client, and the execution plan can be inspected through the explain-plan facility (see the example after this list).
  • ExecutionGraph: the parallelized version of the JobGraph and the core data structure of the scheduling layer (Scheduler).
  • Physical execution plan


  • StreamGraph: the topology of streaming nodes.


  • JobGraph: Flink's dataflow graph.


  • JobGraph attributes

  • Operator: an operator; think of it as the definition of a function.

  • Transformation: a transformation bundles the input, the operator, and the output; think of it as one complete processing step, i.e. the function at runtime.
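These pieces can be observed directly on the client. The sketch below (a minimal, self-contained example; the class name PlanPreview is made up for illustration) builds a tiny pipeline and prints the StreamGraph as JSON via getExecutionPlan(), before anything is submitted to a cluster:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class PlanPreview {

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// each flatMap/keyBy/... call only records a Transformation in the environment
		env.fromElements("to be or not to be")
			.flatMap((String line, Collector<String> out) -> {
				for (String word : line.split("\\s")) {
					out.collect(word);
				}
			})
			.returns(Types.STRING)
			.print();

		// getExecutionPlan() builds the StreamGraph from the recorded transformations
		// and renders it as JSON -- the same structure the web UI visualizes.
		System.out.println(env.getExecutionPlan());
	}
}

The JSON output lists one node per operator with its id, parallelism and predecessors, which is exactly the StreamGraph discussed below.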

Example Program

The example is org.apache.flink.streaming.examples.wordcount.WordCount from the flink-examples-streaming module.

public static void main(String[] args) throws Exception {

		// Checking input parameters
		final MultipleParameterTool params = MultipleParameterTool.fromArgs(args);

		// set up the execution environment
		final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// make parameters available in the web interface
		env.getConfig().setGlobalJobParameters(params);

		// get input data
		DataStream<String> text = null;
		if (params.has("input")) {
			// union all the inputs from text files
			for (String input : params.getMultiParameterRequired("input")) {
				if (text == null) {
					text = env.readTextFile(input);
				} else {
					text = text.union(env.readTextFile(input));
				}
			}
			Preconditions.checkNotNull(text, "Input DataStream should not be null.");
		} else {
			System.out.println("Executing WordCount example with default input data set.");
			System.out.println("Use --input to specify file input.");
			// get default test text data
			text = env.fromElements(WordCountData.WORDS);
		}

		DataStream<Tuple2<String, Integer>> counts =
			// split up the lines in pairs (2-tuples) containing: (word,1)
			text.flatMap(new Tokenizer())
			// group by the tuple field "0" and sum up tuple field "1"
			.keyBy(0).sum(1);

		// emit result
		if (params.has("output")) {
			counts.writeAsText(params.get("output"));
		} else {
			System.out.println("Printing result to stdout. Use --output to specify output path.");
			counts.print();
		}
		// execute program
		env.execute("Streaming WordCount");
	}

Flink Client

The core of the WordCount program:

text = env.fromElements(WordCountData.WORDS);

DataStream<Tuple2<String, Integer>> counts =
			// split up the lines in pairs (2-tuples) containing: (word,1)
			text.flatMap(new Tokenizer())
			// group by the tuple field "0" and sum up tuple field "1"
			.keyBy(0).sum(1);

Initialization

Initialization is the definition phase of the program: it is mainly about collecting the metadata that defines the whole dataflow, before anything is executed.

Source

env.fromElements(WordCountData.WORDS) registers the source: it wraps a FromElementsFunction in a source operator and returns a DataStreamSource whose SourceTransformation becomes the root of the transformation tree.

FlatMap

  • (1) DataStream#flatMap is called first; it delegates to the transform method, which in turn calls DataStream#doTransform.

  • (2) A Transformation is created (holding the input, the operator, and the output).

  • (3) The result stream is created.

  • (4) The operator's transformation is registered with the current execution environment:

    public void addOperator(Transformation<?> transformation) {
    		Preconditions.checkNotNull(transformation, "transformation must not be null.");
    		this.transformations.add(transformation);
    	}
    
  • (5) The result stream is returned. A condensed sketch of this whole flow is shown below.
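For reference, a condensed sketch of the flatMap/transform/doTransform path, based on the Flink 1.10 sources (simplified; details differ between versions):

// DataStream#flatMap wraps the user function in a StreamFlatMap operator and calls transform(...)
public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T, R> flatMapper, TypeInformation<R> outputType) {
	return transform("Flat Map", outputType, new StreamFlatMap<>(clean(flatMapper)));
}

protected <R> SingleOutputStreamOperator<R> doTransform(
		String operatorName,
		TypeInformation<R> outTypeInfo,
		StreamOperatorFactory<R> operatorFactory) {

	// read the output type of the input Transformation to coax out errors about MissingTypeInfo
	transformation.getOutputType();

	// (2) create the Transformation: input + operator (factory) + output type
	OneInputTransformation<T, R> resultTransform = new OneInputTransformation<>(
			this.transformation, operatorName, operatorFactory, outTypeInfo, environment.getParallelism());

	// (3) create the result stream wrapping the new transformation
	SingleOutputStreamOperator<R> returnStream = new SingleOutputStreamOperator<>(environment, resultTransform);

	// (4) register the transformation with the execution environment (addOperator shown above)
	getExecutionEnvironment().addOperator(resultTransform);

	// (5) return the result stream
	return returnStream;
}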

keyBy

  • keyBy creates and returns a KeyedStream (see the sketch below).

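A condensed sketch of what keyBy builds, based on the Flink 1.10 sources (simplified): keyBy derives a KeySelector from the given field positions and wraps the current transformation in a PartitionTransformation carrying a key-group partitioner; no operator is added.

// KeyedStream constructor reached from DataStream#keyBy (simplified)
public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector, TypeInformation<KEY> keyType) {
	this(
		dataStream,
		// a PartitionTransformation over the upstream transformation ...
		new PartitionTransformation<>(
			dataStream.getTransformation(),
			// ... that routes records to key groups based on the key selector
			new KeyGroupStreamPartitioner<>(keySelector, StreamGraphGenerator.DEFAULT_LOWER_BOUND_MAX_PARALLELISM)),
		keySelector,
		keyType);
}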

sum

sum still operates on the current KeyedStream.

Note that some transformations, such as PartitionTransformation, do not produce a StreamNode of their own; they become virtual nodes instead.


Calling sum goes through transform just like flatMap does, and registers its own transformation in the environment's transformations list, as sketched below.

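A condensed sketch of the sum path, based on the Flink 1.10 sources (simplified; the operator name and helper classes may differ in other versions): sum builds a pre-defined ReduceFunction and ends up in the same transform/doTransform flow as flatMap, registering its transformation on top of the keyBy PartitionTransformation.

// KeyedStream#sum(int positionToSum) -> aggregate -> reduce (simplified)
public SingleOutputStreamOperator<T> sum(int positionToSum) {
	return aggregate(new SumAggregator<>(positionToSum, getType(), getExecutionConfig()));
}

protected SingleOutputStreamOperator<T> aggregate(AggregationFunction<T> aggregate) {
	// an aggregation is just a pre-defined ReduceFunction
	return reduce(aggregate);
}

public SingleOutputStreamOperator<T> reduce(ReduceFunction<T> reducer) {
	// same transform(...) entry point as flatMap; the operator keeps running state per key
	return transform("Keyed Reduce", getType(),
			new StreamGroupedReduce<>(clean(reducer), getType().createSerializer(getExecutionConfig())));
}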

Final configuration overview

At this point the execution environment has only collected the transformations described above; the StreamGraph itself has not been built yet.

Execution and Submission

    env.execute("Streaming WordCount");

===>

     public JobExecutionResult execute(String jobName) throws Exception {
		Preconditions.checkNotNull(jobName, "Streaming Job name should not be null.");

		return execute(getStreamGraph(jobName));
	}

Generating the StreamGraph (Pipeline)

public StreamGraph getStreamGraph(String jobName, boolean clearTransformations) {
		StreamGraph streamGraph = getStreamGraphGenerator().setJobName(jobName).generate();
		if (clearTransformations) {
			this.transformations.clear();
		}
		return streamGraph;
	}

Generating the full graph

The graph-generation logic loops over every registered transformation, as sketched below.

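A condensed sketch of StreamGraphGenerator#generate, based on the Flink 1.10 sources (simplified): it creates an empty StreamGraph and then hands every registered Transformation to transform(...), shown next.

public StreamGraph generate() {
	streamGraph = new StreamGraph(executionConfig, checkpointConfig, savepointRestoreSettings);
	// ... global settings (state backend, chaining flag, time characteristic, ...) omitted ...

	alreadyTransformed = new HashMap<>();

	// loop over every Transformation collected by the execution environment
	for (Transformation<?> transformation : transformations) {
		transform(transformation);
	}

	final StreamGraph builtStreamGraph = streamGraph;

	alreadyTransformed.clear();
	alreadyTransformed = null;
	streamGraph = null;

	return builtStreamGraph;
}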

Generate Transformation

Here the type of each transformation is inspected and the corresponding handler is invoked. In short, the core of the processing is to recursively add a node and its upstream nodes to the graph.

private Collection<Integer> transform(Transformation<?> transform) {

		if (alreadyTransformed.containsKey(transform)) {
			return alreadyTransformed.get(transform);
		}

		LOG.debug("Transforming " + transform);

		if (transform.getMaxParallelism() <= 0) {

			// if the max parallelism hasn't been set, then first use the job wide max parallelism
			// from the ExecutionConfig.
			int globalMaxParallelismFromConfig = executionConfig.getMaxParallelism();
			if (globalMaxParallelismFromConfig > 0) {
				transform.setMaxParallelism(globalMaxParallelismFromConfig);
			}
		}

		// call at least once to trigger exceptions about MissingTypeInfo
		transform.getOutputType();

		Collection<Integer> transformedIds;
		if (transform instanceof OneInputTransformation<?, ?>) {
			transformedIds = transformOneInputTransform((OneInputTransformation<?, ?>) transform);
		} else if (transform instanceof TwoInputTransformation<?, ?, ?>) {
			transformedIds = transformTwoInputTransform((TwoInputTransformation<?, ?, ?>) transform);
		} else if (transform instanceof SourceTransformation<?>) {
			// ... handling of SourceTransformation and the remaining transformation types omitted ...
		}

		// need this check because the iterate transformation adds itself before
		// transforming the feedback edges
		if (!alreadyTransformed.containsKey(transform)) {
			alreadyTransformed.put(transform, transformedIds);
		}

		if (transform.getBufferTimeout() >= 0) {
			streamGraph.setBufferTimeout(transform.getId(), transform.getBufferTimeout());
		} else {
			streamGraph.setBufferTimeout(transform.getId(), defaultBufferTimeout);
		}

		if (transform.getUid() != null) {
			streamGraph.setTransformationUID(transform.getId(), transform.getUid());
		}
		if (transform.getUserProvidedNodeHash() != null) {
			streamGraph.setTransformationUserHash(transform.getId(), transform.getUserProvidedNodeHash());
		}

		if (!streamGraph.getExecutionConfig().hasAutoGeneratedUIDsEnabled()) {
			if (transform instanceof PhysicalTransformation &&
					transform.getUserProvidedNodeHash() == null &&
					transform.getUid() == null) {
				throw new IllegalStateException("Auto generated UIDs have been disabled " +
					"but no UID or hash has been assigned to operator " + transform.getName());
			}
		}

		if (transform.getMinResources() != null && transform.getPreferredResources() != null) {
			streamGraph.setResources(transform.getId(), transform.getMinResources(), transform.getPreferredResources());
		}

		streamGraph.setManagedMemoryWeight(transform.getId(), transform.getManagedMemoryWeight());

		return transformedIds;
	}

Creating nodes and adding them to the graph
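This happens in the per-type handlers. A condensed sketch of transformOneInputTransform, based on the Flink 1.10 sources (simplified): it first transforms the input recursively, then adds a StreamNode for the operator and a StreamEdge from every input node.

private <IN, OUT> Collection<Integer> transformOneInputTransform(OneInputTransformation<IN, OUT> transform) {
	// recurse: make sure the upstream transformation is already part of the graph
	Collection<Integer> inputIds = transform(transform.getInput());

	// the recursive call above might already have transformed this one
	if (alreadyTransformed.containsKey(transform)) {
		return alreadyTransformed.get(transform);
	}

	String slotSharingGroup = determineSlotSharingGroup(transform.getSlotSharingGroup(), inputIds);

	// create the StreamNode for this operator
	streamGraph.addOperator(transform.getId(),
			slotSharingGroup,
			transform.getCoLocationGroupKey(),
			transform.getOperatorFactory(),
			transform.getInputType(),
			transform.getOutputType(),
			transform.getName());

	// ... state key selector, parallelism and max-parallelism settings omitted ...

	// connect the new node to all of its inputs with StreamEdges
	for (Integer inputId : inputIds) {
		streamGraph.addEdge(inputId, transform.getId(), 0);
	}

	return Collections.singleton(transform.getId());
}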

Generated result: the StreamGraph now contains one StreamNode per operator, connected by StreamEdges (partitioning such as keyBy is represented by virtual nodes and edge partitioners rather than separate StreamNodes).

Generating the JobGraph

Execution via the PipelineExecutor

env.execute hands the StreamGraph (which implements Pipeline) to the configured PipelineExecutor. The executor's execute method first turns the pipeline into a JobGraph and then submits it to the cluster.

  • A PipelineTranslator is used to generate the JobGraph from the StreamGraph; a condensed sketch of this step follows.

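A condensed sketch of that translation path, with names as in Flink 1.10 (simplified; exact signatures may differ between versions): the executor asks FlinkPipelineTranslationUtil for the JobGraph, which picks the FlinkPipelineTranslator matching the pipeline type (StreamGraphTranslator for a StreamGraph) and delegates to it.

// inside the executor: turn the Pipeline (here: a StreamGraph) into a JobGraph
JobGraph jobGraph = FlinkPipelineTranslationUtil.getJobGraph(pipeline, configuration, defaultParallelism);

// FlinkPipelineTranslationUtil#getJobGraph (simplified)
public static JobGraph getJobGraph(Pipeline pipeline, Configuration optimizerConfiguration, int defaultParallelism) {
	FlinkPipelineTranslator pipelineTranslator = getPipelineTranslator(pipeline);
	return pipelineTranslator.translateToJobGraph(pipeline, optimizerConfiguration, defaultParallelism);
}

// StreamGraphTranslator#translateToJobGraph eventually calls
// StreamGraph#getJobGraph -> StreamingJobGraphGenerator.createJobGraph(streamGraph, jobID)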

JobGraph generation logic

The actual JobGraph construction happens in StreamingJobGraphGenerator#createJobGraph; its central step walks the StreamGraph from the sources and merges chainable operators into a single JobVertex (operator chaining).

Generated result


You can see that the keyed aggregation and the sink were merged into a single operator chain.

The JobGraph's taskVertices now holds only three JobVertices; the Sink operator has been chained into the Keyed operator.
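The chaining decision is made per StreamEdge. A condensed sketch of the conditions, based on StreamingJobGraphGenerator#isChainable in the Flink 1.10 sources (simplified; a check on the edge's shuffle mode is omitted):

public static boolean isChainable(StreamEdge edge, StreamGraph streamGraph) {
	StreamNode upStreamVertex = streamGraph.getSourceVertex(edge);
	StreamNode downStreamVertex = streamGraph.getTargetVertex(edge);

	return downStreamVertex.getInEdges().size() == 1                                // downstream has exactly one input
			&& upStreamVertex.isSameSlotSharingGroup(downStreamVertex)              // same slot sharing group
			&& areOperatorsChainable(upStreamVertex, downStreamVertex, streamGraph) // chaining strategies (ALWAYS/HEAD) allow it
			&& (edge.getPartitioner() instanceof ForwardPartitioner)                // only forward edges can be chained
			&& upStreamVertex.getParallelism() == downStreamVertex.getParallelism() // identical parallelism
			&& streamGraph.isChainingEnabled();                                     // chaining not disabled globally
}

In this WordCount, the keyBy edge carries a KeyGroupStreamPartitioner rather than a ForwardPartitioner, so the flatMap stays in its own JobVertex, while the keyed aggregation and the sink (a forward edge with identical parallelism) are merged into one chain.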


Submitting the job

This step submits the locally built job to the cluster; essentially it uploads the job's metadata. The underlying RestClient is based on Netty.

ClientUtils#submitJob
	public static JobExecutionResult submitJob(
			ClusterClient<?> client,
			JobGraph jobGraph) throws ProgramInvocationException {
		checkNotNull(client);
		checkNotNull(jobGraph);
		try {
			return client
				.submitJob(jobGraph)
				.thenApply(DetachedJobExecutionResult::new)
				.get();
		} catch (InterruptedException | ExecutionException e) {
			ExceptionUtils.checkInterrupted(e);
			throw new ProgramInvocationException("Could not run job in detached mode.", jobGraph.getJobID(), e);
		}
	}

Submission to the cluster goes through RestClusterClient:

public CompletableFuture<JobID> submitJob(@Nonnull JobGraph jobGraph) {
         // serialize the JobGraph to a binary file (also used for job recovery)
		CompletableFuture<java.nio.file.Path> jobGraphFileFuture = CompletableFuture.supplyAsync(() -> {
			try {
				final java.nio.file.Path jobGraphFile = Files.createTempFile("flink-jobgraph", ".bin");
				try (ObjectOutputStream objectOut = new ObjectOutputStream(Files.newOutputStream(jobGraphFile))) {
					objectOut.writeObject(jobGraph);
				}
				return jobGraphFile;
			} catch (IOException e) {
				throw new CompletionException(new FlinkException("Failed to serialize JobGraph.", e));
			}
		}, executorService);
         // collect all files that need to be uploaded
		CompletableFuture<Tuple2<JobSubmitRequestBody, Collection<FileUpload>>> requestFuture = jobGraphFileFuture.thenApply(jobGraphFile -> {
			List<String> jarFileNames = new ArrayList<>(8);
			List<JobSubmitRequestBody.DistributedCacheFile> artifactFileNames = new ArrayList<>(8);
			Collection<FileUpload> filesToUpload = new ArrayList<>(8);

			filesToUpload.add(new FileUpload(jobGraphFile, RestConstants.CONTENT_TYPE_BINARY));

			for (Path jar : jobGraph.getUserJars()) {
				jarFileNames.add(jar.getName());
				filesToUpload.add(new FileUpload(Paths.get(jar.toUri()), RestConstants.CONTENT_TYPE_JAR));
			}

			for (Map.Entry<String, DistributedCache.DistributedCacheEntry> artifacts : jobGraph.getUserArtifacts().entrySet()) {
				final Path artifactFilePath = new Path(artifacts.getValue().filePath);
				try {
					// Only local artifacts need to be uploaded.
					if (!artifactFilePath.getFileSystem().isDistributedFS()) {
						artifactFileNames.add(new JobSubmitRequestBody.DistributedCacheFile(artifacts.getKey(), artifactFilePath.getName()));
						filesToUpload.add(new FileUpload(Paths.get(artifacts.getValue().filePath), RestConstants.CONTENT_TYPE_BINARY));
					}
				} catch (IOException e) {
					throw new CompletionException(
						new FlinkException("Failed to get the FileSystem of artifact " + artifactFilePath + ".", e));
				}
			}

			final JobSubmitRequestBody requestBody = new JobSubmitRequestBody(
				jobGraphFile.getFileName().toString(),
				jarFileNames,
				artifactFileNames);

			return Tuple2.of(requestBody, Collections.unmodifiableCollection(filesToUpload));
		});
         // upload the jars and submit the job
		final CompletableFuture<JobSubmitResponseBody> submissionFuture = requestFuture.thenCompose(
			requestAndFileUploads -> sendRetriableRequest(
				JobSubmitHeaders.getInstance(),
				EmptyMessageParameters.getInstance(),
				requestAndFileUploads.f0,
				requestAndFileUploads.f1,
				isConnectionProblemOrServiceUnavailable())
		);
         // delete the generated JobGraph file
		submissionFuture
			.thenCombine(jobGraphFileFuture, (ignored, jobGraphFile) -> jobGraphFile)
			.thenAccept(jobGraphFile -> {
			try {
				Files.delete(jobGraphFile);
			} catch (IOException e) {
				LOG.warn("Could not delete temporary file {}.", jobGraphFile, e);
			}
		});
        // return the job id
		return submissionFuture
			.thenApply(ignore -> jobGraph.getJobID())
			.exceptionally(
				(Throwable throwable) -> {
					throw new CompletionException(new JobSubmissionException(jobGraph.getJobID(), "Failed to submit JobGraph.", ExceptionUtils.stripCompletionException(throwable)));
				});
	}
	
	// submit to the cluster
	private <M extends MessageHeaders<R, P, U>, U extends MessageParameters, R extends RequestBody, P extends ResponseBody> CompletableFuture<P>
	sendRetriableRequest(M messageHeaders, U messageParameters, R request, Collection<FileUpload> filesToUpload, Predicate<Throwable> retryPredicate) {
		return retry(() -> getWebMonitorBaseUrl().thenCompose(webMonitorBaseUrl -> {
			try {
				return restClient.sendRequest(webMonitorBaseUrl.getHost(), webMonitorBaseUrl.getPort(), messageHeaders, messageParameters, request, filesToUpload);
			} catch (IOException e) {
				throw new CompletionException(e);
			}
		}), retryPredicate);
	}

Flink Cluster (server side)

The Dispatcher receives the request

On the cluster side, the JobSubmitHandler receives the REST request, deserializes the uploaded JobGraph file, and forwards the JobGraph to the Dispatcher through DispatcherGateway#submitJob.

Internal execution

Dispatcher#submitJob runs a few sanity checks (for example, rejecting a duplicate JobID) and then calls internalSubmitJob, which eventually calls persistAndRunJob.

Persisting and starting the job


  • Persist the job (so it can be recovered after a failover).
  • Create and start a JobManagerRunner (JobManagerRunnerImpl); a condensed sketch of this path follows.
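A condensed sketch of that path inside the Dispatcher, with names roughly as in Flink 1.10 (simplified; error handling and the async plumbing are omitted, and field names vary slightly between versions):

private CompletableFuture<Void> persistAndRunJob(JobGraph jobGraph) throws Exception {
	// 1. persist the JobGraph (e.g. in ZooKeeper for an HA setup) so the job can be recovered
	jobGraphWriter.putJobGraph(jobGraph);
	// 2. run it
	return runJob(jobGraph);
}

private CompletableFuture<Void> runJob(JobGraph jobGraph) {
	// 3. create a JobManagerRunner (JobManagerRunnerImpl) for this job;
	//    building the runner also builds the JobMaster, which builds the ExecutionGraph
	final CompletableFuture<JobManagerRunner> jobManagerRunnerFuture = createJobManagerRunner(jobGraph);

	// 4. start the runner: once it gains leadership, the JobMaster starts executing the job
	return jobManagerRunnerFuture.thenAccept(jobManagerRunner -> {
		try {
			jobManagerRunner.start();
		} catch (Exception e) {
			throw new CompletionException(e);
		}
	});
}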



Translating the JobGraph into an ExecutionGraph


The logic of ExecutionGraphBuilder#buildGraph:

  • create a new execution graph, if none exists so far

  • set the basic properties

  • initialize the vertices that have a master initialization hook

  • file output formats create directories here, input formats create splits

  • topologically sort the job vertices and attach the graph to the existing one

    • Sort the JobGraph's JobVertices topologically, starting from the Source nodes (see the sketch after this list).
    • In executionGraph.attachJobGraph(sortedTopology), each JobVertex becomes an ExecutionJobVertex. Inside the ExecutionJobVertex constructor, an IntermediateResult is built for each IntermediateDataSet of the JobVertex, and one ExecutionVertex is built per parallel subtask; each ExecutionVertex in turn creates one IntermediateResultPartition per produced IntermediateResult. The newly created ExecutionJobVertex is then connected to its upstream IntermediateResults.
    • ExecutionEdges are built to connect to the upstream IntermediateResultPartitions; from the ExecutionGraph we finally reach the physical execution plan.
  • configure the state checkpointing

  • create all the metrics for the Execution Graph
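The topological-sort-and-attach step boils down to the following, condensed from ExecutionGraphBuilder#buildGraph (Flink 1.10, simplified):

// sort the JobVertices topologically, starting from the sources
List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();

// for each JobVertex: create an ExecutionJobVertex, its IntermediateResults,
// one ExecutionVertex per parallel subtask (each producing IntermediateResultPartitions),
// and connect it to the upstream IntermediateResults via ExecutionEdges
executionGraph.attachJobGraph(sortedTopology);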

Actually starting the job

After the runner has been created, the Dispatcher starts it; once the JobMaster gains leadership, it starts executing the job.

Executing the job

JobMaster


Scheduler execution

resetAndStartScheduler => schedulerNG.startScheduling(), as sketched below.

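A condensed sketch of the scheduling kickoff inside the JobMaster, based on the Flink 1.10 sources (heavily simplified; assignScheduler below is a stand-in for the actual main-thread-executor assignment logic):

private void resetAndStartScheduler() throws Exception {
	// hand the SchedulerNG (which wraps the ExecutionGraph) over to the JobMaster's main thread,
	// then kick off scheduling
	final CompletableFuture<Void> schedulerAssignedFuture = assignScheduler(); // simplified stand-in
	schedulerAssignedFuture.thenRun(this::startScheduling);
}

private void startScheduling() {
	schedulerNG.registerJobStatusListener(new JobManagerJobStatusListener());
	// the scheduler walks the ExecutionGraph, allocates slots and deploys each Execution to a TaskManager
	schedulerNG.startScheduling();
}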

Summary



  • Layer 1, the StreamGraph: starting from the Source node, every transform produces a StreamNode; StreamNodes are connected by StreamEdges, forming a DAG of StreamNodes and StreamEdges.
  • Layer 2, the JobGraph: again starting from the Source node, the generator looks for operators that can be chained together; chainable operators are merged, the rest each become their own JobVertex, and upstream and downstream JobVertices are connected by JobEdges, forming a DAG at the JobVertex level.
  • Once the JobVertex DAG has been submitted, the vertices are sorted starting from the Source. Each JobVertex yields an ExecutionJobVertex, each IntermediateDataSet of a JobVertex yields an IntermediateResult, and the IntermediateResults establish the upstream/downstream dependencies, forming the ExecutionJobVertex-level DAG, i.e. the ExecutionGraph.
  • Finally, the ExecutionGraph is mapped down to the physical execution layer.

References

Flink 作業執行深度解析 (A Deep Dive into Flink Job Execution): https://ververica.cn/developers/advanced-tutorial-2-flink-job-execution-depth-analysis/
