For distributed frameworks, a phrase we hear all the time is: move the computation, not the data. So how does Flink move computation? Let's take a close look at the ExecutionGraph.
Basic concepts
ExecutionJobVertex: represents one computation vertex of the JobGraph (i.e. one operator chain); each ExecutionJobVertex may fan out into many parallel ExecutionVertex instances (a minimal sketch of this hierarchy follows this list)
ExecutionVertex: represents one parallel subtask
Execution: represents a single attempt at executing an ExecutionVertex
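Before diving into the source, here is a minimal sketch of that one-to-many hierarchy. The Mini* classes below are simplified stand-ins invented for illustration, not Flink's actual types:

import java.util.ArrayList;
import java.util.List;

// one attempt at running a subtask (stands in for Execution)
class MiniExecution {
    final int attemptNumber;
    MiniExecution(int attemptNumber) { this.attemptNumber = attemptNumber; }
}

// one parallel subtask (stands in for ExecutionVertex)
class MiniExecutionVertex {
    final int subtaskIndex;
    MiniExecution currentAttempt;
    MiniExecutionVertex(int subtaskIndex) {
        this.subtaskIndex = subtaskIndex;
        this.currentAttempt = new MiniExecution(0);
    }
}

// one JobGraph vertex, i.e. an operator chain (stands in for ExecutionJobVertex)
class MiniExecutionJobVertex {
    final String name;
    final List<MiniExecutionVertex> taskVertices = new ArrayList<>();
    MiniExecutionJobVertex(String name, int parallelism) {
        this.name = name;
        for (int i = 0; i < parallelism; i++) {
            // fan out: one ExecutionVertex per parallel subtask
            taskVertices.add(new MiniExecutionVertex(i));
        }
    }
}

public class HierarchyDemo {
    public static void main(String[] args) {
        // an assumed "source -> flatMap" chain with parallelism 2 becomes 2 subtasks
        MiniExecutionJobVertex ejv = new MiniExecutionJobVertex("source -> flatMap", 2);
        System.out.println(ejv.name + " has " + ejv.taskVertices.size() + " ExecutionVertex instances");
    }
}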
Graph evolution
Source code
From 一文搞定 Flink Job 提交全流程 (the earlier post covering the full Flink job submission flow) we know that the ExecutionGraph is created at the same time as the JobMaster. Tracing the call chain leads to the ExecutionGraphBuilder.buildGraph method:
......
// topologically sort the job vertices and attach the graph to the existing one
// the sorted topology, e.g. source->flatMap, filter->sink
// each operator chain becomes one JobVertex; a single operator is treated as a special operator chain
List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();

if (log.isDebugEnabled()) {
    log.debug("Adding {} vertices from job graph {} ({}).", sortedTopology.size(), jobName, jobId);
}

executionGraph.attachJobGraph(sortedTopology);
......
Stepping into attachJobGraph:
public void attachJobGraph(List<JobVertex> topologiallySorted) throws JobException {

    assertRunningInJobMasterMainThread();

    LOG.debug("Attaching {} topologically sorted vertices to existing job graph with {} " +
            "vertices and {} intermediate results.",
        topologiallySorted.size(),
        tasks.size(),
        intermediateResults.size());

    final ArrayList<ExecutionJobVertex> newExecJobVertices = new ArrayList<>(topologiallySorted.size());
    final long createTimestamp = System.currentTimeMillis();

    // iterate from the source operator chains onwards
    for (JobVertex jobVertex : topologiallySorted) {

        if (jobVertex.isInputVertex() && !jobVertex.isStoppable()) {
            this.isStoppable = false;
        }

        /*
         * This is where each node of the ExecutionGraph is built:
         * first a series of assignments hands the task information to the new graph node
         * and sets its parallelism, etc.;
         * then this node's IntermediateResults are created, one per produced data set;
         * finally the ExecutionVertex instances that will run the task are created
         * according to the configured parallelism.
         * If the job defines input splits, they are set up here as well.
         */
        // create the execution job vertex and attach it to the graph
        // parallelization starts here
        ExecutionJobVertex ejv = new ExecutionJobVertex(
            this,
            jobVertex,
            1,
            rpcTimeout,
            globalModVersion,
            createTimestamp);

        /*
         * Handle all the JobEdges:
         * for each edge, look up the corresponding IntermediateResult and record it
         * as an input of this node;
         * finally, associate each ExecutionVertex with its IntermediateResult partitions.
         */
        ejv.connectToPredecessors(this.intermediateResults);

        ExecutionJobVertex previousTask = this.tasks.putIfAbsent(jobVertex.getID(), ejv);
        if (previousTask != null) {
            throw new JobException(String.format("Encountered two job vertices with ID %s : previous=[%s] / new=[%s]",
                jobVertex.getID(), ejv, previousTask));
        }

        for (IntermediateResult res : ejv.getProducedDataSets()) {
            IntermediateResult previousDataSet = this.intermediateResults.putIfAbsent(res.getId(), res);
            if (previousDataSet != null) {
                throw new JobException(String.format("Encountered two intermediate data set with ID %s : previous=[%s] / new=[%s]",
                    res.getId(), res, previousDataSet));
            }
        }

        this.verticesInCreationOrder.add(ejv);
        this.numVerticesTotal += ejv.getParallelism();
        newExecJobVertices.add(ejv);
    }

    terminationFuture = new CompletableFuture<>();
    failoverStrategy.notifyNewVertices(newExecJobVertices);
}
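To make the bookkeeping concrete, here is a small, self-contained trace of what the loop accumulates. The two-chain topology and the parallelism of 2 are assumptions for illustration, and the map stands in for the real tasks map of ExecutionJobVertex instances:

import java.util.LinkedHashMap;
import java.util.Map;

public class AttachTrace {
    public static void main(String[] args) {
        // assumed sorted topology: two operator chains, each with parallelism 2
        Map<String, Integer> sortedTopology = new LinkedHashMap<>();
        sortedTopology.put("source -> flatMap", 2);
        sortedTopology.put("filter -> sink", 2);

        Map<String, Integer> tasks = new LinkedHashMap<>(); // stands in for tasks.putIfAbsent(...)
        int numVerticesTotal = 0;

        for (Map.Entry<String, Integer> jobVertex : sortedTopology.entrySet()) {
            tasks.put(jobVertex.getKey(), jobVertex.getValue());
            numVerticesTotal += jobVertex.getValue();       // numVerticesTotal += ejv.getParallelism()
        }

        // prints: 2 ExecutionJobVertex, numVerticesTotal=4
        System.out.println(tasks.size() + " ExecutionJobVertex, numVerticesTotal=" + numVerticesTotal);
    }
}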
The key method is new ExecutionJobVertex. Besides a series of basic assignments, it parallelizes the IntermediateResult and the ExecutionVertex.
Put plainly, it creates several near-identical IntermediateResult objects and ExecutionVertex objects, as follows:
// parallelization starts here
public ExecutionJobVertex(
        ExecutionGraph graph,
        JobVertex jobVertex,
        int defaultParallelism,
        Time timeout,
        long initialGlobalModVersion,
        long createTimestamp) throws JobException {

    if (graph == null || jobVertex == null) {
        throw new NullPointerException();
    }

    this.graph = graph;
    this.jobVertex = jobVertex;

    int vertexParallelism = jobVertex.getParallelism();
    // the effective parallelism
    int numTaskVertices = vertexParallelism > 0 ? vertexParallelism : defaultParallelism;

    final int configuredMaxParallelism = jobVertex.getMaxParallelism();

    this.maxParallelismConfigured = (VALUE_NOT_SET != configuredMaxParallelism);

    // if no max parallelism was configured by the user, we calculate and set a default
    setMaxParallelismInternal(maxParallelismConfigured ?
        configuredMaxParallelism : KeyGroupRangeAssignment.computeDefaultMaxParallelism(numTaskVertices));

    // verify that our parallelism is not higher than the maximum parallelism
    if (numTaskVertices > maxParallelism) {
        throw new JobException(
            String.format("Vertex %s's parallelism (%s) is higher than the max parallelism (%s). Please lower the parallelism or increase the max parallelism.",
                jobVertex.getName(),
                numTaskVertices,
                maxParallelism));
    }

    this.parallelism = numTaskVertices;
    this.taskVertices = new ExecutionVertex[numTaskVertices];
    this.operatorIDs = Collections.unmodifiableList(jobVertex.getOperatorIDs());
    this.userDefinedOperatorIds = Collections.unmodifiableList(jobVertex.getUserDefinedOperatorIDs());

    this.inputs = new ArrayList<>(jobVertex.getInputs().size());

    // take the sharing group
    this.slotSharingGroup = jobVertex.getSlotSharingGroup();
    this.coLocationGroup = jobVertex.getCoLocationGroup();

    // setup the coLocation group
    if (coLocationGroup != null && slotSharingGroup == null) {
        throw new JobException("Vertex uses a co-location constraint without using slot sharing");
    }

    // create the intermediate results
    this.producedDataSets = new IntermediateResult[jobVertex.getNumberOfProducedIntermediateDataSets()];

    // parallelize the IntermediateResults: each gets numTaskVertices partitions
    for (int i = 0; i < jobVertex.getProducedDataSets().size(); i++) {
        final IntermediateDataSet result = jobVertex.getProducedDataSets().get(i);

        this.producedDataSets[i] = new IntermediateResult(
            result.getId(),
            this,
            numTaskVertices,
            result.getResultType());
    }

    Configuration jobConfiguration = graph.getJobConfiguration();
    int maxPriorAttemptsHistoryLength = jobConfiguration != null ?
        jobConfiguration.getInteger(JobManagerOptions.MAX_ATTEMPTS_HISTORY_SIZE) :
        JobManagerOptions.MAX_ATTEMPTS_HISTORY_SIZE.defaultValue();

    // create all task vertices
    // "moving the computation": parallelize into ExecutionVertex instances
    for (int i = 0; i < numTaskVertices; i++) {
        ExecutionVertex vertex = new ExecutionVertex(
            this,
            i,
            producedDataSets,
            timeout,
            initialGlobalModVersion,
            createTimestamp,
            maxPriorAttemptsHistoryLength);

        this.taskVertices[i] = vertex;
    }

    // sanity check for the double referencing between intermediate result partitions and execution vertices
    for (IntermediateResult ir : this.producedDataSets) {
        if (ir.getNumberOfAssignedPartitions() != parallelism) {
            throw new RuntimeException("The intermediate result's partitions were not correctly assigned.");
        }
    }

    // set up the input splits, if the vertex has any
    try {
        @SuppressWarnings("unchecked")
        InputSplitSource<InputSplit> splitSource = (InputSplitSource<InputSplit>) jobVertex.getInputSplitSource();

        if (splitSource != null) {
            Thread currentThread = Thread.currentThread();
            ClassLoader oldContextClassLoader = currentThread.getContextClassLoader();
            currentThread.setContextClassLoader(graph.getUserClassLoader());
            try {
                inputSplits = splitSource.createInputSplits(numTaskVertices);

                if (inputSplits != null) {
                    splitAssigner = splitSource.getInputSplitAssigner(inputSplits);
                }
            } finally {
                currentThread.setContextClassLoader(oldContextClassLoader);
            }
        }
        else {
            inputSplits = null;
        }
    }
    catch (Throwable t) {
        throw new JobException("Creating the input splits caused an error: " + t.getMessage(), t);
    }
}
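To tie the numbers together, here is a minimal, self-contained sketch of the fan-out the constructor performs. The class and the parallelism values are assumptions for illustration, standing in for ExecutionJobVertex, IntermediateResult, and ExecutionVertex:

public class FanOutDemo {

    static int resolveParallelism(int vertexParallelism, int defaultParallelism) {
        // same rule as the constructor: an explicit setting wins, otherwise the default
        return vertexParallelism > 0 ? vertexParallelism : defaultParallelism;
    }

    public static void main(String[] args) {
        int numTaskVertices = resolveParallelism(3, 1);   // assumed: user set parallelism 3

        // one "IntermediateResult" per produced data set, each with one partition per subtask
        int producedDataSets = 1;                          // assumed: a single downstream result
        int[][] partitions = new int[producedDataSets][numTaskVertices];

        // one "ExecutionVertex" per subtask
        String[] taskVertices = new String[numTaskVertices];
        for (int i = 0; i < numTaskVertices; i++) {
            taskVertices[i] = "subtask-" + i;
        }

        // prints: parallelism=3, partitions per result=3, execution vertices=3
        System.out.println("parallelism=" + numTaskVertices
            + ", partitions per result=" + partitions[0].length
            + ", execution vertices=" + taskVertices.length);
    }
}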
That, in a nutshell, is how Flink "moves the computation": each JobVertex fans out into as many parallel ExecutionVertex subtasks as its parallelism dictates, and it is these subtasks that ultimately get scheduled and executed.