An In-Depth Look at Flink's Operator Chain Mechanism
{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Why does the Web UI for my Flink job show only a single box, with both the Records Sent and Records Received metrics at 0? Is something wrong with my program?”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"A Brief Introduction to Flink Operator Chains"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I often see questions like this in Flink community chat groups. In almost every case the program is fine; the cause is Flink's operator chain mechanism. In the execution plan of the submitted job, the parallel instances (sub-tasks) of all operators met the chaining conditions and were fused into a single unit of execution. Because those metrics only count records exchanged between tasks, there is no traffic between operators left to observe. That is a special case, of course. More commonly only some of the operators are chained, as in the figure below, which appears many times in the official documentation; note the Source and map() operators."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/72/720b4e0e2709778ba105ae601308d74d.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The benefits of operator chaining are clear: all sub-tasks chained together run in the same thread, as a single task inside one TaskManager slot, which cuts unnecessary data exchange, serialization, and context switching, and so improves the job's execution efficiency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4b/4b9122f479e9ddeb5b03394bca2d367a.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With that background in place, let's walk through the source code to see under what conditions an operator chain is formed and how it is implemented in the Flink Runtime."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Operator Chains in the Logical Plan"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Readers with some familiarity with the Flink Runtime will know that a Flink job's execution plan is represented as three layers of graphs:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"StreamGraph: the original logical execution plan"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JobGraph: the optimized logical execution plan (this is what the Web UI shows)"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ExecutionGraph: the physical execution plan"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Operator chains are introduced when the logical plan is optimized, that is, while the JobGraph is generated from the StreamGraph. So we turn to the class responsible for generating the JobGraph, o.a.f.streaming.api.graph.StreamingJobGraphGenerator, and look at the source of its core method, createJobGraph()."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"private JobGraph createJobGraph() {\n    // make sure that all vertices start immediately\n    jobGraph.setScheduleMode(streamGraph.getScheduleMode());\n    // Generate deterministic hashes for the nodes in order to identify them across\n    // submission iff they didn't change.\n    Map<Integer, byte[]> hashes = defaultStreamGraphHasher.traverseStreamGraphAndGenerateHashes(streamGraph);\n    // Generate legacy version hashes for backwards compatibility\n    List