Flink Checkpoint and Large State Tuning

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Overview","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For a Flink application to run reliably at large scale, two conditions must be met:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The application must be able to take checkpoints reliably.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After a failure, it needs enough resources to catch up with the input data streams.","attrs":{}}]}]}],"attrs":{}},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Monitoring State and Checkpoints","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The easiest way to monitor checkpoint behavior is through the web UI. Two ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/08733fa1ab3c2f656826fa659","title":"","type":null},"content":[{"type":"text","text":"checkpoint metrics","attrs":{}}]},{"type":"text","text":" deserve the most attention:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The time until an Operator receives its first checkpoint barrier. When this time is consistently high, checkpoint barriers take a long time to travel from the Source to the Operator, which usually indicates that the system is running under constant backpressure.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The alignment duration. Under exactly-once semantics, in an Operator with multiple inputs, channels that have already received a barrier are blocked from receiving further data until all remaining channels catch up and receive their barriers. The alignment duration is how long this blocking lasts.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ideally both values are low. Persistently high values mean that checkpoint barriers move slowly through the job graph, usually because of backpressure (not enough resources to process the records). The same condition also shows up as increased end-to-end latency of processed records.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Tuning Checkpointing","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An application can be configured to trigger checkpoints at a fixed interval. When a checkpoint takes longer to complete than that interval, the next checkpoint is not triggered until the one in progress finishes (by default, the next checkpoint is then triggered immediately after the in-progress checkpoint completes).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When checkpoints frequently take longer than the configured interval, the system ends up checkpointing continuously (a new checkpoint starts as soon as the previous one completes). This can mean that Operators make too little progress between checkpoints and that checkpointing consumes too many resources. The effect is smaller for streaming applications that use asynchronous checkpoints, but it can still hurt overall application performance.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To prevent this, an application can define a minimum pause between checkpoints (the minimum time that must pass between the end of the latest checkpoint and the start of the next one):","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();\n// milliseconds: minimum pause between the end of one checkpoint and the start of the next\nenv.getCheckpointConfig().setMinPauseBetweenCheckpoints(milliseconds);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below illustrates how this setting affects checkpointing and keeps checkpoints from running back to back.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8a/8ac4f3d7f24868a9eb33e2aa5bb2cf70.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An application can be configured to allow multiple checkpoints to be in progress at the same time. A manually triggered savepoint may also run concurrently with an in-progress checkpoint.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Tuning RocksDB","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Many large-scale Flink streaming applications store their State in the RocksDB State Backend. It scales far beyond main memory and reliably stores large ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/d5eed8ecec7ec859c5be5bd93","title":"","type":null},"content":[{"type":"text","text":"keyed state","attrs":{}}]},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocksDB performance varies with configuration. The following are some best practices for using the RocksDB State Backend.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Incremental Checkpoints","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When it comes to reducing the time checkpoints take, enabling incremental checkpoints should be one of the first things to consider. Compared with full checkpoints, incremental checkpoints can significantly reduce checkpoint time, because they record only the changes made since the previously completed checkpoint.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Choosing Where to Store Timers","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Timers are stored in RocksDB by default. When a Job has only a small number of Timers, storing them on the heap can improve performance.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Use this option with care: heap-based Timers can increase checkpoint time and cannot scale beyond memory.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Tuning RocksDB Memory","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The performance of the RocksDB State Backend depends heavily on the amount of memory available to it. To improve performance, adding memory can help a lot, as can adjusting how that memory is used.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By default, the RocksDB State Backend uses Flink's managed memory for RocksDB's buffers and caches (","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"state.backend.rocksdb.memory.managed: true","attrs":{}}],"attrs":{}},{"type":"text","text":"). To address memory-related performance problems, the following steps may help:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Increase the amount of managed memory. This usually improves the situation a lot without adding the complexity of tuning low-level RocksDB options.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Especially with large container/process sizes, most of the total memory can usually be given to RocksDB unless the application logic itself needs a lot of JVM heap (the default managed memory fraction of 0.4 is conservative).","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The number of write buffers in RocksDB depends on the number of States in the application. Each State corresponds to one ColumnFamily, which needs its own write buffers. Applications with many States therefore typically need more memory to achieve the same performance.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By setting ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"state.backend.rocksdb.memory.managed: false","attrs":{}}],"attrs":{}},{"type":"text","text":", you can try comparing the performance of RocksDB with managed memory against RocksDB with per-column-family memory.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Without managed memory, RocksDB allocates memory in proportion to the number of States in the application.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the application has many States and you see frequent MemTable flushes (a write-side bottleneck) but cannot provide more memory, you can increase the ratio of memory that goes to the write buffers (","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"state.backend.rocksdb.memory.write-buffer-ratio","attrs":{}}],"attrs":{}},{"type":"text","text":").","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An advanced option (for RocksDB experts) to reduce the number of MemTable flushes in setups with many States is to tune RocksDB's ColumnFamily options through a RocksDBOptionsFactory:","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"import java.util.Collection;\n\nimport org.apache.flink.configuration.ReadableConfig;\nimport org.apache.flink.contrib.streaming.state.ConfigurableRocksDBOptionsFactory;\nimport org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;\nimport org.rocksdb.ColumnFamilyOptions;\nimport org.rocksdb.DBOptions;\n\npublic class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {\n\n    @Override\n    public DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {\n        // Increase the maximum number of background flush threads when one Operator has many States,\n        // which means one RocksDB instance holds many ColumnFamilies.\n        return currentOptions.setMaxBackgroundFlushes(4);\n    }\n\n    @Override\n    public ColumnFamilyOptions createColumnOptions(\n            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {\n        // Decrease the arena block size from the default 8 MB to 1 MB.\n        return currentOptions.setArenaBlockSize(1024 * 1024);\n    }\n\n    @Override\n    public RocksDBOptionsFactory configure(ReadableConfig configuration) {\n        return this;\n    }\n}","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Capacity Planning","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This section discusses how to decide how many resources a Flink job needs in order to run reliably. The basic rules of thumb for capacity planning are:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Normal operation should have enough capacity to avoid running under constant backpressure.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Provision some extra resources on top of what the program needs to run without backpressure in normal operation. These resources are used to catch up on the input data that accumulated while the application was recovering. How much is needed depends on how long recovery usually takes (which in turn depends on the size of the state that must be loaded into the new TaskManagers on failover) and on how quickly the job is required to recover.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Temporary backpressure is usually acceptable, for example during load spikes, during the catch-up phase, or when an external system is temporarily slow to respond.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some operations, such as large windows, produce a spiky load for their downstream Operators: while the window is being built, the downstream Operators may be idle, and they only start working when the window emits its data. Planning the downstream parallelism needs to take into account how much the window emits and how quickly such a spike must be processed.","attrs":{}}]}]}],"attrs":{}},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Compression","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flink offers optional compression for all checkpoints and savepoints (default: off). Currently, compression always uses the ","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/xerial/snappy-java","title":"","type":null},"content":[{"type":"text","text":"snappy compression algorithm (version 1.1.4)","attrs":{}}]},{"type":"text","text":", but support for custom compression algorithms is planned for the future. Compression works at the granularity of key-groups in keyed state: each key-group is compressed separately, which is important for rescaling.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compression can be enabled through ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"ExecutionConfig","attrs":{}}],"attrs":{}},{"type":"text","text":":","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"ExecutionConfig executionConfig = new ExecutionConfig();\nexecutionConfig.setUseSnapshotCompression(true);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The compression option has no effect on incremental snapshots (RocksDB).","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Task-Local Recovery","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Motivation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Flink's checkpointing, each Task produces a snapshot of its State, which is then written to distributed storage. Each Task acknowledges the successful write to the JobManager by sending a handle that describes the location of the State in the distributed store. The JobManager, in turn, collects the handles from all Tasks and bundles them into a checkpoint object.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On recovery, the JobManager opens the latest checkpoint object and sends the handles back to the corresponding Tasks, which can then restore their State from distributed storage. Storing State in distributed storage has two important advantages. First, the storage is fault tolerant. Second, all State in the distributed store is accessible to all nodes and can easily be redistributed (for example, for rescaling).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, remote distributed storage also has a big disadvantage: every Task must read its State from a remote location over the network. In some cases, recovery can reschedule a Task onto the same TaskManager as in the previous run, yet the Task still has to read remote State. This can lead to long recovery times for large State.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Approach","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Task-local State recovery targets exactly this problem. The main idea is as follows: for every checkpoint, each Task not only writes its State snapshot to distributed storage, but also keeps a secondary copy of the snapshot in storage local to the Task (for example, on local disk or in memory). The primary store for State must still be the distributed store, because local storage neither guarantees durability under node failures nor gives other nodes access to redistribute State.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Every Task that can be rescheduled to its previous location for recovery can restore its State from the local secondary copy and avoid the cost of a remote read. Given that many failures are not node failures, and that node failures usually affect only one or very few nodes at a time, most Tasks are likely to return to their previous location during recovery and find their local State intact. This is what makes local recovery effective at reducing recovery time.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note that, depending on the chosen State Backend and checkpointing strategy, creating and storing the secondary local copy can add some cost to each checkpoint. In most cases, the implementation simply duplicates the writes to distributed storage into a local file.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e8f701752d1a057cc2fb64bc6a8a8088.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Relationship Between the Primary and Secondary Copies","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For checkpointing, the primary copy must succeed, and a failure to produce the secondary local copy does not fail the checkpoint. If the primary copy cannot be created, the checkpoint fails even if the secondary copy was created successfully.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Only the primary copy is acknowledged and managed by the JobManager. Secondary copies are owned by the TaskManagers, and their life cycle can be independent of the primary copy. For example, it is possible to retain a history of the 3 latest checkpoints as primary copies while keeping only the local copy of the latest checkpoint.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For recovery, Flink always tries to restore from task-local State first if a matching secondary copy is available. If any problem occurs while restoring from the secondary copy, Flink transparently retries from the primary copy. Recovery fails only if restoring from both the primary copy and the (optional) secondary copy fails.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A task-local copy may contain only part of the full State (for example, because an exception occurred while writing a local file). In that case, Flink first tries to restore the local part locally, and restores the remaining State from the primary copy.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A task-local copy can have a different format from the primary copy.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If a TaskManager is lost, the local copies of all its Tasks are lost as well.","attrs":{}}]}]}],"attrs":{}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Configuring Task-Local Recovery","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Task-local recovery is disabled by default and can be enabled through Flink's configuration (","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"state.backend.local-recovery","attrs":{}}],"attrs":{}},{"type":"text","text":", set to true to enable or false to disable; the option is defined as ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"CheckpointingOptions.LOCAL_RECOVERY","attrs":{}}],"attrs":{}},{"type":"text","text":").","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Allocation-preserving scheduling","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Task-local recovery assumes allocation-preserving Task scheduling under failures, which works as follows: each Task remembers its previous Slot allocation and, during recovery, requests exactly the same Slot to restart in. If that Slot is not available, the Task requests a new, fresh Slot from the ResourceManager.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If a TaskManager is no longer available, the Tasks previously assigned to it must run on other TaskManagers, but they do not displace Tasks that can recover in their original Slots. With this strategy, as many Tasks as possible restart in their original Slots and restore their State locally.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1","attrs":{}},{"type":"text","text":"]  ","attrs":{}},{"type":"link","attrs":{"href":"https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/large_state_tuning/","title":"","type":null},"content":[{"type":"text","text":"https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/large_state_tuning/ ","attrs":{}}]}]}]}