A 2.4x Gap in Data Processing Capacity? A Performance Comparison of RocksDB and Gemini in Flink

{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Weibo machine learning platform uses Flink to join multiple streams and generate the samples needed for online machine learning. Data within the time window is cached in state, and state access latency usually determines job performance. Open-source Flink offers two main kinds of state storage, RocksDB and Heap. At last year's Flink Forward conference we learned that Alibaba Cloud's VVP product includes Gemini, a self-developed, higher-performance state storage plugin, so we tested and trialed it."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this article we stress-test RocksDB, Heap, and Gemini in the same scenario and compare their resource consumption. The Flink core version under test is 1.10.0."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Test Scenario"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We use a real sample-joining workload as the test scenario. The job unions data from multiple streams and aggregates it by a specified key (keyBy). Inside the aggregation function it reads the relevant fields from each stream, combines the required fields into a new object, and stores that object in value state. Each new object gets its own timer, which replaces TimeWindow: when the window ends, the data is emitted to the downstream operator. We use timers mainly because they are more flexible and easier for users to customize, which makes the platform more practical and extensible."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"MemoryStateBackend vs. RocksDBStateBackend"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note first that MemoryStateBackend is not recommended for production use. We include it here mainly to quantify the resource cost of storing state on the heap."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The checkpoint configuration used in the tests was:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"CheckpointInterval: 10 minutes\nCheckpointingMode: EXACTLY_ONCE\nCheckpointTimeout: 3 minutes"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also applied the following additional settings to RocksDB:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"setCompressionType: LZ4_COMPRESSION\nsetTargetFileSizeBase: 128 * 1024 * 1024\nsetMinWriteBufferNumberToMerge: 3\nsetMaxWriteBufferNumber: 4\nsetWriteBufferSize: 1G\nsetBlockCacheSize: 10G\nsetBlockSize: 4 * 1024\nsetFilter: BloomFilter(10, false)"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The tests showed that, for the same job processing the same volume of data, throughput with MemoryStateBackend was similar to RocksDB (input QPS of 300,000, output QPS of 20,000 after aggregation), but the required memory (taskmanager.heap.mb) was 8 times that of RocksDB, and the corresponding machine resources were 2 times those of RocksDB."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e9/e91555780349785e6d2ca31cf3fca1b8.webp","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From this we draw the following conclusions:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MemoryStateBackend needs a great deal of extra heap space to hold the state data (samples) within the window. Compared with keeping data on disk, its advantage is very good processing performance, but the drawbacks are clear: Java objects are stored inefficiently in memory, so gigabytes of heap hold only hundreds of megabytes of actual data, which makes the memory overhead very large. GC pauses on a large JVM heap are also relatively long, which hurts overall job stability, and traffic spikes from trending events carry a risk of OOM."}]}]},
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocksDB needs much less heap space. With a larger native memory region for the read cache, combined with RocksDB's efficient disk read and write strategy, it still performs very well."}]}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"GeminiStateBackend vs. RocksDBStateBackend"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Ververica Platform, the Gemini state backend can be selected as follows:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"state.backend=org.apache.flink.runtime.state.gemini.GeminiStateBackendFactory"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also applied the following basic configuration to Gemini:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"// Local directories used by Gemini for storage\nkubernetes.taskmanager.replace-with-subdirs.conf-keys=state.backend.gemini.local.dir\nstate.backend.gemini.local.dir=/mnt/disk3/state,/mnt/disk5/state\n// Page compression format for Gemini (a page is the smallest physical storage unit in Gemini)\nstate.backend.gemini.compression.in.page=Lz4\n// Fraction of memory Gemini is allowed to use\nstate.backend.gemini.heap.rate=0.7\n// Size of a single Gemini storage file (134217728 = 128 * 1024 * 1024)\nstate.backend.gemini.log.structure.file.size=134217728\n// Number of Gemini worker threads\nstate.backend.gemini.region.thread.num=8"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Machine Configuration"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/44/44f2640e888e8071156de791724ca2a1.webp","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Job Resource Parameters"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2b/2b2d03aaa94852ee1f5e265e883fdcfd.webp","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Memory Parameters"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4f/4f24b318de34b27ffb5d9cd27fe9f300.webp","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Comparison Results"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/43/43e3eb4a46585831a7d778f34d2bd919.webp","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Note:"},{"type":"text","text":" The full sample-joining workload cannot be fully served by 16 machines, so we ran the stress tests on data sampled at different ratios. We consider a job to have reached its performance bottleneck once backpressure appears."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Conclusion"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The comparison shows that, with the same data, job logic, and hardware configuration, Gemini successfully processed 2.4 times as much data as RocksDB (17,280 vs. 7,200 records/s). The hardware resource comparison also shows that RocksDB hits the disk I/O bottleneck sooner, while Gemini achieves higher memory and CPU utilization."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the authors:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cao Fuqiang (曹富強) and Chen Xin (晨馨) are senior systems engineers at the Weibo Machine Learning R&D Center. They are responsible for the data computing and data storage modules of the Weibo machine learning platform, covering real-time computing (Flink, Storm, Spark Streaming), data storage (Kafka, Redis), and offline computing (Hive, Spark). They currently focus on applying Flink, Kafka, and Redis in Weibo's machine learning scenarios, supporting machine learning at the framework, technology, and application levels."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}
]}