流數據並行處理性能比較:Kafka vs Pulsar vs Pravega

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"引言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"流式應用程序通常從各種各樣的來源(例如,傳感器、用戶、服務器)併發地採集數據,並形成一個事件流(stream of events)。使用單個流來捕獲由多個數據源生成的並行數據流可以使得應用程序能夠更好地理解數據,甚至更有效地處理數據。例如,將來自一組傳感器的數據輸入到單一數據流中,就可以使得應用程序通過引用單一數據流來分析所有這類傳感器數據。當這些單個的流可以以高並行度讀取時,應用程序就能自行決定如何映射自身的抽象設計到這些流進行數據讀取,而不是被人爲的基礎設施限制而決定。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"並行化在處理流數據時也很重要。當應用程序分析流中的數據時,它們通常依賴並行處理來降低延遲和提高吞吐量。爲了在讀取流式數據時支持並行性,流存儲系統允許在數據寫入時,根據事件負載進行分區。這通常基於"},{"type":"link","attrs":{"href":"https:\/\/pravega.io\/docs\/latest\/pravega-concepts\/#events","title":"","type":null},"content":[{"type":"text","text":"路由鍵"}]},{"type":"text","text":"(routing keys)的支持。通過分區,應用程序可以保留以應用本身概念(如標識符)的順序。在每個分區內,數據是有序的。在 Pravega 中,Stream 的並行單位被叫做"},{"type":"link","attrs":{"href":"https:\/\/pravega.io\/docs\/latest\/pravega-concepts\/#stream-segments","title":"","type":null},"content":[{"type":"text","text":"segment"}]},{"type":"text","text":",而在基於 topic 的系統中(如 Apache Kafka 和 Apache Pulsar),它被稱爲partitions。Pravega 的stream可以自動根據負載的變化改變segment的數量。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在本文中,我們關注的是 Pravega 在多個寫客戶端同時寫入到一個多segment的stream時的表現。我們需要特別關注下數據的"},{"type":"text","marks":[{"type":"italic"}],"text":"添加路徑"},{"type":"text","text":"(append path),因爲它對於有效地讀取數據流至關重要。我們已經在"},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/fp1tlwaurokoxxq15qgf","title":"","type":null},"content":[{"type":"text","text":"之前的博客"}]},{"type":"text","text":"中分析了在只有一個寫入端和最多16個segment的基本IO性能。這一次,我們使用高度並行的負載,每個流最多有100個寫入端和5000個segment。這樣的設置參考了當今雲原生應用程序的需求,例如對於高度並行的工作負載,它們對於擴展和維持高性能的需求。我們將 Pravega 與 Kafka 和 Pulsar 進行比較,以瞭解這些系統因爲不同的設計而帶來的影響。我們注意到的兩個關鍵點是:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在最多有100個寫入端和5000個segment,數據流量在250MBps時,Pravega 可以在所有情況下都維持250MBps的速率。而 Kafka 在5000個partition時只有不到100MBps。Pulsar 則在大多數情況下會直接崩潰。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pravega 可以保證在95%中位數時,延遲在10毫秒以下,而 Kafka 的延遲卻高達幾十毫秒。對於 Pulsar,它有高達幾秒的延遲,並且是在我們僅僅能成功的那次。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了公平起見,我們已經測試了 Pulsar 的其他配置,以瞭解在哪些條件下它表現出良好的性能結果。因此,我們另外展示了一個對 Pulsar 更有利的配置,並且不會導致它經常崩潰。但這個配置對於我們的測試系統來說沒有那麼大的挑戰,如我們在下面進一步解釋的那樣。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在下面的章節中,我們將解釋是什麼能夠讓 Pravega 在這種情況下表現得更好,並詳細介紹我們的環境設置、實驗過程和結果。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"爲什麼 Pravega 性能更好?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們介紹一些關於 Pravega 添加路徑(append path)的設計特點,這些特點對於理解結果很重要。我們還討論了一些有關設計的權衡,並闡述了我們爲什麼在 Pravega 上選擇這種。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Pravega 的添加路徑(append path)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pravega 的添加路徑(append path)包括三個相關部分:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"添加數據的客戶端"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Segment Store,用以接收數據添加的請求,記錄其日誌並持久化存儲"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"持久化的日誌存儲,由 "},{"type":"link","attrs":{"href":"https:\/\/bookkeeper.apache.org\/","title":"","type":null},"content":[{"type":"text","text":"Apache BookKeeper"}]},{"type":"text","text":" 實現"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下圖闡釋了Pravega的添加路徑(append path):"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/f7\/ea\/f735ab2db5c49474cb07ae647e18cbea.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pravega 的添加路徑(append path)"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"客戶端添加由程序源生成的數據,並儘可能對這些數據進行批處理。在服務端收集客戶端的批處理數據,這樣做的好處時可以避免緩衝數據,但要注意是由客戶端來控制批處理何時開始和結束。客戶端使用了一種批處理跟蹤的啓發式算法,這個算法通過輸入速率和響應反饋來估計批處理的大小。有了這樣的估計,客戶端就可以決定何時關閉批處理(代碼在 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/pravega\/pravega\/blob\/master\/client\/src\/main\/java\/io\/pravega\/client\/connection\/impl\/AppendBatchSizeTrackerImpl.java","title":"","type":null},"content":[{"type":"text","text":"AppendBatchSizeTrackerImpl.java"}]},{"type":"text","text":")。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於客戶端批處理的大小最終取決於應用程序源可以生成多少數據,因此很有可能單個客戶端自己無法生成足夠大的批處理。因此,當有多個寫入端時,我們有機會聚合來自多個客戶端的批處理,以形成更大的批處理。實際上,這就是segment store的關鍵作用之一: 在將數據寫入持久性日誌之前對其進行聚合,其中的細節讀者有興趣可以延申閱讀"},{"type":"link","attrs":{"href":"https:\/\/blog.pravega.io\/2019\/04\/22\/events-big-or-small-bring-them-on\/","title":"","type":null},"content":[{"type":"text","text":"本篇博客"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"segment store在一個稱爲"},{"type":"link","attrs":{"href":"https:\/\/pravega.io\/docs\/latest\/segment-containers\/","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"segment容器"}]},{"type":"text","text":"(segment container)的組件中執行這第二級批處理。一個segment store可以同時運行多個segment容器,並且每個容器都有自己的"},{"type":"link","attrs":{"href":"https:\/\/pravega.io\/docs\/latest\/segment-store-service\/#durable-log","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"持久化日誌"}]},{"type":"text","text":"(durable log)並追加到其中。這些segment container在單個segment store實例中可以並行地進行數據的append寫入。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個segment容器負責在Pravega集羣中所有stream中一部分segment上的操作。在全局上,集羣會協調哪些segment容器對應哪些segment store實例。當增加segment store實例時(例如擴大系統規模)或減少segment store實例時(如縮減系統規模或局部宕機),segment容器集合將跨現有segment store實例重新平衡。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"持久性日誌目前是通過 Apache BookKeeper ledgers 實現的。Bookie(BookKeeper 的存儲服務器)將數據添加請求的日誌記錄到 ledgers 中,並在將數據添加加到 journals 之前執行另一層合併。這第三層的合併又是一個批量處理來自不同segment容器數據的機會。在我們用於 Pravega 的配置中,爲了保證持久性,bookie 只對將數據寫入 journal 的分segment容器做出響應。BookKeeper 還維護其他數據結構,但它們與本文的討論無關。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"低延遲、高吞吐量和持久性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"低延遲對於許多流式應用程序是至關重要的。這些應用要求數據在生成後不久就可以進行處理。高吞吐量同樣適用於需要從許多來源獲取大量數據的應用程序。如果系統不能夠維持高吞吐量的輸入,應用程序在數據峯值時可能會面臨不得不需要局部減載的危險。最後,在分析並處理這些流時,數據的丟失可能導致不正確的結果,因此,持久性對於企業應用程序也是至關重要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然而,在一套系統裏同時實現這三個特性是具有挑戰性的。存儲設備通常通過更大塊的寫入來提高吞吐量,迫使系統緩存更多的數據並且犧牲了延遲。如果系統偏向更低的延遲,那麼每次寫入時的數據就會減少,從而降低了吞吐量。當然,如果我們不在數據刷新到磁盤後確認,那麼我們就可以不關注延遲和吞吐量之間的權衡,但是這種選擇犧牲了持久性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在我們評估的所有三個系統中,Pravega 在這三個方面總體上提供了最好的結果。與 Kafka 和 Pulsar 相比,它在保持高吞吐量和低延遲的同時保證了持久性。Kafka 則在這三個方面做出了不同的選擇。與其他兩種配置相比,默認情況下它可以獲得更高的吞吐量和更低的延遲,因爲它不會等待磁盤的寫入成功,但這種選擇犧牲了持久性。Pulsar 能夠像 Pravega 一樣保證持久性,因爲它建立在 BookKeeper 的基礎上; 儘管如此,它似乎並沒有實現一個"},{"type":"text","marks":[{"type":"italic"}],"text":"寫路徑"},{"type":"text","text":"(write path),能夠保證在多寫入客戶端和多partition的情況下依然足夠高效,正如之後與其他兩個系統相比的實驗結果所展示的那樣。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"評測與配置一覽"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在AWS上進行了實驗。我們在 Pravega 中使用的方法非常接近我們在"},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/fp1tlwaurokoxxq15qgf","title":"","type":null},"content":[{"type":"text","text":"前一篇博文"}]},{"type":"text","text":"中描述的方法,可以參考那篇博文了解更多細節。與我們之前的博客不同的是,公平起見,對於 Pulsar,類似於對 Kafka 做的,我們使用了StreamNative之前的性能分析"},{"type":"link","attrs":{"href":"https:\/\/streamnative.io\/en\/blog\/tech\/2020-11-09-benchmark-pulsar-kafka-performance","title":"","type":null},"content":[{"type":"text","text":"博客"}]},{"type":"text","text":"中使用的相同的OpenMessaging測試工具"},{"type":"link","attrs":{"href":"https:\/\/github.com\/streamnative\/openmessaging-benchmark\/","title":"","type":null},"content":[{"type":"text","text":"代碼"}]},{"type":"text","text":"。在下面,我們爲這篇博文提供了主要的配置設置:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/8e\/00\/8e29ee67f43bc6f12542aaa3432a4500.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在我們性能測試中使用的主要設置"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在我們之前的博客中,我們對 Pravega、 Apache Pulsar 和 Apache Kafka 使用了相同的副本方案: 3個副本,每次收到2個確認進行寫入。對於數據持久性,Pravega 和 Pulsar 在默認情況下保證每次寫入的數據持久性,並且我們保留這種行爲。對於 Kafka,我們測試了兩種配置: 1) 默認配置no flush,這種配置下數據不會顯式地刷到磁盤中,那麼在有些相關的故障發生時可能導致數據丟失; 2) 磁盤刷新配置,其通過每次寫入刷入磁盤的方式保證數據的持久性。日誌在所有的系統都寫入了一個 NVMe 硬盤,這樣我們可以理解這三個系統如何在併發度上升的時候使用它。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們部署的硬件參數與之前的博客不同。這次的實驗使用了AWS更大的服務器實例(對於Pulsar Broker和Pravega segment store我們使用了 i3.16xlarge 實例)。更改實例的原因是,我們觀察到所有這些系統在有更多的partition\/segment時,會使用更多的 CPU。通過增加 CPU 資源,我們才能保證這些系統不會被讓 CPU 成爲性能瓶頸。我們還使用多個測試虛擬機。通過模擬一個分佈式並行數據源("},{"type":"link","attrs":{"href":"https:\/\/gist.github.com\/RaulGracia\/2616c5cbcd30ed3fd4b7f414b1766a54#file-pravega-teraform-tfvars","title":"","type":null},"content":[{"type":"text","text":"Pravega"}]},{"type":"text","text":"、 "},{"type":"link","attrs":{"href":"https:\/\/gist.github.com\/RaulGracia\/37147ba7c3640941db346e488155643e#file-kafka-terraform-tfvars","title":"","type":null},"content":[{"type":"text","text":"Kafka"}]},{"type":"text","text":" 和 "},{"type":"link","attrs":{"href":"https:\/\/gist.github.com\/RaulGracia\/2592cd3a0ec8c3e6eccee671e062c7a9#file-pulsar-terraform-tfvars","title":"","type":null},"content":[{"type":"text","text":"Pulsar"}]},{"type":"text","text":" 的 terraform 配置文件) ,這個部署符合我們在高負載下測試這些系統的目標。還要注意的是,在這個評估中,Pravega 是唯一將數據轉移到長期存儲(AWS EFS)的系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們使用 "},{"type":"link","attrs":{"href":"http:\/\/openmessaging.cloud\/docs\/benchmarks\/","title":"","type":null},"content":[{"type":"text","text":"OpenMessaging"}]},{"type":"text","text":" 基準測試來運行我們的實驗(請參閱這裏的"},{"type":"link","attrs":{"href":"https:\/\/blog.pravega.io\/2020\/10\/01\/when-speeding-makes-sense-fast-consistent-durable-and-scalable-streaming-data-with-pravega\/#experimental_setup","title":"","type":null},"content":[{"type":"text","text":"部署說明"}]},{"type":"text","text":"並在Pravega實驗中使用"},{"type":"link","attrs":{"href":"https:\/\/github.com\/pravega\/openmessaging-benchmark\/releases\/tag\/v0.2.0","title":"","type":null},"content":[{"type":"text","text":"此版本"}]},{"type":"text","text":")。我們將輸入數據速率固定在250MBps(每個事件1KB),而不是探索和評估系統的最大吞吐量。我們的目標是在客戶端數量和segment\/partition的數量改變時,比較的不同系統的行爲。這樣的設置使得實驗可以保證數據注入的固定以減小其他因素對性能測試的影響。爲了完整起見,我們仍然在博客的末尾討論了這些系統的最大吞吐量。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"測試中的生產者和消費者線程分佈在很多虛擬機上(詳情請參見上表中的 Producers\/Consumers 行)。每個生產線程和消費線程都使用一個專用的 Kafka、 Pulsar 或 Pravega 客戶端實例。基準測試的生產者線程使用 Kafka 和 Pulsar 中的producer或 Pravega 中的writer ,而基準測試的消費者線程使用 Kafka 和 Pulsar 中的consumer或 Pravega 中的reader。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Pravega 的具體案例中,寫入端使用了連接池技術: 這個特性允許應用程序使用一個公共的網絡連接池(默認情況下每個segment store10個連接)來處理大量的segment連接。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據輸入和並行性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們要評估的第一個方面是增加segment和客戶端數量對吞吐量的影響。下面的圖表顯示了 Pravega、 Kafka 和 Pulsar 的吞吐量,包括不同數量的segment和生產者。每一條線對應於在同一個工作負載下,向單一一個stream\/topic追加不同數量的生產者。對於 Kafka 和 Pulsar,我們也繪製了其他配置下的圖線。這些配置以犧牲功能性(持久性或者是no key)爲代價,爲這兩個系統提供了更好的結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/1d\/cc\/1dfa242d10e4cd1ec5474b429ef9b9cc.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/2f\/65\/2f2a9da199116788582e95ca0c3c7f65.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該實驗可以通過"},{"type":"link","attrs":{"href":"https:\/\/github.com\/pravega\/p3_test_driver","title":"","type":null},"content":[{"type":"text","text":"P3測試程序"}]},{"type":"text","text":"復現,它使用以下的工作負載和配置文件作爲輸入。他們分別是Pravega("},{"type":"link","attrs":{"href":"https:\/\/gist.github.com\/RaulGracia\/1f9cf3698b009abc55771f4ac25c0d0d#file-testgen-pravega-parallelism-py","title":"","type":null},"content":[{"type":"text","text":"工作負載"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/gist.github.com\/RaulGracia\/43d2022a2f0bb0696b6903a4b18c5468#file-pravega-config-yaml","title":"","type":null},"content":[{"type":"text","text":"配置"}]},{"type":"text","text":"),Kafka("},{"type":"link","attrs":{"href":"https:\/\/gist.github.com\/RaulGracia\/c867e82d2b659b06881976e3b29f2a25#file-testgen-kafka-parallelism-py","title":"","type":null},"content":[{"type":"text","text":"工作負載"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/gist.github.com\/RaulGracia\/d8769e4a2ad36c5c216b2647f4a3a9db#file-kafka-config-yaml","title":"","type":null},"content":[{"type":"text","text":"配置"}]},{"type":"text","text":")和Pulsar("},{"type":"link","attrs":{"href":"https:\/\/gist.github.com\/RaulGracia\/bb544c5d685ba6a82a72b790497449c7#file-testgen-pulsar-parallelism-py","title":"","type":null},"content":[{"type":"text","text":"工作負載"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/gist.github.com\/RaulGracia\/0ec16e12bafce0472f8c79dae3c0945e#file-pulsar-config-yaml","title":"","type":null},"content":[{"type":"text","text":"配置"}]},{"type":"text","text":")。 這些系統的原始基準測試輸出可在此處獲得:"},{"type":"link","attrs":{"href":"https:\/\/blog.pravega.io\/wp-content\/uploads\/2021\/03\/03-03-2021-pravega-090-scalability-suite.zip","title":"","type":null},"content":[{"type":"text","text":"Pravega"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/blog.pravega.io\/wp-content\/uploads\/2021\/02\/07-12-2020-kafka-scalability-suite-noflush.zip","title":"","type":null},"content":[{"type":"text","text":"Kafka(不 flush)"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/blog.pravega.io\/wp-content\/uploads\/2021\/02\/30-11-2020-kafka-scalability-suite-flush.zip","title":"","type":null},"content":[{"type":"text","text":"Kafka(flush)"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/blog.pravega.io\/wp-content\/uploads\/2021\/03\/03-03-2021-pulsar-260-scalability-suite.zip","title":"","type":null},"content":[{"type":"text","text":"Pulsar"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/blog.pravega.io\/wp-content\/uploads\/2021\/03\/04-03-2021-pulsar-scalability-suite-favorable-config.zip","title":"","type":null},"content":[{"type":"text","text":"Pulsar(有利配置)"}]},{"type":"text","text":"。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/yy\/8a\/yycbe1921c14f547635890e0a9ab288a.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過研究上面的實驗圖表,我們觀察到以下關於吞吐量和並行性的關係:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pravega 是這些系統中唯一可以在250MBps數據流,5000個segment和100個生產者的負載下穩定工作的。這表明 Pravega 添加路徑(append path)的設計可以有效地處理高並行下的工作負載,特別是當許多寫入端的小數據的追加寫在segment容器中進行批量化處理的設計。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在我們增加partitions數量後,Kafka 的吞吐量降低了。更多的寫入者不會線性地增加吞吐量,而是存在一個性能上限。換句話來說,10個和50個生產者在吞吐量上會有顯著的區別,但是50個和100個生產者的差距對於吞吐量的影響不大。這個結果證明了很多Kafka用戶共同的擔憂,即增加partition數量後,Kafka 的性能會下降。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"值得一提的是,當我們在Kafka的配置中開啓了flush強制刷新到磁盤以確保數據持久性,吞吐量有着顯著的下降(例如對於100個生產者和500個分區時,有80%的下降)。 雖然強制刷新到磁盤會對系統造成一定的性能損失,但是這個實驗表明了在超過十個分區且保證持久性的情況下,吞吐量會有非常明顯的損失。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar 在我們試驗過的大多數配置情況下都會崩潰。爲了瞭解 Pulsar 穩定性問題的根本原因,我們換了一個更有利的配置:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"等待所有來自 Bookies 的確認請求,這樣可以解決out-of-memory內存不足的錯誤(更多細節見這個"},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/bookkeeper\/issues\/2521","title":"","type":null},"content":[{"type":"text","text":"issue"}]},{"type":"text","text":")"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不使用路由鍵來寫入數據(這樣會犧牲數據的有序性並且降低實際的寫入並行性)。如果啓動的話,我們會在 Pulsar 的 broker 中發現關於 Bookkeeper 的 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/bookkeeper\/blob\/master\/bookkeeper-server\/src\/main\/java\/org\/apache\/bookkeeper\/bookie\/storage\/ldb\/DbLedgerStorage.java","title":"","type":null},"content":[{"type":"text","text":"DBLedgerStorage"}]},{"type":"text","text":" "},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/bookkeeper\/blob\/67a02db73d62188fc8a143bd9a37038ae770e90a\/bookkeeper-server\/src\/main\/java\/org\/apache\/bookkeeper\/proto\/WriteEntryProcessorV3.java#L129","title":"","type":null},"content":[{"type":"text","text":"不能夠跟上寫入的速度"}]},{"type":"text","text":"的錯誤。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過這種配置,Pulsar 可以得到比基本情景(如10個生產者)更好的結果。然而,當實驗中有大量的生產者和分區時,它仍然顯示出性能下降和最終的不穩定性。注意,在寫操作中不使用路由鍵是 Pulsar 性能提升的主要原因。在內部,Pulsar 客戶端通過創建更大的批處理並以round-robin的方式使用segment來優化沒有鍵的情況(參見"},{"type":"link","attrs":{"href":"http:\/\/pulsar.apache.org\/docs\/en\/concepts-messaging\/#routing-modes","title":"","type":null},"content":[{"type":"text","text":"文檔"}]},{"type":"text","text":")。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總之,Kafka 和 Pulsar 在增加分區和生產者數量時都會顯著降低性能。需要高度並行性的應用程序可能無法滿足所需的性能要求,或者不得不在這個問題上投入更多資源。Pravega 是經過測試下,唯一一個可以在大量的生產者和segment的規模下依然保持持續高吞吐量的系統,同時在此配置下數據的持久性也能得到保證。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"關注寫入延遲"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在寫入流數據時,延遲也和吞吐量一樣重要。接下來,我們展示了95%中位數時寫入延遲和segment\/生產者數的關係。請注意,在本節中,我們展現了所有系統的延遲數據,而不考慮它們是否達到了要求的高吞吐量。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/b1\/be\/b1de4982b4ec1507c2b299061cbdf3be.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/95\/81\/95a3975b6064ffbf40b1f718c311d681.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/db\/63\/dbb4b2be5381c8122d2d4bcf6d8f0163.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過研究上面的實驗圖表,我們重點總結出了以下結論:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pravega 在95%中位數的情況下提供<10毫秒的延遲,而 Kafka,即使修改配置使其不刷新到磁盤,也有更高的延遲。回想一下上一節,Pravega 在有很多segment和生產者的時候也能達到目標的高吞吐量,而 Kafka 沒有。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於5000個segment,與100個生產者的情況相比,Pravega 在10個生產者下能獲得更低的延遲。這樣的結果是由於批量化處理的設計。隨着生產者的增加,Pravega 的添加路徑(append path)使得單個生產者的每一批次的變小,這導致了數據在服務器端排隊和需要更多的計算量。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 在保證數據持久性(即打開flush開關)的模式下,延遲比默認配置更高了(95%中位數的延遲在100個生產者和500個segment的情況下達到了13.6倍的延遲)。在高並行的需求下,應用程序可能不得不在數據持久性和高並行的性能下二選一。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基本配置下的 Pulsar 只能在10個partition + 10個生產者的情況下保持低延遲。任何提升partition 數量或者寫客戶端數量都會導致系統不穩定。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當使用更有利的配置時,Pulsar 在10個生產者的情況下可以獲得<10毫秒的延遲。對於100個生產者,延遲會隨着partition數量的增加而迅速上升。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與 Kafka 相比,Pravega 擁有更低的延遲,甚至對於默認的不等待磁盤返回寫入確認的 Kafka 配置來說也是如此。對於 Pulsar,系統的不穩定性不允許我們對建立的配置進行乾淨的比較。對於更有利的no key + 等待所有bookie返回的配置下,Pulsar 能在10個生產者下保持了<10毫秒延遲,但是對於100個生產者的話延遲會增加的很快。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"關於最大吞吐量的附註"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然我們在上面的實驗中使用了一個固定的目標速率,但是我們還想了解這些系統在我們的場景中能夠達到的最大吞吐量。爲了縮小分析範圍,我們固定使用10個生產者,對比了在10和500個segment\/partition時的情況。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c2\/bc\/c2ef6c7ce1d648833cb0577bd88630bc.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這些測試的原始測試結果可以在"},{"type":"link","attrs":{"href":"https:\/\/blog.pravega.io\/wp-content\/uploads\/2021\/03\/max_tp_parallelism.zip","title":"","type":null},"content":[{"type":"text","text":"這裏"}]},{"type":"text","text":"看到。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從基準測試的角度來看,Pravega 可以在10和500個segment下達到 720MBps 的最大吞吐量,換算在磁盤級別則大約是780MBps。這個差異是由於 Pravega 和 BookKeeper 添加了額外的元數據開銷(例如 Pravega 的"},{"type":"link","attrs":{"href":"https:\/\/blog.pravega.io\/2019\/11\/21\/segment-attributes\/","title":"","type":null},"content":[{"type":"text","text":"segment屬性"}]},{"type":"text","text":")。這個成績非常接近同步寫入磁盤的最大速度(約803MBps)。實驗如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"[ec2-user@ip-10-0-0-100 ~]$ sudo dd if=\/dev\/zero of=\/mnt\/journal\/test-500K bs=500K count=100000 oflag=direct\n100000+0 records in\n100000+0 records out\n51200000000 bytes (51 GB) copied, 63.7856 s, 803 MB\/s\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於自定義配置的 Pulsar,我們可以在基準測試上可靠地獲得近400MBps 的吞吐量。我們還測試了將客戶端的批處理時間增加到10毫秒,這使得吞吐量略有提高(515MBps)。儘管如此,我們注意到這遠遠不是磁盤寫入的最大速率,我們懷疑這是由於使用了路由鍵,因爲它減少了 Pulsar 客戶端批處理的機會。更糟糕的是,隨着分區數量的增加,Pulsar 的吞吐量很快受到了限制。這個結果表明,主要依賴客戶端來聚合數據成批具有很大的侷限性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於有10個partition的情況,我們觀察到,當 Kafka 保證持久性(“flush” 模式)時,它可以在等待寫返回時達到700MBps和不等待寫返回時達到900MBps。請注意,這隻發生在有10個partition的情況下,因爲當有500個partition時,吞吐量分別降至22MBps和140MBps。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了獲得更深入的瞭解,我們在執行實驗時使用 "},{"type":"link","attrs":{"href":"https:\/\/linux.die.net\/man\/1\/iostat","title":"","type":null},"content":[{"type":"text","text":"iostat"}]},{"type":"text","text":" 對服務器端實例進行了檢測。根據從 iostat 每秒收集的信息,Kafka 使用no flush的設置允許操作系統緩衝更多的數據,再寫到硬盤,從而導致更高的吞吐量。下面的圖表從操作系統的角度展示了 Kafka 寫入的行爲: 寫往往是250KB的大小,或者因爲緩衝根本沒有寫入(0大小)。相反,Pravega 顯示了一個更小但是一致的寫大小,因爲每一次寫都被刷新到硬盤,並且 Pravega 添加路徑(append path)定義了它們的大小。注意,即使犧牲持久性,Kafka 也只能在更少的分區上達到這樣的吞吐率。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d9\/d4\/d915e318bc9bdae4744a90670b224bd4.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在不同segment或寫入者的情況下,iostat監測的平均磁盤寫入的大小的累積分佈函數"}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着越來越多的需要讀寫並行化的實際使用情況,流存儲有效地、高效地適配這些工作負載變得至關重要。許多這樣的應用程序是雲原生的,並且它們需要有效地伸縮和並行化這些工作負載的能力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這篇博客展示了,Pravega 能夠在保持低延遲和數據持久性的同時,維持數千個segment和數十個併發寫入者的高吞吐量。同時提供高吞吐量、低延遲和數據持久性是一個具有挑戰性的問題。我們比較的 Kafka 和 Pulsar 這些消息傳遞系統在同樣的測試數據集上還不夠優秀。Pravega 的添加路徑(append path)包括了多個批處理步驟,在保證數據持久性的同時,還能應對高並行度工作負載的挑戰。今後還會有更多的性能測試博客。請保持關注。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"相關文章:"},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/theme\/56","title":null,"type":null},"content":[{"type":"text","text":"構建下一代大數據架構:流式存儲Pravega技術詳解"}],"marks":[{"type":"underline"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章