Hardcore Benchmark: A Performance Analysis of Pulsar and Kafka in Financial Scenarios

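{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apache Pulsar is a next-generation distributed messaging and streaming platform. It separates compute from storage in a layered architecture and offers multi-tenancy, strong consistency, high performance, support for millions of topics, and smooth data migration. A growing number of companies are running Pulsar in production or evaluating it for production use."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tencent uses Pulsar as the message bus of its billing system, which supports online transactions on the order of hundreds of billions. Tencent billing operates at massive scale, and the core problem it has to solve is keeping money and goods consistent. First, no payment transaction may ever be booked incorrectly, which demands strong consistency and high reliability. Second, every business that billing serves must be available 24/7, which demands high availability and high performance. The billing message bus must provide all of these capabilities."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Pulsar architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For consistency, Pulsar uses a quorum algorithm: the write quorum (W) sets the number of replicas of each message, and the ack quorum (A) sets the number of acknowledgements required for a strongly consistent write, with A > W/2. For performance, Pulsar produces messages in a pipelined fashion, reduces disk IO pressure through sequential and striped writes, and uses several caches to cut network requests and speed up consumption."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/db/db8e6c77b495363765b58d7b5549ad56.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar's high performance comes mainly from its network model, communication protocol, queue model, disk IO, and striped writes. Each is explained in detail below."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Network model"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Pulsar broker is a typical Reactor model. It has a network thread pool that handles network requests, performs network sends and receives as well as encoding and decoding, and then pushes requests through a request queue to a core thread pool for processing. First, Pulsar is multi-threaded and takes full advantage of modern multi-core systems; it assigns requests that belong to the same task to the same thread to keep thread-switching overhead to a minimum. Second, Pulsar uses queues to decouple the network-handling module from the core-processing module asynchronously, so network handling and file I/O run in parallel, which greatly improves the efficiency of the whole system."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Communication protocol"},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Messages are binary-encoded in a simple format. The client produces binary data and sends it directly to the Pulsar broker; the broker forwards it to the bookies for storage without decoding it, and the storage format is binary as well, so the message itself is never encoded or decoded along the produce and consume path. Compression and batching are both done on the client, which further increases the broker's message-handling capacity."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Queue model"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar partitions topics and spreads the partitions across different brokers as far as possible in order to scale horizontally. Pulsar supports adjusting the number of partitions online, so throughput can in theory scale without limit. Although the capacity and performance of ZooKeeper constrain the number of brokers and partitions, that ceiling is very high and can be treated as no limit in practice."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Disk IO"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A message queue is a disk-IO-intensive system, so optimizing disk IO is critical. Disk operations in Pulsar fall into two categories: the operation log (BookKeeper's journal) and the data log (its entry log). The operation log is used for data recovery and is written strictly sequentially; once a write to it succeeds, the produce can be considered successful. This is why Pulsar can support millions of topics without a sharp performance drop caused by random writes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Entries in the operation log may also be out of order, which lets operation-log writes sustain the best possible write rate. The data log is sorted and deduplicated; this causes write amplification, but the trade-off is worthwhile. Mounting the operation log and the data log on different disks separates read IO from write IO and further increases the IO capacity of the whole system."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Striped writes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Striped writes spread IO across more bookie nodes. Each bookie also has a write cache and a read cache: the newest messages are kept in the write cache, while other messages are read from file in batches and loaded into the read cache, which improves read efficiency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Architecturally, Pulsar has no obvious bottleneck in any stage of message processing. The one caveat is that only a single thread flushes the operation log to disk, which may cause stalls. Depending on the disk characteristics, multiple disks and multiple directories can be configured to improve disk read and write performance, which fully meets our needs."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Testing"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the Tencent billing scenario, we configured identical conditions for Pulsar and Kafka and ran comparative stress tests. The test setup is as follows."}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1c/1cb3a7d8f7c8eb019b96de95427d1f3a.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The stress test results are as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/61/61f8d6c02f3d50fa5c105de3f3224f07.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/21/21543b294c60c029b11855691e2e9d11.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/3557683f1e3f321c8cb9b40ccbd6177b.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The data above shows that, in terms of network IO, with 3 replicas and multiple partitions Pulsar nearly saturates the outbound bandwidth of the broker NIC, because the broker has to send each piece of data out 3 times. This is the cost of separating compute from storage."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka's performance numbers were somewhat disappointing: overall performance did not scale up, which is probably related to Kafka's replica synchronization mechanism. In Kafka, followers synchronize by pulling data from the leader, so overall efficiency is not high."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In terms of latency, Pulsar performs better on the producer side. As long as resources have not hit a bottleneck, 99% of requests complete within 10 milliseconds; latency fluctuates during garbage collection (GC) and when a new operation-log file is created."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The stress test results show that Pulsar outperforms Kafka in strong-consistency scenarios. With log.flush.interval.messages=1, Kafka performs even worse. Kafka was designed for high throughput from the start and has no dedicated parameter for direct synchronous flushing to disk."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also tested other scenarios, such as millions of topics and geo-replication. In the produce and consume tests with millions of topics, Pulsar showed no sharp performance drop as the number of topics grew, whereas Kafka slowed down quickly because of the large volume of random writes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar supports geo-replication natively, in both synchronous and asynchronous modes. Kafka's throughput in same-city cross-region replication was low and replication was slow, so for the geo-replication scenario we tested Pulsar's synchronous mode: the storage cluster was deployed across cities, an ACK required responses from multiple locations, and the test parameters were the same as in the same-city test. The results show that Pulsar can reach a throughput of 280,000 QPS across cities. Of course, cross-city geo-replication performance depends heavily on the quality of the network at the time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Availability analysis"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As a new kind of distributed messaging and streaming platform, Pulsar has many strengths. Thanks to the way bookies handle segments and the way ledgers select storage nodes, operating Pulsar is very simple and is free of the manual data rebalancing that Kafka requires. Pulsar is not perfect, however; it has some problems of its own, which the community is still working on."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Pulsar's strong dependency on ZooKeeper"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar depends heavily on ZooKeeper. In the extreme case, a ZooKeeper cluster that goes down or blocks takes the whole service down with it. The probability of a ZooKeeper cluster crashing is relatively low; ZooKeeper has been proven in a large number of production systems and is widely used. The probability of ZooKeeper blocking is higher, though. With millions of topics, for example, there are millions of ledger metadata entries, all of which involve interaction with ZooKeeper."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, creating a topic requires creating the topic partition metadata, the topic name, and the node for the ledgers that store the topic; creating a ledger in turn requires creating and deleting the unique ledger ID node and the ledger metadata node. That is 5 ZooKeeper writes in total. A subscription needs a similar 4 ZooKeeper writes, so 9 writes are needed altogether. Creating millions of topics in a concentrated burst is bound to put heavy pressure on ZooKeeper."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/16/168faf8df582521a5a808eced3b4addb.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar allows its ZooKeeper dependencies to be spread across separate deployments, which relieves the pressure on ZooKeeper to some extent. The heaviest dependency is on the ZooKeeper cluster configured as zookeeperServer. According to the analysis above, write operations against it are relatively controllable, because topics can be created through the console. The ZooKeeper that the bookies depend on sees the highest operation frequency, yet if that ZooKeeper blocks, writes already in progress are not affected."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/46/462bf9811bc53d29afd15fd3f42be4ac.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The dependency on the zookeeperServer ZooKeeper cluster can be optimized along the same lines. First, existing services should at least be able to keep running for a while, giving ZooKeeper enough time to recover. Second, the number of ZooKeeper writes should be reduced so that ZooKeeper is used only for essential operations such as broker election. Information such as broker load can be moved to another storage medium, especially since it reaches the megabyte (MB) level when one broker serves a large number of topics. We are working with the Pulsar community to optimize the broker load feature."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Pulsar's memory management is somewhat complex"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar's memory consists of JVM heap memory and off-heap memory. Messages being sent and cached are held in off-heap memory, which reduces the garbage collection (GC) caused by IO. Heap memory mainly caches ZooKeeper-related data, such as ledger metadata and the per-subscriber cache of message IDs awaiting redelivery. Analysis of a memory dump showed that one ledger's metadata takes about 10 KB and that a subscriber's redelivery message-ID cache starts at 16 KB and keeps growing. As broker memory keeps growing, the broker ends up performing frequent full GCs until it finally exits."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To solve this problem, the first step is to find fields whose memory footprint can be reduced, such as the bookie address information in the ledger metadata. Every ledger creates its own objects, while the number of bookie nodes is very limited, so sharing them through global variables avoids creating unnecessary objects. The subscriber redelivery message-ID cache can be initialized at under 1 KB and shrunk periodically. These changes greatly improve broker availability."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with Kafka, the Pulsar broker has quite a few advantages: Pulsar balances load automatically, so the service does not become unstable because one broker is overloaded, and it can scale out quickly to lower the load on the whole cluster."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Overall, Pulsar performs very well in strong-consistency scenarios and is already widely used inside Tencent, for example in Tencent's financial and big data scenarios. The big data scenarios are mainly based on KoP (Kafka-on-Pulsar), whose performance is now very close to Kafka's and in some scenarios already exceeds it. We firmly believe that, with the joint efforts of the community and developers at large, Pulsar will keep getting better and open a new chapter for next-generation cloud-native messaging and streaming."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Author: Liu Dezhi (劉德志)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Expert Engineer at Tencent; developer in TEG (Technology and Engineering Group) and on the billing system"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}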
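The quorum rule (A > W/2) and the striped writes described in the architecture section can be illustrated with a small sketch. It assumes round-robin striping, where entry `e` goes to `W` consecutive bookies of an ensemble of size `E`, starting at index `e mod E`. The function names `write_set`, `is_majority_ack`, and `write_succeeds` are hypothetical, chosen for illustration only; they are not Pulsar or BookKeeper APIs, and the values `E=5, W=3, A=2` are example numbers, not the settings used in the tests above.

```python
from typing import List, Set


def write_set(entry_id: int, ensemble_size: int, write_quorum: int) -> List[int]:
    """Ensemble indices of the bookies that store one entry.

    Assumed round-robin striping: entry e is written to bookies
    e, e+1, ..., e+W-1 (all modulo the ensemble size E).
    """
    if not 0 < write_quorum <= ensemble_size:
        raise ValueError("need 0 < write_quorum <= ensemble_size")
    return [(entry_id + i) % ensemble_size for i in range(write_quorum)]


def is_majority_ack(write_quorum: int, ack_quorum: int) -> bool:
    """The article's condition A > W/2, with A never larger than W."""
    return write_quorum / 2 < ack_quorum <= write_quorum


def write_succeeds(acked: Set[int], targets: List[int], ack_quorum: int) -> bool:
    """A write is confirmed once at least A of its target bookies have acked."""
    return len(acked.intersection(targets)) >= ack_quorum


if __name__ == "__main__":
    E, W, A = 5, 3, 2  # illustrative values only
    for entry_id in range(E):
        print(entry_id, write_set(entry_id, E, W))
    print("majority ack:", is_majority_ack(W, A))
```

With an ensemble larger than the write quorum, consecutive entries land on different bookie subsets, which is how striping shares IO across more nodes; each entry is still confirmed only after a majority of its own write set has acknowledged it.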