Dimension | Pulsar | Kafka
---|---|---
Multi-tenancy | Cloud-based multi-tenant design | Single-tenant, high-throughput design
Deployment | Well suited to containerized deployment | Mostly deployed on virtual machines across the industry
Topic scale | Supports millions of topics without losing data | Cannot support a large number of topics
Scaling | Compute and storage are separated; scaling is simple, with short business interruption | Scaling is complex
Subscription model | Supports shared subscription types; more flexible | Supports only the Failover consumption mode
P99 latency | Stable | Fluctuates widely

Pulsar not only addresses the shortcomings of the Kafka design; its guarantee against message loss also fits our requirements closely, so we decided to trial Pulsar.

# Initial design

In the initial design we wanted to use the Key_Shared consumption mode for both client-type and server-type push. The figure below shows the client-type design, using HTTP as the example. Each time a customer configures a data forwarding rule, we create a topic in Pulsar; a consumer consumes the topic and pushes the messages through a NAT gateway to the customer's HTTP server.

![](https://static001.infoq.cn/resource/image/a3/74/a363294038aaa6612124df84b65e1574.png)

The server-type push design, using AMQP as the example, is shown in the figure below. If no AMQP client is connected, a consumer that starts and pulls data cannot do anything further with it. So an instance starts a consumer for a topic only after a client has connected, through the load balancer, to that consumer microservice instance. Each AMQP connection corresponds to one consumer.

![](https://static001.infoq.cn/resource/image/6d/e0/6d87bea9b08da4c3f9325f36d75b36e0.png)

A single partition of a topic in a Pulsar cluster has limited throughput. When the data volume of one customer's rule exceeds it, for example when the topic's rated performance is about 3,000 and the customer's estimated traffic is 5,000, we need to add partitions to the topic. To avoid restarting producers and consumers, we set the `autoUpdatePartition` parameter to `true` so that they detect partition changes dynamically.

## Problems found when testing the initial design

When we tested the initial design, we found several problems, mainly in three areas:

- With this design for client-type push, microservice instances and consumers form a mesh. With 10,000 customer rules and 4 microservice instances, there are 40,000 consumer-subscription relationships, and each instance holds 10,000 consumers in memory at once. The consumer receiver queue size is the key factor for both throughput and memory use, but it is hard to tune. If it is set too large and consumers cannot deliver HTTP messages in a failure scenario, large numbers of messages pile up in the consumers and cause them to run out of memory. For example, with 1,000 consumers and a sudden 5-minute network outage, all messages from those 5 minutes accumulate in the receiver queues. If it is set too small, communication between consumers and the server is inefficient, which hurts system performance.
- During rolling upgrades of Pulsar or of the producer and consumer services, frequent topic metadata requests put heavy pressure on the cluster (the number of requests is the number of instances multiplied by the number of topics).
- `autoUpdatePartition` has a large impact on system resources. If it is enabled for every topic, then with the default settings each topic sends one ZooKeeper request per minute.

We reported these problems to the Pulsar community. The StreamNative team gave us strong support and suggested that we group customers first and then set `autoUpdatePartition` only where needed. With the community's support, we decided to make the corresponding improvements and then start planning the rollout.

# Rollout plan

Our customers fall roughly into two types. The first is busy push customers, who send large volumes of upstream data at peak times; a single partition may not be enough for them, and there are few of them. The second has steady business and moderate data volume; a single partition is enough, and there are many of them.

Following the suggestion, we grouped customers: workloads for busy push customers are deployed separately, while customers with moderate traffic share a deployment. For now, SREs assign groups manually in the configuration center according to each customer's business volume; in the future we will group automatically from real-time statistics. Grouping greatly reduces the number of topic-consumer combinations and also reduces the number of metadata requests on restart. The client parameters also differ between the two groups. First, `autoUpdatePartition` is enabled only on busy customers' topics. Second, the two groups of workloads use different receiver queue sizes.

![](https://static001.infoq.cn/resource/image/18/71/1812b929b0c6b60a887d98ed19baed71.png)

## Deployment

We deploy for both customer groups in containers: brokers run as a Deployment, and BookKeeper and ZooKeeper run as StatefulSets. The deployment scenarios include cloud and edge, which have different reliability and performance requirements, so we set the deployment parameters separately for each.
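The partition sizing and customer grouping described in the rollout plan can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's tooling: the function names, the `GroupSettings` type, and the receiver queue sizes are hypothetical. The only figures taken from the text are the per-partition rating of about 3,000, the estimated load of 5,000, and the 10,000 rules on 4 instances.

```python
from dataclasses import dataclass
from math import ceil

# Per-partition rating quoted in the text (about 3,000).
PARTITION_CAPACITY = 3000


@dataclass(frozen=True)
class GroupSettings:
    """Client settings applied to one customer group (hypothetical type)."""
    auto_update_partitions: bool
    receiver_queue_size: int


# Assumed values for illustration only; the article does not give the real sizes.
GROUPS = {
    "busy": GroupSettings(auto_update_partitions=True, receiver_queue_size=1000),
    "regular": GroupSettings(auto_update_partitions=False, receiver_queue_size=100),
}


def partitions_needed(expected_rate: int, capacity: int = PARTITION_CAPACITY) -> int:
    """Smallest partition count whose combined rating covers the expected rate."""
    return max(1, ceil(expected_rate / capacity))


def assign_group(expected_rate: int) -> str:
    """Customers that need more than one partition go to the busy group."""
    return "busy" if partitions_needed(expected_rate) > 1 else "regular"


def subscription_count(topics: int, instances: int) -> int:
    """In the ungrouped design every instance subscribes to every topic."""
    return topics * instances


if __name__ == "__main__":
    print(partitions_needed(5000))        # 2
    print(assign_group(5000))             # busy
    print(subscription_count(10_000, 4))  # 40000
```

Grouping shrinks the product computed by `subscription_count`, because each group's instances subscribe only to that group's topics. Partition auto-update is then needed only for the small busy group.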
From Kafka to Pulsar: Huawei Cloud IoT's Journey to the Cloud
IoT Device Access (IoTDA) is the core service of the Huawei Cloud IoT platform, and it needs a reliable messaging middleware. After comparing the capabilities and features of several messaging systems, we chose Apache Pulsar as the preferred middleware for Huawei Cloud IoT because of its multi-tenant design, its separation of compute and storage, and its support for Key_Shared consumption. This article describes how Pulsar went into production at Huawei Cloud IoT, the problems we ran into along the way, and how we solved them.

# About Huawei Cloud IoT Device Access

IoTDA provides massive device connectivity to the cloud, bidirectional messaging between devices and the cloud, data forwarding, batch device management, remote control and monitoring, OTA upgrades, and device linkage rules. The figure below shows the Huawei Cloud IoT architecture. The upper layer consists of IoT applications such as Internet of Vehicles, smart cities, and smart campuses. The device layer connects to the IoT platform through directly connected gateways and edge networks. Huawei Cloud IoT currently has more than 300 million connections, and its IoT platform is ranked first in China for competitiveness.

![](https://static001.infoq.cn/resource/image/9f/e7/9f27b0d5c9677c7fd72yyf9d312c7ce7.png)

Data forwarding means that after a user sets rules on the IoT platform, the platform triggers the corresponding rule action whenever device behavior meets the rule conditions. An action can connect to other Huawei Cloud services that store, compute on, or analyze device data, such as DIS, Kafka, OBS, and InfluxDB, or it can connect to the customer's own systems over other protocols such as HTTP and AMQP. In these actions the IoT platform acts mainly as either a client or a server. By customer type, the usage scenarios fall into three categories:

- Large customers usually push data to a messaging middleware (such as Pulsar or Kafka) and build their own business systems in the cloud to consume and process it.
- Mid-size and long-tail customers usually push data into their own database (such as MySQL) for processing, or receive and process it on their own HTTP server.
- Lighter-weight customers connect through simple clients built on the AMQP protocol.

# Pain points of the original push module

The original push module was built on Apache Kafka. That operating model has inherent drawbacks, and scaling it out is complicated, which burdened both the development and operations teams. The module supported client-type and server-type push but not AMQP push; its architecture is shown below. Consumers continuously pull messages from Kafka and store messages that fail to send in a database to await retry. This model caused several problems:

- Even when many customers' servers are unreachable, consumers still have to pull messages from Kafka (because there is only one Kafka topic) and attempt to send them.
- Message retention time and size cannot be configured per customer.
- Some customers' servers have limited capacity, yet the rate at which messages are pushed to an individual customer cannot be controlled.

![](https://static001.infoq.cn/resource/image/1f/0e/1fbb47af06fc2914111260bfdd41620e.png)

## Number of topics

In May 2020, to make the product more competitive, we planned to let customers receive forwarded data over AMQP. Onboarding an AMQP client is more complex, and a customer might embed the AMQP client in a mobile app that comes online for only two hours a day at a fixed time. Even in that case we must guarantee that the customer loses no data, so the messaging middleware has to support more topics than there are rules (some customers have so much data under a single rule that one topic cannot carry it). We currently have more than 30,000 rules, expect to reach 50,000 soon, and the number will keep growing.

![](https://static001.infoq.cn/resource/image/02/72/02yyef343f7856183089e98c4ae56d72.png)

Kafka topics consume file handles at the lower level and share the OS cache, so Kafka cannot support a large number of topics; a competitor's Kafka offering supports at most 1,800 topics. To support as many queues as we have rules, we would have to maintain multiple Kafka clusters. The accompanying figure shows the design we drew up based on Kafka.

Implementing the Kafka-based design would be very complex. We would have to manage the lifecycle of multiple Kafka clusters and also maintain the mapping between tenants and clusters, and because Kafka does not support a Shared consumption model, we would need two layers of relay. In addition, if a Kafka cluster had already reached its topic limit and a topic then needed more capacity because of heavy forwarded traffic, the existing cluster could not be scaled without migrating data. The overall design is very complex and would be a major challenge for both development and operations.

# Why we chose Pulsar

To solve the problems with the Kafka design, we surveyed popular messaging middleware and came across Apache Pulsar. Apache Pulsar is a cloud-native distributed messaging and streaming platform with many strong built-in features. Its distinctive Key_Shared mode and its support for millions of topics are the ones we needed most urgently.

- Pulsar supports Key_Shared mode. Suppose a single Pulsar partition supports 3,000 QPS while one of the customer's AMQP clients handles only 300 QPS. The best solution is to use Pulsar's shared mode with multiple client connections, that is, 10 clients connected at the same time to process the data. With Failover mode, the topic would have to be expanded to 10 partitions, which wastes resources.
- Pulsar scales to millions of topics. We can map each rule to one Pulsar topic. When an AMQP client comes online, it resumes reading from the position it last consumed, so no messages are lost.

Pulsar is designed for cloud multi-tenancy, whereas Kafka leans toward system-to-system integration: single-tenant and high-throughput. Pulsar was designed with Kubernetes deployment in mind, so the overall deployment is easy to implement. Pulsar separates compute from storage, so scaling out is simple; topics are interrupted only briefly during scaling, and with retries the business sees no interruption. It also supports shared subscription types, which is more flexible. We compared Pulsar and Kafka along several dimensions, and the comparison table summarizes the results.
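The Key_Shared sizing example above (a 3,000 QPS partition feeding 300 QPS clients) can be worked through in Python. The sketch is an illustration only: `consumers_needed` and `pick_consumer` are hypothetical helpers, and the CRC32-modulo routing is a simplified stand-in for key-based dispatch, not Pulsar's actual hashing scheme.

```python
from math import ceil
from typing import Sequence
from zlib import crc32


def consumers_needed(partition_qps: int, client_qps: int) -> int:
    """Number of clients required to drain one partition at full rate."""
    return ceil(partition_qps / client_qps)


def pick_consumer(key: str, consumers: Sequence[str]) -> str:
    """Route a message key to one consumer; the same key always maps to the same one.

    Simplified stand-in: a stable hash of the key modulo the consumer count.
    """
    return consumers[crc32(key.encode("utf-8")) % len(consumers)]


if __name__ == "__main__":
    n = consumers_needed(3000, 300)
    consumers = [f"amqp-client-{i}" for i in range(n)]
    print(n)  # 10
    # Messages for one device key are always delivered to the same client.
    first = pick_consumer("device-42", consumers)
    assert all(pick_consumer("device-42", consumers) == first for _ in range(5))
```

Under Failover only one consumer per partition is active, so the same load would require 10 partitions. Spreading keys across 10 consumers on a single partition is what avoids that expansion.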