從 Kafka 到 Pulsar，華爲雲物聯網上雲之旅

原創

2021-05-24 16:18

Pulsar

Kafka

多租戶

基於雲的多租戶設計

單租戶、高吞吐設計

部署方式

很好地適配了容器化的部署模式

業內以虛擬機部署爲主

topic 量級

支持數百萬個 topic，可以實現不丟數據

無法支持量級較大的 topic

擴容

計算與存儲分離，擴容簡單，業務中斷時間短

擴容複雜

訂閱模型

支持共享訂閱類型，更靈活

只支持 Failover 消費模式

P99 延遲

穩定

波動較大"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar 不僅能夠解決 Kafka 方案的不足，其不丟消息的特性更是完美契合了我們的需求，所以我們決定試用 Pulsar。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"初版設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最初設計時，我們想在客戶端類型和服務端類型都使用 Key_Shared 消費模式。下圖爲客戶端類型（以 HTTP 爲例）的設計，客戶每配置一條數據流轉規則，我們就在 Pulsar 中創建一個 topic，consumer 消費 topic，再經過 NAT 網關推送到客戶的 HTTP 服務器。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a3\/74\/a363294038aaa6612124df84b65e1574.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服務端類型（以 AMQP 爲例）推送的設計如下圖。如果沒有連接到 AMQP 客戶端，即使啓動 consumer 拉取到數據，也無法進行下一步處理，所以當客戶端通過負載均衡組件連接到對應的 consumer 微服務實例後，該實例纔會啓動對應 topic 的 consumer 進行消費。一個 AMQP 的連接對應一個 consumer。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/6d\/e0\/6d87bea9b08da4c3f9325f36d75b36e0.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar 集羣內 topic 單 partition 吞吐量有限，當單個客戶的規則數據量超過吞吐量時，比如當 topic 的性能規格在 3000 左右，而客戶的預估業務量爲 5000 時，我們需要爲 topic 擴容 partition。爲了避免重啓 producer\/consumer，我們將 "},{"type":"codeinline","content":[{"type":"text","text":"autoUpdatePartition"}]},{"type":"text","text":" 參數設置爲 "},{"type":"codeinline","content":[{"type":"text","text":"true"}]},{"type":"text","text":"，使 producer\/consumer 可以動態感知到 partition 的變化。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"初版設計在測試中遇到的問題"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在對初版設計方案進行測試時，我們發現這一方案存在一些問題，主要體現在以下三個方面："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"客戶端類型推送使用上述設計，微服務實例和 consumer 之間形成了網狀關係。假設我們有 1 萬個客戶規則和 4 個微服務實例，則會有 4 萬個消費-訂閱關係。單個微服務實例在內存中同時有 1 萬個 consumer，consumer 接收隊列大小是吞吐量和內存消耗的關鍵，但不易配置。若配置偏大，則在異常場景下，consumer 發送不出 HTTP 消息，會造成大量消息積壓在 consumer 中，導致 consumer 內存溢出。假設有 1000 個 consumer，網絡忽然斷開 5 分鐘，則這 5 分鐘內的消息都會積壓在接收隊列中；若配置偏小，consumer 與服務器之間通信的效率較低，影響系統性能。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Pulsar 或生產消費服務滾動升級的場景中，頻繁請求 topic 元數據對集羣壓力較大（請求個數爲實例個數與 topic 數量的乘積）。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"autoUpdatePartition"}]},{"type":"text","text":" 對系統資源影響大。如果每個 topic 都開啓 "},{"type":"codeinline","content":[{"type":"text","text":"autoUpdatePartition"}]},{"type":"text","text":"，按照默認設置，每個 topic 每分鐘發送一次 ZooKeeper 請求。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在 Pulsar 社區反饋了這個問題，StreamNative 團隊同學給了大力支持和幫助，建議我們對客戶進行分組後再根據需要設置 "},{"type":"codeinline","content":[{"type":"text","text":"autoUpdatePartition"}]},{"type":"text","text":" 參數。有了社區的支持，我們決定做相應改進後開始策劃上線方案。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"上線方案"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的客戶大致分爲兩種，一種是在業務忙時有大量數據上行的推送繁忙用戶，其特點是一個分片可能滿足不了訴求，用戶數量少；另一種是業務比較穩定，數據量中等，其特點是一個分片足夠，用戶數量多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們根據建議對用戶進行分組，單獨部署推送繁忙用戶的工作負載，合設業務量中等的用戶。目前，我們根據客戶的業務容量，通過 SRE 在配置中心手動分組，未來會根據實時統計數據自動分組。對業務進行分組不僅可以大大減少 topic 和 consumer 之間的組合數量，也降低了重啓時請求元數據的次數。另外，兩類用戶客戶端參數在分組後也不完全相同，首先，"},{"type":"codeinline","content":[{"type":"text","text":"autoUpdatePartition"}]},{"type":"text","text":" 僅在繁忙用戶 topic 中開啓；其次，兩組工作負載的接收隊列大小不同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/18\/71\/1812b929b0c6b60a887d98ed19baed71.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"部署"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們採用容器化部署方式爲兩類用戶部署：使用 deployment 方式部署 broker，StatefulSet 方式部署 BookKeeper 和 ZooKeeper。部署場景包括雲端部署和邊緣部署，不同的部署方式對可靠性、性能要求不同，我們設置的部署參數如下："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"

	雲端部署	邊緣部署
Ensemble Size	3	2
Write Quorum	3	2
Ack Quorum	2	1
Broker CPU	4U	1U
Broker Mem	16G	4G
BookKeeper CPU	4U	1U
BookKeeper Mem	16G	4G
故障域	AZ	物理機"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在部署時，我們發現："}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"topic 之間 EnsembleSize 和 Write Quorum Size（Qw）參數相同時，寫入性能最好。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雲端消息量大，如果副本數量爲 2，則要冗餘 100% 部署，副本數量爲 3 時只需冗餘 50%。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Pulsar 調優方案"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上述方案已在半年前順利上線，並且我們還在測試環境中測試了 5 萬 topic，10 萬消息每秒的場景，測試期間遇到了一些問題，並根據具體情況採取了調優方案，詳情參閱 "},{"type":"link","attrs":{"href":"https:\/\/www.jianshu.com\/p\/ace6e1094c41","title":null,"type":null},"content":[{"type":"text","text":"Pulsar 5 萬 topic 調優"}]},{"type":"text","text":"。本節重點介紹延遲、端口和改進建議。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"降低生產\/消費延遲"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過使用測試工具，我們發現消息的整體端到端延遲較大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了方便定位問題，我們開發了單 topic debug 特性。在海量消息場景下，無論是測試環境還是生產環境，都不會輕易在 broker 開啓全局 debug。我們在配置中心增加了一個配置，只爲配置列表中的 topic 打印詳細的 debug 信息。在單 topic debug 特性的配合下，我們很快發現消息的最大延遲出現在 producer 發送消息後與服務端接收到消息之間，可能的原因是 netty 線程數配置偏小。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但增加 netty 線程數並沒有完全解決這一問題，我們發現單 jvm 實例仍會出現性能瓶頸，上文提到按用戶的數據量大小分組後，小用戶組需要服務大約四萬個 topic，由於需要啓動相同數量級的 consumer，導致啓動慢（進而導致升級時中斷時間長）、內存隊列不足、調度複雜等。我們最後決定再對小用戶組進行哈希分組，每個實例負責約 1 萬個 consumer，順利解決了生產消費延遲大的問題。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"採用 8080 端口連接 broker"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們採用 8080 端口而非 6650 端口連接 broker，原因主要有兩點:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"日誌詳細，而且大部分發向 8080 的請求是元數據請求，有助於排查問題，易監控。比如，jetty 的 requestLog 很容易監測到創建 topic 失敗、創建 producer 超時等事件。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以隔離數據請求和元數據請求，避免在 6650 端口繁忙時影響創建 topic、刪除 topic 等操作。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和 6650 端口相比，8080 端口效率性能較差。在 5 萬 topic 量級下，升級 producer\/consumer 或 broker 時，需要創建大量的 producer\/consumer，對 8080 端口產生大量請求，如 partitions、lookup 請求。我們通過增加 Pulsar 中 jetty 線程數，順利解決了這一問題。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"改進建議"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar 在運維和應對問題方面能力稍有欠缺，我們希望 Pulsar 能在以下幾個方面有所改進："}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"自動調整客戶端參數，如發送隊列大小、接收隊列大小等，這樣使用萬級 topic 時會更加流暢。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在遇到到不可恢復的錯誤（如操作 ZooKeeper 失敗）時，暴露 API 接口，使我們可以輕鬆對接到雲上的告警平臺。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在運行期間通過日誌追蹤單個 topic，支持運維人員使用 kibana 工具結合業務日誌快速定位、解決問題。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"採樣追蹤 Pulsar 內部生產、消費、存儲的關鍵節點，並將數據導出到 APM 系統（如 Skywalking），更易剖析優化性能。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在負載均衡策略中考慮 topic 數量。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BookKeeper 監控時只監控了 data 盤的使用率，沒有監控 journal 盤的使用率。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"結語"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從初次接觸 Pulsar 到設計上線，我們一共用了三四個月的時間。上線後，Pulsar 一直運行穩定、性能良好，幫助我們實現了預期目標。Pulsar 大大簡化了華爲雲物聯網平臺數據接入服務的整體架構，平穩低延遲地支撐着我們的新業務，因此我們可以專注於提升業務競爭力。由於 Pulsar 的優秀性能，我們也將其應用於數據分析服務，並且希望可以在業務中使用 Pulsar Functions，進一步提升產品競爭力。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"作者簡介"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"張儉，華爲雲 IoT 高級工程師。關注雲原生、IoT、消息中間件、APM。"}]}]}