From Kafka to Pulsar: Huawei Cloud IoT's Journey to the Cloud

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"設備接入服務(IoTDA)是華爲雲物聯網平臺的核心服務,IoTDA 需要一款可靠的消息中間件,經過對比多款消息中間件的能力與特性,Apache Pulsar 憑藉其多租戶設計、計算與存儲分離架構、支持 Key_Shared 模式消費等特性成爲華爲雲物聯網消息中間件的首選。本文介紹了 Pulsar 在華爲雲物聯網的上線歷程以及上線過程中遇到的問題和相應的解決方案。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"華爲雲設備接入服務介紹"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"設備接入服務(IoTDA)具備海量設備連接上雲、設備和雲端雙向消息通信、數據流轉、批量設備管理、遠程控制和監控、OTA 升級、設備聯動規則等能力。下圖爲華爲雲物聯網架構圖,上層爲物聯網應用,包括車聯網、智慧城市、智慧園區等。設備層通過直連網關、邊緣網絡連接到物聯網平臺。目前華爲雲物聯網聯接數超過 3 億,IoT 平臺競爭力中國第一。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/9f\/e7\/9f27b0d5c9677c7fd72yyf9d312c7ce7.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據流轉指用戶在物聯網平臺設置規則後,當設備行爲滿足規則條件時,平臺會觸發相應的規則動作來實現用戶需求,例如對接到華爲雲其他服務,提供存儲、計算、分析設備數據的全棧服務,如 DIS、Kafka、OBS、InfluxDb 等,也可以通過其他通信協議和客戶的系統對接,如 HTTP、AMQP。在這些動作中,物聯網平臺主要做客戶端或服務端。根據用戶類別,可以將使用場景分爲三類:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"體量較大的客戶一般會選擇推送到消息中間件(如 Pulsar、Kafka)上,並在雲上構建自己的業務系統進行消費處理。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中長尾客戶通常會選擇將數據推送到自己的數據庫(如 MySQL)中進行處理,或由自己的 HTTP 服務器接收數據進行處理。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"更輕量級的客戶會選擇通過 AMQP 協議創建簡單的客戶端進行連接。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"原推送模塊的痛點"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原推送模塊採用 Apache Kafka 方案,這種運行模式本身有一些弊端,且擴容操作複雜,爲開發和運維團隊帶來負擔。此外,原推送模塊支持客戶端類型和服務端類型的推送,但不支持 AMQP 推送,其架構圖如下。Consumer 不斷從 Kafka 中拉取消息,並將發送失敗的消息存入數據庫,等待重試。這種運行模式帶來了很多問題:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即使很多客戶的服務器不可達,consumer 仍需要從 Kafka 拉取消息(因爲 Kafka 只有一個 

- Even when many customers' servers are unreachable, the consumer still has to pull their messages from Kafka (because there is only one topic) and attempt delivery.
- Message retention time and size cannot be configured per user.
- Some customers' servers have limited capacity, and there is no way to throttle the rate at which messages are pushed to an individual customer.

[Figure: original Kafka-based push module (https://static001.infoq.cn/resource/image/1f/0e/1fbb47af06fc2914111260bfdd41620e.png)]

Topic Count

In May 2020, to make the product more competitive, we planned to let customers receive forwarded data over AMQP. AMQP client access is more involved, and a customer might embed the AMQP client in a mobile app that only comes online for two hours a day. In that case we must guarantee that no data is lost while the customer is away, which requires the message middleware to support more topics than there are rules (some customers carry so much data under a single rule that one topic cannot keep up). We already have more than 30,000 rules, expect to reach 50,000 soon, and the number keeps growing.

[Figure: rule growth (https://static001.infoq.cn/resource/image/02/72/02yyef343f7856183089e98c4ae56d72.png)]

Kafka topics hold file handles at the bottom layer and share the OS cache, so Kafka cannot support a very large number of topics; a comparable vendor's Kafka service supports at most 1,800 topics. To provide a queue per rule we would have to operate multiple Kafka clusters; the figure below shows the design we sketched on top of Kafka.

A Kafka-based implementation would be very complex. We would have to manage the lifecycle of multiple Kafka clusters and the mapping between tenants and clusters, and because Kafka has no shared consumption model we would also need two relay layers. Moreover, if a cluster had already reached its topic limit but a topic needed more capacity because of growing traffic, the existing cluster could not be expanded without migrating data. The overall solution would be so complex that it posed a serious challenge for both development and operations.

Why We Chose Pulsar

To resolve the problems of the Kafka design, we surveyed the popular message middleware on the market and discovered Apache Pulsar. Apache Pulsar is a cloud-native distributed messaging and streaming platform with many excellent built-in features; its Key_Shared mode and support for millions of topics were exactly the features we urgently needed.

- Pulsar supports the Key_Shared mode. Suppose a single Pulsar partition sustains 3,000 QPS while one of the customer's AMQP clients handles only 300 QPS. The best solution is Pulsar's shared consumption: the customer connects, say, 10 clients at once to process the data (see the sketch after this list). With Failover mode we would instead have to expand the topic to 10 partitions, wasting resources.
- Pulsar scales to millions of topics, so we can map each rule to its own Pulsar topic. When an AMQP client comes online, it simply resumes from the last consumed position, and no messages are lost.
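
The following is a minimal sketch of the Key_Shared fan-out described in the first bullet, written against the Pulsar Java client. The service URL, topic name, and subscription name are hypothetical placeholders, and the message handler only marks where delivery to the customer's AMQP connection would happen.

```java
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class KeySharedFanOutSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker.example.com:6650")   // hypothetical address
                .build();

        // Ten consumers share one subscription in Key_Shared mode: messages with
        // the same key always go to the same consumer, so per-device ordering is
        // preserved while a 3,000 QPS partition is drained by ten 300 QPS clients.
        for (int i = 0; i < 10; i++) {
            client.newConsumer(Schema.BYTES)
                    .topic("persistent://iot/rules/rule-0001")    // hypothetical topic-per-rule name
                    .subscriptionName("amqp-push")
                    .subscriptionType(SubscriptionType.Key_Shared)
                    .messageListener((consumer, msg) -> {
                        try {
                            // ... hand msg.getData() to the customer's AMQP connection ...
                            consumer.acknowledge(msg);
                        } catch (Exception e) {
                            consumer.negativeAcknowledge(msg);
                        }
                    })
                    .subscribe();
        }
    }
}
```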

Pulsar was designed for multi-tenancy in the cloud, whereas Kafka leans toward single-tenant, high-throughput integration between systems. Pulsar was designed with Kubernetes-based deployment in mind, so the overall deployment is straightforward; its separation of compute and storage makes scaling simple, topics are interrupted only briefly during expansion, and with retries the business sees no interruption; and it supports the shared subscription type, which is more flexible. We compared Pulsar and Kafka along several dimensions; the results are as follows:

|                    | Pulsar                                                                   | Kafka                                        |
|--------------------|--------------------------------------------------------------------------|----------------------------------------------|
| Multi-tenancy      | multi-tenant design built for the cloud                                   | single-tenant, high-throughput design        |
| Deployment         | fits containerized deployment well                                        | mostly deployed on virtual machines          |
| Topic count        | supports millions of topics without losing data                           | cannot support very large numbers of topics  |
| Scaling            | compute and storage separated; scaling is simple, business interruption is short | scaling is complex                     |
| Subscription model | supports the shared subscription type, more flexible                      | supports only failover-style consumption     |
| P99 latency        | stable                                                                    | fluctuates noticeably                        |

Pulsar not only addressed the shortcomings of the Kafka design; its no-message-loss guarantee was a perfect fit for our requirements, so we decided to try Pulsar.

Initial Design

In the initial design we intended to use the Key_Shared consumption mode for both client-type and server-type push. The figure below shows the client-type design (HTTP as the example): for every data-forwarding rule a customer configures, we create one topic in Pulsar; a consumer consumes the topic and pushes the data through a NAT gateway to the customer's HTTP server.

[Figure: client-type (HTTP) push design (https://static001.infoq.cn/resource/image/a3/74/a363294038aaa6612124df84b65e1574.png)]

The server-type (AMQP) push design is shown below. If no AMQP client is connected, starting a consumer and pulling data achieves nothing, because the data cannot be processed further; so only after a client connects through the load balancer to a consumer microservice instance does that instance start a consumer for the corresponding topic. One AMQP connection maps to one consumer.

[Figure: server-type (AMQP) push design (https://static001.infoq.cn/resource/image/6d/e0/6d87bea9b08da4c3f9325f36d75b36e0.png)]

A single partition of a Pulsar topic has limited throughput. When one customer's rule exceeds it, for example when the topic is rated at roughly 3,000 QPS but the customer's estimated traffic is 5,000, we have to add partitions to the topic. To avoid restarting producers and consumers, we set the autoUpdatePartition parameter to true so that they can detect partition changes dynamically.
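
To make the expansion flow concrete, here is a hedged sketch using the Pulsar Java client and admin API. The topic name, addresses, and partition counts are hypothetical, and in practice the expansion step is driven by operations tooling rather than inline application code.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class PartitionExpansionSketch {
    public static void main(String[] args) throws Exception {
        String topic = "persistent://iot/rules/rule-0001";        // hypothetical topic name

        // Producer that periodically re-reads the partition list, so newly added
        // partitions are picked up without restarting the process.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker.example.com:6650")   // hypothetical address
                .build();
        Producer<byte[]> producer = client.newProducer(Schema.BYTES)
                .topic(topic)
                .autoUpdatePartitions(true)
                .autoUpdatePartitionsInterval(60, TimeUnit.SECONDS) // the default is one minute
                .create();

        // Operations side: grow the topic from 1 to 2 partitions when a rule's
        // estimated traffic (~5,000 msg/s) exceeds the ~3,000 msg/s single-partition spec.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://broker.example.com:8080") // hypothetical address
                .build()) {
            admin.topics().updatePartitionedTopic(topic, 2);
        }

        producer.close();
        client.close();
    }
}
```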

Problems Found While Testing the Initial Design

When we tested the initial design, we found problems in three main areas:

- With the client-type design above, the microservice instances and the consumers form a mesh. With 10,000 customer rules and 4 microservice instances there are 40,000 consume-subscribe relationships, and a single instance holds 10,000 consumers in memory at once. The consumer receive queue size is the key factor for both throughput and memory consumption, but it is hard to get right. If it is too large, then in failure scenarios where the consumer cannot deliver HTTP messages, large numbers of messages pile up inside the consumers and the process runs out of memory; with 1,000 consumers and a sudden 5-minute network outage, all messages from those 5 minutes accumulate in the receive queues. If it is too small, communication between the consumer and the server becomes inefficient and hurts system performance.
- During rolling upgrades of Pulsar or of the produce/consume services, frequent topic-metadata requests put heavy pressure on the cluster (the number of requests is the number of instances multiplied by the number of topics).
- autoUpdatePartition has a large impact on system resources. If it is enabled on every topic, then with the default settings each topic issues one ZooKeeper request per minute.

We raised these issues with the Pulsar community and received strong support and help from the StreamNative team, who suggested grouping our customers and then enabling the autoUpdatePartition parameter only where needed. With the community's support, we decided to make the corresponding improvements and started planning the go-live.

Go-Live Plan

Our customers fall roughly into two groups. Push-heavy users send large amounts of data at peak times; a single partition may not satisfy them, and there are few such users. The rest have stable, moderate traffic; a single partition is enough, and there are many such users.

Following the suggestion, we grouped the users: push-heavy users get dedicated workloads, while users with moderate traffic are co-located. For now, SRE assigns the groups manually in the configuration center based on each customer's business volume; in the future, grouping will be driven automatically by real-time statistics. Grouping not only greatly reduces the number of topic-consumer combinations, it also reduces the number of metadata requests on restart. The client parameters of the two groups also differ: autoUpdatePartition is enabled only on the busy users' topics, and the two workloads use different receive queue sizes (a configuration sketch follows the figure).

[Figure: grouped go-live architecture (https://static001.infoq.cn/resource/image/18/71/1812b929b0c6b60a887d98ed19baed71.png)]
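
A hedged sketch of how the per-group client settings could look with the Pulsar Java client. The subscription name, queue sizes, and the busyGroup flag are illustrative assumptions, not our production values.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class GroupedConsumerSettingsSketch {

    // In production the group assignment and the sizes come from the configuration
    // center; the values here are purely illustrative.
    static Consumer<byte[]> subscribe(PulsarClient client, String topic, boolean busyGroup)
            throws Exception {
        return client.newConsumer(Schema.BYTES)
                .topic(topic)
                .subscriptionName("forwarding")                   // hypothetical subscription name
                .subscriptionType(SubscriptionType.Key_Shared)
                // Busy tenants get a larger receive queue for throughput; the long
                // tail gets a small one to bound memory across tens of thousands of
                // consumers in a single JVM.
                .receiverQueueSize(busyGroup ? 1000 : 10)
                // Partition discovery is only needed for the busy group, whose
                // topics may be expanded at runtime.
                .autoUpdatePartitions(busyGroup)
                .subscribe();
    }
}
```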

Deployment

We deploy both groups of workloads with containers: brokers as a Deployment, and BookKeeper and ZooKeeper as StatefulSets. We deploy both in the cloud and at the edge, and the two environments have different reliability and performance requirements, so we use the following parameters:

|                    | Cloud deployment | Edge deployment  |
|--------------------|------------------|------------------|
| Ensemble size      | 3                | 2                |
| Write quorum (Qw)  | 3                | 2                |
| Ack quorum (Qa)    | 2                | 1                |
| Broker CPU         | 4 cores          | 1 core           |
| Broker memory      | 16 GB            | 4 GB             |
| BookKeeper CPU     | 4 cores          | 1 core           |
| BookKeeper memory  | 16 GB            | 4 GB             |
| Failure domain     | AZ               | physical machine |

During deployment we found that:

- Write performance is best when a topic's ensemble size and write quorum (Qw) are equal.
- Message volume in the cloud is high: with 2 replicas we would have to over-provision by 100%, whereas with 3 replicas 50% redundancy is enough.
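
The BookKeeper replication settings in the table map onto Pulsar's namespace-level persistence policies. The snippet below is a sketch of applying them through the Java admin client; the admin URL and namespace names are hypothetical.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class PersistenceSettingsSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://broker.example.com:8080") // hypothetical address
                .build()) {
            // Cloud profile from the table: ensemble 3, write quorum 3, ack quorum 2.
            // Keeping the ensemble equal to Qw matched the best write performance we observed.
            admin.namespaces().setPersistence("iot/rules",
                    new PersistencePolicies(3, 3, 2, 0.0));

            // Edge profile from the table: ensemble 2, write quorum 2, ack quorum 1.
            admin.namespaces().setPersistence("iot/rules-edge",
                    new PersistencePolicies(2, 2, 1, 0.0));
        }
    }
}
```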

Pulsar Tuning

The design above went live smoothly half a year ago. We also tested a scenario with 50,000 topics and 100,000 messages per second in the test environment, ran into some issues during the tests, and tuned the system accordingly; for details, see the separate write-up "Pulsar 5 萬 topic 調優" (https://www.jianshu.com/p/ace6e1094c41). This section focuses on latency, the broker port, and our suggestions for improvement.

Reducing Produce/Consume Latency

Using our test tools, we found that the overall end-to-end message latency was high.

To make the problem easier to pin down, we built a per-topic debug feature. With messages at this scale, neither in the test environment nor in production would we casually turn on global debug logging on the broker; instead, we added a setting in the configuration center so that detailed debug information is printed only for the topics on the configured list. With per-topic debug in place, we quickly found that the largest share of the latency occurred between the producer sending a message and the broker receiving it, most likely because the netty thread count was configured too low.

Increasing the netty thread count did not fully solve the problem, though: a single JVM instance still hit a performance ceiling. As mentioned above, after grouping users by data volume, the small-user group had to serve roughly 40,000 topics; starting a comparable number of consumers made startup slow (which in turn made upgrade interruptions long), exhausted the in-memory queues, and complicated scheduling. We finally hash-partitioned the small-user group again so that each instance is responsible for about 10,000 consumers, which resolved the high produce/consume latency.

Connecting to the Broker on Port 8080

We connect to the broker on port 8080 rather than 6650, for two main reasons:

- The logs are detailed, and most requests sent to 8080 are metadata requests, which helps with troubleshooting and is easy to monitor. For example, jetty's requestLog makes it easy to spot events such as topic creation failures or producer creation timeouts.
- It isolates data requests from metadata requests, so operations such as creating and deleting topics are not affected when port 6650 is busy.

Compared with port 6650, port 8080 is less efficient. At the 50,000-topic scale, upgrading producers/consumers or brokers creates a large number of producers and consumers, which floods port 8080 with requests such as partition-metadata and lookup requests. We solved this by increasing the number of jetty threads in Pulsar.
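
The sketch below shows what connecting through the web-service port looks like from the client side, assuming a hypothetical broker address.

```java
import org.apache.pulsar.client.api.PulsarClient;

public class HttpLookupClientSketch {
    public static void main(String[] args) throws Exception {
        // With an http:// service URL, topic lookups and partition-metadata queries
        // go through the broker's web service on 8080 (served by jetty and captured
        // in its request log), while the actual produce/consume traffic still uses
        // the binary-protocol connections that the lookups return.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("http://broker.example.com:8080")     // hypothetical address
                .build();
        client.close();
    }
}
```

On the broker side, the jetty pool that serves these requests is sized by broker configuration (numHttpServerThreads in broker.conf in recent Pulsar releases); increasing it is the adjustment described above.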

Suggestions for Improvement

Pulsar still has some way to go in operability and troubleshooting. We hope it can improve in the following areas:

- Automatically adjust client parameters such as the send queue size and receive queue size, so that running tens of thousands of topics is smoother.
- Expose an API for unrecoverable errors (such as failed ZooKeeper operations), so that they can easily be wired into the cloud alarm platform.
- Allow tracing a single topic through the logs at runtime, so that operators can use Kibana together with business logs to localize and resolve problems quickly.
- Sample and trace the key stages of production, consumption, and storage inside Pulsar and export the data to an APM system (such as SkyWalking), to make performance profiling and optimization easier.
- Take the topic count into account in the load-balancing policy.
- BookKeeper monitoring currently covers only data-disk usage, not journal-disk usage.

Conclusion

From first contact with Pulsar to design and go-live took us three to four months. Since going live, Pulsar has run stably with good performance and helped us meet our goals. It has greatly simplified the overall architecture of the Huawei Cloud IoT platform's data access service and supports our new business smoothly and with low latency, so we can concentrate on making the business more competitive. Thanks to Pulsar's strong performance we have also adopted it in our data analysis service, and we hope to use Pulsar Functions in our business to improve the product's competitiveness further.

About the Author

Zhang Jian is a senior engineer at Huawei Cloud IoT, focusing on cloud native, IoT, message middleware, and APM.