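Building a Technical Middle Platform from 0 to 1, the Push Platform in Practice: Designing and Implementing High Throughput, Low Latency, and Multi-Business Isolation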

## Background

App push is one of the most important ways to reach users. It plays a major role in raising product activity, improving the feature experience, and increasing user stickiness and retention. Palfish (伴魚) operates several apps with rich user interaction, so its dependence on push is especially pronounced. As the business has grown quickly, the number of Palfish apps keeps increasing, push scenarios have diversified, and push volume has grown rapidly. All of this places higher demands on the Palfish push platform. This article describes the problems we ran into in practice, how we thought about them, and the technical solutions we adopted, in the hope of giving readers ideas for solving similar problems.

The Palfish push platform faces the problems that push platforms commonly face. The most representative are:

- High throughput. How do we sustain huge push traffic? This is most acute when operations runs concentrated campaigns, where volumes easily reach tens of millions or even hundreds of millions of messages.
- Low latency. A push task must reach users as fast as possible. A concentrated campaign lasts only a short time, and users have to be reached as quickly as possible within that window.

The platform also faces push problems specific to our business:

- Multi-client push. Palfish's apps are strongly related at the business level, so a single business feature may need to push to several apps. For example, parents care a great deal about how their children are learning: when a student earns a reward in the student app, we need to push to the student app and also notify the parent app promptly.
- Multiple push scenarios. Three scenarios stand out across Palfish's businesses. The first is real-time business push, of which interactive push in the online classroom is a typical example. Recipients are mostly online, and timeliness requirements are very high, otherwise the class experience suffers. The second is system notifications similar to in-app messages. Recipients are mostly offline, and reliability requirements are very high. The third is marketing push. Recipients are mostly offline, and reliability requirements are lower than for the first two. Traffic in the first scenario follows the business peak curve and is fairly stable from day to day. Traffic in the other two scenarios is typical pulse traffic that spikes the moment a push starts.
- Push isolation. The push platform serves all Palfish business lines. Excessive push volume from one business line must not affect the others. How do we isolate by business and by push type?

With these problems in mind, our technical middle platform team built a Palfish push platform with high throughput, low latency, and multi-business isolation.

## Push Flow

This section briefly describes the general push flow. There are two common modes, push and pull. Pull delivers messages with poor timeliness and costs the client more traffic and battery, so it is rarely used today. The flow below uses the push mode as the example. The flows of the individual push channels are broadly similar and are not covered in detail here.

![push flow](https://static001.geekbang.org/infoq/35/35f5213efa22228883e3edf3a60cdb6b.png)

Push generally involves three stages: device binding, message push, and message receipt.

### Device Binding

A push channel generally assigns each device a unique client identifier (token). The token identifies the long-lived connection, and the channel uses it to deliver messages to the right client. The push platform is business-facing and uses the system's user id as the push identifier, so the push service has to map user ids to client identifiers. When the client opens the app, it establishes a long-lived connection with the push channel and obtains the token. It then sends the current user id together with the token to the push service, which stores them to complete the binding.

### Message Push

A push request received by the push service is generally a request to send a message to a user id. The push service looks up the device identifiers previously bound to that user id, builds the message in the format each channel requires, and calls the channel's interface to send it. When the push channel receives the request, it finds the long-lived connection for the device identifier and sends the message to the client over that connection. The client then handles the message, for example by showing a notification pop-up.

### Message Receipt

When the client receives or clicks a message, the message status is reported to the push service. Some push channels support calling a configured receipt URL. For the others, the push service has to poll the push channel to pull the status.

## Architecture Design

The Palfish push platform uses a layered architecture, as shown below:

![architecture](https://static001.geekbang.org/infoq/5d/5de3402e99066a217d6acbf7784365ae.png)

### Layers

From top to bottom, the architecture consists of the business layer, the gateway layer, the push service layer, the long-connection service layer, and the underlying resource layer they depend on.

#### Business Layer

Businesses build complex features on top of the push channel capability, and IM is a typical scenario. After a user sends an IM message, the push platform delivers the message signal downstream. The IM implementation accounts for message reliability: when the client receives a pushed message signal, or when it establishes a long-lived connection with the backend connection service, it pulls the messages. Businesses take advantage of this property by sending IM messages from special system accounts, which gives them reliable push. The system notifications mentioned earlier are implemented this way. Where reliability requirements are lower, businesses also call the push interface directly. To make marketing pushes and system notifications convenient for operations staff, we built an operations push platform. Depending on the scenario (system notification or marketing message), it either goes through IM or calls push directly. All push traffic from the business layer is taken over by the push gateway layer, which handles downstream delivery.

#### Gateway Layer

The push gateway handles basic gateway work such as route forwarding and traffic control. Route forwarding combines the app, the message type, and the configured delivery policy to push each message to the right apps. Because delivery rules are controlled in the gateway, the business only needs to care about the business scenario of the message, not whether it should call the student-app push or the parent-app push. For traffic control, the gateway sets different message-processing capacity according to the target app, message type, message level, push scenario, and similar information. Processing capacity shows up in two places, receiving and downstream delivery. Through flexible configuration we can give high-priority business messages more resources at any time: more MQ topics on the receiving side, and more goroutines and a higher speed limit on the downstream side. Once the gateway has routed traffic to the target apps, the push services perform the actual push.

#### Push Service Layer

The push service layer maps client long-connection identifiers to user information and delivers messages through different push channels depending on the user's client type. Its first job is device binding and unbinding, that is, maintaining the mapping between user id and device token. Its second job is, on receiving a push for a user id, to fetch the corresponding tokens and deliver the message. To reduce interference between Palfish businesses, the push service layer is split vertically so that each business has its own push service. In the diagram, pushpicturebook serves push for the Palfish picture book app, and pushrtc serves push for the online classroom.

#### Long-Connection Service Layer

The long-connection layer is also the push channel layer. It mainly consists of our self-built long connections, Apple's APNS, Google's GCM, and the push services of the four major Android vendors (Huawei, Xiaomi, vivo, oppo). For isolation, the self-built long-connection service is also split vertically: connectpicturebook is dedicated to picture book long connections, and the connect service handles long connections for other businesses. Because clients sit in different network regions (domestic and overseas), the long-connection service is also split horizontally, and access points are routed intelligently to the nearest long-connection entry. Because APNS and GCM have lower access latency from overseas networks, we deploy the APNS and GCM services on dedicated overseas nodes, and the push services reach those nodes over a dedicated line.

#### Underlying Resource Layer

The underlying resource layer includes the middleware provided by Palfish infrastructure. The most important dependencies for push are the message queue and the cache. When push traffic reaches the push gateway, we buffer it in a message queue and control the downstream speed to protect the lower layers. User device identifiers are stored in the cache to reduce lookup latency. Each layer also relies on a DB for persistent storage where its scenario requires it. Other resources, including underlying operations resources, are also used, but they are not the focus of this article and are not covered further.

### Design Considerations

The design builds on Palfish's existing technical architecture and targets the push goals of the current stage. We design services from a cloud-native starting point: service granularity is kept small, and services are kept stateless wherever possible, to make full use of the underlying containerization. The design focuses on performance, high availability, scalability, and extensibility.

#### Performance

High performance mainly shows up as high QPS and low RT for each service.

When business traffic reaches pushgateway, pushgateway hands all of it straight to the message queue, and the queue's performance guarantees pushgateway's performance on the receiving side. When consuming from the queue, pushgateway starts goroutine pools of different sizes for businesses with different traffic, using Go's concurrency to process messages quickly. When a push service receives a push request, it fetches the client identifiers directly from the cache and calls the interfaces of the relevant push channels in parallel for the user's devices. To fit push workloads, we designed batch push interfaces to reduce RPC overhead. The overall approach: process in parallel with goroutines wherever possible, decouple through the message queue wherever work can be asynchronous, and avoid hitting the database wherever the cache can serve the request.

#### High Availability

On top of basic high-availability measures such as multiple instances and stateless services (the long-connection service excepted), each service is handled according to its characteristics.

pushgateway offers flexible dynamic configuration. Downstream traffic can be controlled at any time, and when downstream resources are tight the flow rate can be limited so that the underlying resources are not overloaded. Messages of different levels are assigned goroutine pools of different sizes, which ensures that high-priority messages are pushed promptly while low-priority messages still get a chance to be pushed.

The push service depends on multiple downstream push channels. Although the channels are already called in parallel, a problem in any one channel could still affect push availability. To address this, in addition to calling the channels in parallel, we designed a timeout degradation policy: if a channel does not respond within the specified time, the call to that channel fails fast. The other channels can still push, and the push service as a whole is not dragged down.

The long-connection service is stateful. When an instance fails, the client detects the broken connection promptly through heartbeats, goes through access-point scheduling again, is scheduled onto a new available node, and re-establishes the long connection. This failover completes within seconds. Messages sent during the failure window are cached and redelivered after the connection is re-established, which keeps the long connection highly available.

Both the push service and the long-connection service are split vertically, so business traffic is physically isolated and a failure in any one business service does not make the others unavailable.

#### Scalability

Scalability mainly relies on the scaling capabilities of Palfish infrastructure and operations. Services are deployed in containers and managed by k8s, so instances can be added or removed at any time. The cache (a codis cluster) and the message queue (a kafka cluster) in the underlying infrastructure both scale well.

For message processing, pushgateway uses its own dynamic configuration to adjust the number of topics and the resources spent on message processing according to push volume.

#### Extensibility

The push stack is layered finely: IM, push, and long connections are handled separately, which exposes the basic capability of each layer to the business. The business layer can depend on whichever layer it needs.

Looking at extensibility for the push scenario alone, the push gateway shields the business from the various kinds of push, and push policies can be configured at any time according to business needs. The push service itself also relies on dynamic configuration to define different push policies for different push types. For example, how many devices to push to per user and whether to push through APNS can both be configured dynamically.

From the start, the push gateway was designed for more than app push. In the future, all outreach pushes (SMS, WeChat) can be routed and rate-limited through the gateway.

## Core Feature Implementation

### Push Gateway Implementation

From the start, the push gateway was designed for all outreach message pushes. For now it implements the routing and traffic-control parts of the gateway capability. The processing flow is shown below:

![pushgateway](https://static001.geekbang.org/infoq/b7/b72de772671b9b8930220285d3590c85.png)

When a push request reaches the gateway, the gateway routes it according to the business identifier (src) and push type (ptype) in the request parameters, finds the corresponding message queue topic, writes the message, and returns. The gateway itself starts goroutine pools with different resource configurations for topics of different priority, consumes messages from the queue at a controlled speed, and hands them to the message executor. The executor routes by src and ptype to the target push services, which perform the push.

The key parts of the flow are routing (queue routing and push routing) and rate-limited consumption around the message queue. The system is implemented in Go, with grpc for RPC interfaces. Routing is driven entirely by configuration, for which we rely on the apollo dynamic configuration center provided by infrastructure. The message queue is a kafka cluster. Rate-limited consumption and the goroutine pool are built in-house.

#### Queue Routing

A message queue topic has two dimensions: the topic identifier and the topic priority. At startup the system requests three topics in kafka, default_high, default_normal, and default_low, for default high-, normal-, and low-priority messages respectively. The full topic name is therefore "topic identifier + priority". Queue routing has to derive the topic identifier and the priority from the request parameters, which together give the kafka topic. By default, requests with no matching topic identifier use default, and requests with no matching priority use high.

Queue routing is configured as key=value entries in apollo, in the following format:

> topicpre.$src[.$ptype]=xxxx
>
> level.$src[.$ptype]=[1~10]

On receiving a push request, the gateway uses the request's src and ptype to look up the topic identifier and priority in the routing configuration. The [.$ptype] part is optional. If it is omitted, every ptype under that src uses the same configuration, which keeps the routing rules simple.

#### Push Routing

Push routing is implemented in almost the same way as queue routing, except that the configuration value is a list of src values, for example:

> srcroute.$src[.$ptype]=kid,picturebook

With this entry, the src and ptype of a request tell the gateway that the push has to go to both the kids app and the picture book app.

#### Rate-Limited Consumption

Rate limiting works by controlling the QPS at which messages are consumed from kafka. The minimum time a single message must take is computed from the QPS. If a message is processed in less than this minimum, the goroutine blocks until the minimum has elapsed before consuming the next message. The QPS values are also managed through apollo, in the following format:

> qps.$topic=100

A speed can be set for each topic. If none is set, a default speed applies.

Speed limits matter a great deal in actual pushing. Massive pushes can exhaust underlying resources and directly affect normal business. A typical example is the dedicated line: without a limit, instantaneous push traffic easily reaches the hundred-megabit level, and because dedicated-line capacity is very expensive, we do not normally provision large bandwidth for this kind of pulse traffic. Downstream traffic therefore has to be held back through speed control.

#### Goroutine Pool

At startup the system allocates a goroutine pool to each topic for consuming its messages. Pool sizes are allocated in the ratio high:normal:low = 5:2:1. This gives high-priority messages more resources while still giving low-priority messages a chance to be processed.

### Push Service Implementation

The push service mainly implements device binding and message push.

#### Device Binding

The device binding flow is shown below:

![bind](https://static001.geekbang.org/infoq/b7/b72484dbb099a83e3f023d50e355d093.png)

Device binding data is typical read-heavy, write-light data and is well suited to caching, so device information is stored in the DB with a copy kept in the cache. A binding request contains at least the user identifier (uid), the business identifier (src), the channel's client identifier (token), and the channel type (ttype). src denotes the business, for example Palfish kids English or Palfish picture book. token, as described earlier, identifies the long-lived connection that each push channel establishes with the client. ttype is the token type and indicates whether the channel being bound is APNS, GCM, Huawei, Xiaomi, or the self-built long connection. Each user has multiple binding records, possibly several for the same push channel, and the data grows with the user base, so a database and table sharding strategy is needed. The specific strategy depends on the business situation. Our businesses currently differ widely in data volume, so to reduce mutual interference we shard tables by src. When a user's device binding changes, we delete the cache entry directly to reduce data inconsistency.

#### Message Push

The message push flow is shown below:

![dopush](https://static001.geekbang.org/infoq/bc/bcaa532939254e336f6500f49e7e1978.png)

A push request contains at least the user identifier (uid), the business identifier (src), the push scenario (ptype), and the message (msg). ptype denotes the push scenario, such as IM chat message push or marketing message push, and each scenario is configured with its own push policy.

The push policy currently covers, per ptype: the push channels to use; the number of devices to notify per channel (for example, if a user has bound the same account on several iPhones and a scenario only needs the most recent device to receive the message, this setting supports that); and the range of client versions to push to (for example, some pushes are only supported by certain client versions and could cause errors on others, which a version range can prevent). There are also other channel-specific settings. Once the channels to push to are determined from the policy configuration, the pushes to those channels run in parallel.

The processing flow is essentially the same for every push channel. First, the tokens for the uid are fetched from the device binding data. As noted above, this data is read-heavy and write-light, so it is read from the cache first, and on a miss it is loaded from the database into the cache. Then the parameters each channel needs are assembled according to that channel's protocol, and the channel's service is called to perform the push.

We said tokens are fetched cache first and DB second, but the implementation does not follow this strictly. Consider two scenarios, online classroom push and marketing push. In the online classroom, users are mostly online and receive a large number of pushes during class, so the cache is an excellent fit. For marketing messages, a large share of users may be cold data. Each cold lookup first hits the cache, finds nothing, then hits the database, then writes the result back to the cache. That is at least three network I/O operations, and it does nothing to reduce DB load. We considered two options. Option one keeps most device binding data in the cache with a long expiry, which reduces DB access. Option two skips the cache and reads the DB directly, cutting three I/O operations to one. Option one may look more reasonable as a design, but it consumes a large amount of cache. Option two achieves the same effect if we scale out the cluster with more read replicas. Given our current resources, we chose option two, and it has met the business goals well.

## Future Work

In supporting the business day to day, we have found shortcomings in the Palfish push platform that need further iteration:

- Downstream speed control in the gateway still relies mostly on empirical values and on manual intervention during major campaigns. We could instead predict and adjust dynamically in advance, based on downstream capacity and the volume of campaign push tasks.
- Early work focused on downstream delivery capacity, and delivery quality was not refined in detail. We need more push policies, based on the push flow and the characteristics of the push data and push channels, to improve push quality.
- The gateway's message queue is built on kafka, so handling messages of different priorities and topics for different businesses is still not flexible enough, and topics have to be configured manually in advance. In the future we should make priorities and topics more flexible and support elastic scaling.

## References

- [Understanding the APP PUSH mechanism in one article](https://www.infoq.cn/article/LczBKq7gMx55OwuJsjRV)
- [Over 5 billion messages a day: how did Xiaomi design a highly available push system?](https://mp.weixin.qq.com/s?__biz=MjM5MDE0Mjc4MA==&mid=2650994440&idx=1&sn=4d14a4f9e5adac3f2cddfff2ea7e7427&chksm=bdbf0f5b8ac8864daefffa1862cfb02fc41788727441a1ec4932fd49468305273849a9aa8b90&scene=27#wechat_redirect)
- [GCM](https://firebase.google.com/docs/cloud-messaging/fcm-architecture)
- [APNs](https://developer.apple.com/library/archive/documentation/NetworkingInternet/Conceptual/RemoteNotificationsPG/APNSOverview.html#//apple_ref/doc/uid/TP40008194-CH8-SW1)
- [Huawei Push](https://developer.huawei.com/consumer/cn/doc/development/HMS-Guides/push-introduction)
- [Xiaomi Push](https://dev.mi.com/console/doc/detail?pId=863)
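The gateway's routing lookup and rate-limited consumption described in the push gateway section can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the platform's actual code: the `Config` map stands in for the apollo key=value store, the function names (`TopicID`, `PushTargets`, `paceDelay`, `consume`), the `reward` ptype, and the `kid_im` topic identifier are hypothetical, returning no targets when no `srcroute` entry exists is an assumption, and treating a non-positive qps as "no limit" is also an assumption. The priority lookup (`level.$src[.$ptype]`) would follow the same fallback pattern and is left out because the article does not say how levels 1 to 10 map to high, normal, and low.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Config stands in for the apollo key=value store (an assumption for this sketch).
type Config map[string]string

// lookup tries the specific key "prefix.src.ptype" first and then falls back
// to "prefix.src", mirroring the optional [.$ptype] part of the config keys.
func (c Config) lookup(prefix, src, ptype string) (string, bool) {
	if v, ok := c[prefix+"."+src+"."+ptype]; ok {
		return v, true
	}
	v, ok := c[prefix+"."+src]
	return v, ok
}

// TopicID resolves the topic identifier, using "default" when none is configured.
func TopicID(c Config, src, ptype string) string {
	if id, ok := c.lookup("topicpre", src, ptype); ok {
		return id
	}
	return "default"
}

// PushTargets resolves the comma-separated list of target src values.
// It returns nil when nothing is configured (an assumption of this sketch).
func PushTargets(c Config, src, ptype string) []string {
	v, ok := c.lookup("srcroute", src, ptype)
	if !ok {
		return nil
	}
	var out []string
	for _, s := range strings.Split(v, ",") {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// paceDelay returns how long a consumer should block after spending `elapsed`
// on one message so that it does not exceed qps messages per second.
func paceDelay(qps int, elapsed time.Duration) time.Duration {
	if qps <= 0 {
		return 0 // assumption: non-positive qps means no limit
	}
	minInterval := time.Second / time.Duration(qps)
	if d := minInterval - elapsed; d > 0 {
		return d
	}
	return 0
}

// consume paces a single consumer loop; a channel stands in for the kafka consumer.
func consume(msgs <-chan string, qps int, handle func(string)) {
	for m := range msgs {
		start := time.Now()
		handle(m)
		time.Sleep(paceDelay(qps, time.Since(start)))
	}
}

func main() {
	cfg := Config{
		"topicpre.kid":        "kid",
		"topicpre.kid.im":     "kid_im",
		"srcroute.kid.reward": "kid,picturebook",
	}
	fmt.Println(TopicID(cfg, "kid", "im"))
	fmt.Println(TopicID(cfg, "picturebook", "im"))
	fmt.Println(PushTargets(cfg, "kid", "reward"))

	msgs := make(chan string, 3)
	for _, m := range []string{"a", "b", "c"} {
		msgs <- m
	}
	close(msgs)
	consume(msgs, 100, func(m string) { fmt.Println("pushed", m) })
}
```

Keeping the delay calculation in a pure function separates the rate policy from the consumer loop, so the limit can be checked without a running kafka consumer. Note that this sketch paces one loop; how the configured qps is shared across the goroutines of a pool is not specified in the article.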