The Engineering Evolution of Personalized Push for Short Video

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"導讀","attrs":{}},{"type":"text","text":":短視頻Push系統是一套支持百度內多款app及多業務場景的分佈式Push系統,目前支撐着好看視頻,直播,度小視,好看大字版等app的推送業務,提供基於用戶基本特徵的個性化推送,熱門活動和熱點事件的運營推送,基於關注關係或訂閱關係的業務實時推送等場景的支持。旨在通過個性化推薦系統及運營編輯方式穩定高效的給用戶通知欄消息推送自己喜歡的內容信息從而達到提高用戶活躍度,提升用戶留存的業務目標。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"全文5886字,預計閱讀時間15分鐘。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"背景","attrs":{}},{"type":"text","text":":","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在這個信息爆炸的互聯網時代,能夠及時和準確獲取信息是當今社會要解決的關鍵問題之一,Push技術改變了傳統的靠\"主動拉\"獲取信息的方式,而是變成了信息主動尋找用戶的方式,更適合在移動網絡中滿足用戶個性化信息的需求。本文主要通過介紹短視頻Push系統的設計和實現以及系統的不斷優化,從而向大家講述億級數據量的Push系統的建設經驗。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"名詞解釋","attrs":{}},{"type":"text","text":":","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"消息推送(Push)","attrs":{}},{"type":"text"
,"text":":通知欄消息推送,由服務端發起推送,在用戶設備的鎖屏界面、通知欄、APP角標等位置展現的消息內容。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"個性化Push","attrs":{}},{"type":"text","text":":通過用戶畫像和推薦模型挑選用戶感興趣的物料的Push。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"運營Push","attrs":{}},{"type":"text","text":":由運營人員在Push後臺手動編輯物料發送的Push(如:熱門活動和熱點事件等推送)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"實時Push","attrs":{}},{"type":"text","text":":根據用戶在app產生互動操作(如:關注、點贊、評論等)或直播開播需要發送開播提醒時對時間要求相對精確的實時發送的Push。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"一、瞭解系統","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.1系統簡介","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着百度旗下短視頻業務不斷髮展,app也有上億級別的季活用戶量。Push系統每天會給app的季活用戶進行n條個性化推送和不固定條數的熱門活動和熱點事件運營推送,需要處理的數據量和併發量是系統設計需要考慮的重要問題,此外根據不同地域的用戶羣每天會發上百條的地域推送和大量的關注關係等實時推送,這對系統的穩定性要求也是很嚴格的,衆所周知Push是一種很有效的拉活手段,其系統的穩定性重要程度可想而知。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.2系統全貌","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Push系統服務於好看視頻,直播,度小視,好看大字版等業務。系統會實時訂閱更新視頻物料信息和用戶屬性信息,保障構建Push消息體時信息的準確性,會在凌晨請求推薦服務進行個性化物料的召回,然後根據運營Push和個性化Push的時間點創建Push任務,任務創建完成後會提前半個小時進行任務的預處理操作(保障Push能按時間儘快的發送),用戶互動消息和直播開播提醒等實時Push是通過api調用實時把要推送的內容發送給Push預處理服務。預處理完成將結果寫進redis隊列中,發送服務根據任務的優先級發送信息給雲Push中臺,雲Push中
臺調用廠商代理or自己的長鏈接服務將Push信息發送到用戶手機上。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"總體架構如下圖所示:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/02755b17038bb9fdec25954dbf4b20a9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.2.1 Push核心架構各模塊簡介","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.物料中心","attrs":{}},{"type":"text","text":":存儲Push時需要的視頻物料信息,包含Push的標題,描述,物料圖片及狀態等信息,訂閱B端視頻變更消息隊列實時更新。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.用戶中心","attrs":{}},{"type":"text","text":":存儲Push需要的用戶基本信息及Push系統特有的一些用戶屬性(如:1.預估用戶活躍時間。2.預估用戶首末條個性化Push時間等),客戶端上報用戶信息實時更新。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3.個性化召回","attrs":{}},{"type":"text","text":":每天凌晨1點開始對季活用戶進行個性化物料召回用於白天個性化Push的發送。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],
"text":"4.realtime-api服務","attrs":{}},{"type":"text","text":":實時寫入預處理隊列進行數據預處理及發送操作,用於實時Push等場景。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"5.頻控服務(ufc)","attrs":{}},{"type":"text","text":":防止打擾用戶,分天級別和小時級別兩種。天級別的頻控設置,一個用戶一天內設置最大Push條數。小時級別,每個用戶每半小時內最多收到1條Push。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"6.預處理服務","attrs":{}},{"type":"text","text":":提前半小時對入庫的任務進行切分,消息構造和入Push隊列等處理,保障Push任務按時發送。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.發送服務","attrs":{}},{"type":"text","text":":根據任務的發送時間及任務的優先級從Push隊列中獲取相應廠商的任務將任務根據廠商的ups和qps進行切割後發送給雲Push。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.回執服務","attrs":{}},{"type":"text","text":":根據各廠商的到達回執記錄相關日誌,用於數據統計及實時監控報警。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"9.控制中心(pcc)","attrs":{}},{"type":"text","text":":重要Push功能的可視化配置系統。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks
":[{"type":"strong","attrs":{}}],"text":"Push核心架構各模塊依賴圖如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b1/b1f1bf12d3a8b8a6eac3e982b5ecddb5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.3系統數據流","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.3.1系統整體數據流","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"客戶端上報用戶信息及一些用戶行爲的打點日誌到數據中心,數據中心根據客戶端打點產出相應的數據表,策略根據數據中心產出的數據表產出視頻物料、Push發送用戶集和代理配額用戶集,架構側根據策略模型進行Push物料的召回並進行任務創建和發送將信息發送給Push中臺,Push中臺發送給各廠商代理或長鏈接併產出Push相關數據表,廠商感知Push到達後發送回執消息給內部服務,架構根據到達回執記錄日誌並上報數據中心完成相關報表的產出。如下圖所示:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ed/ed98227627cc43eed7607b20fed6b011.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"客戶端","attrs":{}},{"type":"text","text":":通過Push 
sdk完成Push_token的綁定,上報用戶基本信息及用戶行爲打點日誌。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"數據中心","attrs":{}},{"type":"text","text":":根據業務打點產出活躍用戶表、用戶行爲表和相關業務報表。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Push策略","attrs":{}},{"type":"text","text":":天級別產出Push物料並根據用戶畫像產出個性化Push物料。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Push架構","attrs":{}},{"type":"text","text":":凌晨進行個性化Push物料召回,定時進行任務發送並處理廠商的到達回執。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"雲Push中臺","attrs":{}},{"type":"text","text":":將Push任務發送給各廠商代理或長鏈接併產出Push基礎數據表。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"廠商代理","attrs":{}},{"type":"text","text":":負責將各自廠商的Push任務發送到用戶設備併發送到達回執。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.3.2 
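Push Arrival Receipt Data Flow","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are three kinds of Push arrival receipts: receipts from the Android vendor proxies, iOS receipts, and long-connection receipts. All of them are received by the Push middle platform service and written to a message queue, which the architecture-side Push-arrive service consumes. The Push-arrive service then does two things: 1. it computes real-time statistics and writes them to Redis for the real-time statistics reports; 2. it writes a local log, which is collected for real-time monitoring and alerting and uploaded to the data center to produce the related statistics reports and the Push material candidate set. It also produces Push click and impression samples that are used to train the Push model.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"As shown in the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6b/6b3c7c783ff47bd79e11a8f733c39974.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"II. System Iteration and Optimization","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.1 Scheduled Estimation of the First and Last Personalized Push Send Times","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.1.1 Background","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the original logic, every user's first personalized Push of the day was sent at 6:30 and the last one at 21:45. But users get up and go to sleep at different times, and their sensitivity to a received Push varies with the time of day, so choosing the send time according to each user's habits can raise the Push click-through rate.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.1.2 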
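Service Design","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The service estimates each user's daily first and last send times from their usage habits, so that a Push with content they care about arrives right when they want to look at their phone. The obvious difficulty is estimating when a user is free and likely to check the phone. The rough logic is as follows. First-message send time: count the days within the past 7 days on which the user's first activity fell in [5:30, 6:00]; if the count is greater than 1, move this user's first personalized send time from 6:30 to 5:30. Otherwise, count the days within the past 7 days on which the user's first activity fell in [5:30, 6:30]; if the count is greater than 1, move this user's first personalized send time from 6:30 to 6:00. All remaining users keep 6:30. Last-message send time: count the days within the past 7 days on which the user's first activity fell in [22:15, 22:45]; if the count is greater than 1, move this user's last personalized send time from 21:45 to 22:15. Otherwise, count the days within the past 7 days on which the user's first activity fell in [22:15, 23:59]; if the count is greater than 1, move this user's last personalized send time from 21:45 to 22:45. All remaining users keep 21:45. As shown in the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/47/47fd78c218d9b52b2759ec44ba773e96.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.2 Optimizing the Push User Grouping Service","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.2.1 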
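Background","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This service produces the various user sets that Push needs (all users, personalized users, interest users, regional users, and so on, collectively called user packages). Producing user packages depends on different upstreams, including the user center, the strategy team, and the data team. As the business iterated, the following problems emerged:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) There is no unified management. Most producers are scheduled scripts deployed on physical machines, which are single points of failure, and monitoring and alerting for data production are scattered.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) User package storage relies on physical machines and the hadoop cluster. During sending, the whole user package has to be loaded into memory from ftp or afs files, which takes about 30s per task for the full set and hurts timeliness.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) Each type of user package is stored separately, which wastes storage resources.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) When operations staff select several user packages at once, duplicate user identifiers are loaded, which wastes memory, and deduplication hurts timeliness.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5) When the live-streaming recall module restarts, loading the follower user package and running the processing logic takes a long time, which hurts release efficiency and service availability; restarting a single machine takes 20 minutes.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.2.3 Service Design","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.2.3.1 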
Old vs. New Architecture","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Original architecture","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e6/e644ebbf4d39ccc4a4a01f8f1f7eef6e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"New architecture","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/20/20db2fda0db7c0ba5bec7cc18370979c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) To distinguish them from the user packages of the current architecture, which live on physical-machine ftp and afs clusters, the term user group is used for a set of users that share a single-dimension feature.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) User groups are registered and managed through unified configuration on the amis platform, and each user group has a unique identifier.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) User groups are represented and stored as bitmaps. Each bit in a bitmap stands for one user, so every user group can be represented by one bitmap.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) The user package addresses previously used in the Push service are replaced with user group tags. Multiple user groups can be combined with logical operations, written as a logical expression.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5) When sending, the Push service first queries the user group service with the logical expression of user group tags to obtain the bitmap of the final target users. It then uses that bitmap to fetch user identifiers from the user group service in batches, processing and sending them as a stream.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.2.3.2 User Group Management Design","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/0259d36e486569951e31643a4bfccec9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"User group configuration:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) The configuration layer manages user groups centrally through the amis platform. Each user group is stored in mysql as a task, with settings such as the user group tag, the user package output address, the update frequency, and the retry count;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) The scheduling layer runs preemptive timed scheduling for each task and hands the tasks it claims to the service layer for execution;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) The service layer receives the task and starts the build process:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Pull the remote file from the user package address. On success, continue; on failure, update the task status and retry count in the execution record table;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Load the user identifiers from the file, compute the crc64/fnv64 value of each one, and store the mapping in redis (k: crc64/fnv64, v: user identifier);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Compute the bitmap of the current user group (using the RoaringBitmap algorithm) and store the result in redis (k: user group tag, v: user group bitmap).","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.2.3.3 
在線服務交互設計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c5/c50dfed189cd322835e27e79194a01cc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"在線服務交互流程:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)Push 任務通過用戶羣標籤來指定發送用戶集合(amis/定時任務寫入 mysql),多個標籤使用邏輯表達式表示;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)Push 服務層獲取發送任務後,使用標籤表達式請求用戶羣在線服務;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)用戶羣在線服務根據邏輯表達式,從 redis中讀取出所有用戶羣標籤的 bitmap,進行邏輯運算,得到最終發送用戶羣的 bitmap,返回給 push 服務層;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4)Push 服務層遍歷 bitmap,按位獲取crc64/fnv64值,批量請求用戶羣服務;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5)用戶羣服務從 redis將crc64/fnv64映射回對應用戶標識,返回給 Push 服務層。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.3 
Push系統頻控服務(ufc)優化改造","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.3.1 頻控服務主要有如下功能:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)基礎功能限制一個用戶在30分鐘內不能收到兩條Push消息,一天內Push總條數不能超過max條","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"結合策略的提供的白名單數據,時間段頻控,用戶標識+推送類型的權重策略數據,到達回收數據,進行個性化頻控","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"永久PushType和用戶標識 白名單功能,針對這類PushType,用戶標識不進行頻控","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.3.2背景","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"ufc目前通過hash(用戶標識)的值取mod的形式分配固定的物理存儲頻控數據,固定了分配服務器個數,服務器ip,不容易擴展,且擴展後會影響當天的用戶頻控數據,擴展上線大約1個小時。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)推送類型、用戶標識白名單經常變化的配置採用配置文件形式,每次改動上線耗費時間長。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"
number":3,"align":null,"origin":null},"content":[{"type":"text","text":"The service is co-deployed with other services on physical machines and competes with them for resources, so it both affects and is affected by them, which makes the service unstable.","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.3.3 Service Design","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d1/d14572bcb41c335f6c0c5b5f95952400.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Dynamic scaling with a consistent hashing algorithm","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"First, compute the hash of each server (node) and place it on a circle (continuum) ranging from 0 to 2^32.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Then compute the hash of the key of the data to be stored with the same method and map it onto the same circle.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Then search clockwise from the position the data maps to and store the data on the first server found. If no server is found before passing 2^32, the data is stored on the first server.","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/efff805277f3a5de6154da6ea38abcc6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"
type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Resource compression: compacting data with the protobuf protocol","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Google Protocol Buffers (Protobuf for short) is Google's internal standard for exchanging data across languages; more than 48,162 message format definitions and more than 12,183 .proto files are already in use. They are used in RPC systems and persistent data storage systems. Protocol Buffers is a lightweight, efficient format for storing structured data and can be used to serialize it. It is well suited to data storage or as an RPC data exchange format: a language-neutral, platform-neutral, extensible serialization format that can be used for communication protocols, data storage, and similar areas. Protobuf is similar to JSON and XML, but smaller, faster, and simpler. You define your own data structure and then use the code produced by the code generator to read and write it. You can even update the data structure without redeploying your programs. Describe the data structure once with Protobuf, and you can easily read and write your structured data in a variety of languages and from a variety of data streams.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"In one sentence","attrs":{}},{"type":"text","text":": protobuf is a binary protocol. Because the data is described by a schema, it parses faster than json, mainly by dropping tokens such as {, \", and key names and storing values as tag|value pairs. It is very efficient to transmit, very compact on the wire, and widely used over tcp/rpc.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Serialization time comparison:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9a/9a524439bda8ddab7b0e6da50ea8bd80.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Byte size comparison:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2a/2a85f3c5fd3923b32df7f121c9b33486.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Comparison of the protocols on the Push workload:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/af/af6816583ddcf6c34d7df2947423ae51.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"o
rigin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"結論","attrs":{}},{"type":"text","text":":proto的Marshal 比 json的Marshal快2倍,壓縮後數據大小proto是json的 1/4,且數據越大優勢越明顯。最終頻控數據壓縮後節省75%的redis資源。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"三、總結","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"消息推送(Push)是移動端App產品運營的重要手段,成本低效益高。隨着移動互聯網的高速發展,手機應用的開發越來越成熟,應用的更新頻率也隨之提高,同時各應用推送的消息也是五花八門,能否及時和準確給用戶推送他感興趣的消息內容,提升推送信息的消費率是Push系統的核心價值。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“蘋果之父”喬布斯曾說:“根據大衆的需要去設計產品其實是非常難的。因爲在很多情況下,人們並不知道自己想要什麼,所以需要你去展示給他看。”我感覺這句話用在推送上也是合適的,對於明確自己想要什麼樣的消息內容的用戶,推送內容可以投其所好(個性化推送)。對於不明確自己想要什麼樣的消息內容的用戶,App運營者在推送消息時需要考慮消息的可行性。不只是內容選擇上需要謹慎,在時間,推送對象,推送方式上等都需深思熟慮(運營推送)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"推薦閱讀","attrs":{}},{"type":"text","text":":","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http://mp.weixin.qq.com/s?__biz=Mzg5MjU0NTI5OQ==&mid=2247506045&idx=1&sn=45b6fcadf8236f07329cb82b624a406f&chksm=c03ee801f74961176f35b7e242f896be2e2a842ea38620ba01a0134752cbd5698de5e26301e4&scene=21#wechat_redirect","title":"","type":null},"content":[{"type":"text","text":"|百度愛番番數據分析體系的架構與實踐","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null
,"origin":null},"content":[{"type":"link","attrs":{"href":"http://mp.weixin.qq.com/s?__biz=Mzg5MjU0NTI5OQ==&mid=2247505418&idx=1&sn=d99cef598f03000e2235e7d826c46a25&chksm=c03ee676f7496f6012019ececa54e33c5719be339c424f1776c2dde43b4ce5c3462438569691&scene=21#wechat_redirect","title":"","type":null},"content":[{"type":"text","text":"|託管頁前端異常監控與治理實戰","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http://mp.weixin.qq.com/s?__biz=Mzg5MjU0NTI5OQ==&mid=2247506228&idx=1&sn=1040913f52bfafc5e8b97c0dc546462b&chksm=c03ee948f749605e688f1b501161811047e3ec75417ed8244c1865178d9ebbedda0622d6a653&scene=21#wechat_redirect","title":"","type":null},"content":[{"type":"text","text":"|基於etcd實現大規模服務治理應用實戰","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"---------- END ----------","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度 Geek 說","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度官方技術公衆號上線啦!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"技術乾貨 · 行業資訊 · 線上沙龍 · 行業大會","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"招聘信息 · 內推信息 · 技術書籍 · 
百度周邊","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"歡迎各位同學關注","attrs":{}}]}]}
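
The consistent-hashing steps listed in section 2.3.3 (place servers on a ring, hash each key onto the same ring, walk clockwise to the first server, wrap around past the end) can be sketched roughly as below. This is only an illustration under stated assumptions, not the ufc implementation: `HashRing`, `ring_position`, the MD5-based hash, and the node names are all hypothetical, each server occupies a single point on a ring of 2^32 positions, and virtual nodes are left out.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # ring positions run from 0 to 2^32 - 1


def ring_position(value: str) -> int:
    """Map a string to a point on the ring (MD5 is an assumed, illustrative hash)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class HashRing:
    """Hypothetical consistent-hash ring with one point per node."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions of all nodes
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Step 1: hash the server and place it on the ring."""
        pos = ring_position(node)
        if pos in self._owners:
            raise ValueError(f"ring position already taken: {node!r}")
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def get_node(self, key: str) -> str:
        """Steps 2 and 3: hash the key, then take the first server clockwise."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._positions):
            idx = 0  # passed the end of the ring: wrap to the first server
        return self._owners[self._positions[idx]]
```

A caller would build the ring once, for example `HashRing(["redis-a", "redis-b"])`, and call `get_node(user_id)` to pick the storage node for a user's frequency-control record. When a node is added, only the keys that fall on the arc just before the new node change owner, which is the property that makes scaling out less disruptive than `hash(user_id) mod N`.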