基於Doris的小程序用戶增長實踐

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文的主題爲基於 Doris 的小程序用戶增長實踐,將從實際案例出發介紹基於 Doris 的用戶分層解決方案,重點分享了項目中的難點和架構解決方案,以及怎麼使用 Doris 做用戶分層,如何做到秒級的人數預估和快速產出用戶包。主要內容包括:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"小程序私域精細化運營能力介紹"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶分層技術難點"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶分層的架構和解決方案"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"未來規劃"}]}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"小程序私域精細化運營能力介紹"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"好,現在開始。現在首先介紹一下我們小程序當前的私域精細化運營的能力有哪些。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ab\/ab34a0ebca458f58b0c6ab839441add3.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先我們爲啥要做私域精細化運營呢,這起源於兩個痛點:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedli
st","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"私域用戶的價值不突出"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如:我有100萬個用戶,我想給高收入人羣去推薦奢侈品的包包,但是我不知道在這100萬人裏面有多少人是這種高收入人羣"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"缺乏主動觸達的能力"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後針對這兩個問題,我們產品上面提出了一個解決方案 -- 就是分層運營,它主要分爲兩部分:一個是運營觸達,還有一個是精細化的人羣。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"舉個例子:如圖示,從上往下看:當運營想搞一個活動時(比如 DataFun Talk 
這個活動),可以選擇消息、私信、卡券、小程序內這四個通路中的一個進行推送,選完通路之後,就需要選擇需要推送的人羣,這時候就要用到精細化人羣,精細化人羣是基於百度大數據平臺提供的畫像數據、新聞數據生成的,最後根據選擇人羣、推送通路完成推送。之後我們還會提供觸達效果的分析,主要包括下發量、點展、到達之類的,另外針對人羣也會提供整個用戶羣體更細緻化的分析。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這套解決方案的收益和價值:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"對於開發者來說:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"合理地利用私域流量提升價值"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"促進用戶活躍和轉化"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"對於整個生態來講:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提高了私域利用率和活躍度"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"激活了開發者主動經營的意願"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"促進了生態的良性循環"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"numb
er":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上講的是一個產品的方案,接下來跟大家講一下具體的功能"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 分層運營-B端視角"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f3\/f3bb8217b88aafd9eb9144b18867c900.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先介紹下 B 端視角下分層運營平臺是如何工作的,比如說我是開發者,我是怎麼去做去創建用戶分層的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏我們提供了自定義配置篩選的功能,可以從用戶關注、卡券、交易、活躍行爲、性別、年齡等多種維度選擇,同時提供了預估人數的功能,可以實時的算出來你當前圈選的用戶有多少人,方便評估一下人數是否 OK,如果 OK 的話,就直接生成人羣,如果不 OK 的話,就重新選擇篩選條件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"完成人羣篩選之後,會進入分層管理列表,在列表裏面可以根據需要點擊對應的推送按鈕就可以直接推送,推送方式包括私信、羣發。當然了,這裏也有羣體分析功能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"B 端功能入口:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"小程序開發者後臺 -> 運營中心 -> 分層運營 -> 分層管理 -> 自定義篩選"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
分層運營-C端視角"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4c\/4c0da4cd4c0a70881466f7554280489f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"簡單展示下 C 端視角下分層運營的一個樣式:如圖截取的是百度APP 上通知和私信的樣式。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"用戶分層技術難點"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 分層運營經典案例"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/79\/79fa943ff128b2f2eb8ffbf9cd6a8642.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面給大家介紹分層運營的一個經典的案例 -- 我們跟汽車大師的合作案例:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需求是汽車大師要對近一週付費且活躍的用戶進行一個評價送券的活動。圖中截圖展示了這個活動的交互過程:汽車大師推送一個通知(圖中:8\/6日通知),然後用戶在《 百度APP -> 我的 -> 通知 》裏面就可以看到汽車大師的通知消息,點開之後跳轉到《諮詢待評價頁面》,然後寫完評價後系統會自動發券並通知給用戶;在這個活動中達成的效果如下:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"準確判斷了用戶需求,活躍用戶價值,頁面的打開率達到了 
9.51%"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶次均使用時長提升 2.5 倍"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"活動帶來新增付費轉化率達 17.71%"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"簡單介紹一下這裏面的一些基本運營技巧:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"結合實際的業務場景,無中間頁跳轉的折損"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"拼接消息組件,自動發券場景過渡順暢"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"場景可定期複用,節省人力成本。創建完人羣之後,可以一直使用,不需要重複創建"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"\"分享和使用\" 雙按鈕強勢引導"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面的分享主要是讓大家瞭解用戶分層運營給開發者帶來運營效率以及轉化效果的一個提升,從而促進用戶增長。這種看起來是特別的香,對不對?但是它有沒有什麼技術難點,答案是當然有,而且還特別大,不過沒關係,大家不用擔心,你們認真聽完煜楊老師接下來的分享,這些難點就會變得很 Easy 了,用之前一句特別流行的廣告語就是:媽媽再也不用擔心我的工作了。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
分層運營難點"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"難點的話,大家可以看下圖,可以看到我把難點跟方法論都放到了一起,其實最開始我是想先講難點,然後後面再講方法論的,這樣的話就可以調一下大家的胃口😁。後來我又一想,大家都是程序員對不對,既如此,程序員何苦爲難程序員,還是少一些套路,多一些真誠比較好。所以最後,我就把難點跟方法論都放到一起了,這樣可以給大家一個直觀的一個認識。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/45\/45bbabe6d8c676ecc97b7b06f41ca1e2.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先給大家簡單介紹遇到的四個難點:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"TB級數據。數據量特別大,前面講到我們是基於畫像和行爲去做的一個用戶分層,數據量是特別大的,每天的數據量規模是 1T +"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查詢的頻響要求極高,毫秒級到秒級的一個要求。前面介紹 B 端視角功能時大家有看到,我們有一個預估人數的功能,用戶只要點擊 ”預估人數“ 按鈕,我們就需要從 TB 級的數據量級裏面計算出篩選出的人羣人數是多少,這種要在秒級時間計算 TB 
級的數量的一個結果的難度其實可想而知"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算複雜,需要動靜組合。怎麼理解?就是現在很多維度我們是沒辦法去做預聚合的,必須去存明細數據,然後去實時的計算,這個後面也會細講"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"產出用戶包的時效性要求高。這個比較好理解,如果產出特別慢的話,肯定會影響用戶體驗"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對上面的四個難點,我們的解法是:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"針對第一個難點 --> 壓縮存儲,降低查詢的數量級。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"具體選型就是使用 Bitmap 存儲,這解法其實很好理解,不管現在主流的 OLAP 引擎有多麼厲害,數據量越大,查詢肯定會越慢,不可能說數據量越大,查詢耗時還一直不變,這種其實是不存在的,所以我們就需要降低存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"針對第二和第三個難點 --> 選擇合適計算引擎"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們調研了當前開源的包括 ClickHouse, Doris, Druid 等多種引擎,最終選擇了基於 MPP 架構的 OLAP 引擎 Doris。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏可以簡單跟大家介紹一下選擇 Doris 的原因,從性能來說其實都差不多,但是 Doris 有幾個優點:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一:它兼容 MySQL 協議,也就是說你的學習成本非常低,基本上大家只要瞭解 MySQL,就會用 Doris, 
不需要很大的學習成本。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二:Doris 運維成本很低,基本上就是自動化運維。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"針對第四個難點 -->  選擇合適的引擎"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過對比 Spark 和 Doris,我們選擇了 Doris ,後面會詳細講爲什麼會用 Doris。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"用戶分層的架構和解決方案"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"介紹難點以及解法之後,接下來從架構跟解決方案裏面跟大家細講一下,難點是怎麼解決的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"分層運營架構:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3b\/3b9c1d56978cf2cabb68d078fa57c0ea.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先介紹一下我們分層運營的架構。架構的話分爲兩部分,就是在線部分跟離線部分。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"在線部分:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align"
:null,"origin":null},"content":[{"type":"text","text":"分爲了四層:服務層、解析層、計算層跟存儲層,然後還有調度平臺和監控平臺。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服務層,主要功能包含:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"權限控制:主要是用戶權限、接口權限的控制"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分層管理:主要是對用戶篩選的增刪改查"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"元數據管理:主要是對頁面元素、ID-Mapping 這類數據的管理"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"任務管理:主要是支持調度平臺任務的增刪改查"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解析層,是對 DSL 的一個解析、優化、路由以及 SQL 模板:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如要查在線預估人數,首先會在解析層做一個 DSL 的解析,之後根據不同情景做 DSL 的優化,比如選擇了近七天活躍且近七天不活躍的用戶,這種要七天活躍和七天不活躍的交集顯然就是零了,對不對?像這樣的情況在優化層直接將結果 0 返回給用戶,就不會再往下走計算引擎,類似還有很多其他優化場景。然後優化完之後會使用 DSL 路由功能,根據不同查詢路由到不同的 SQL 模板進行模板的拼接。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算層,計算引擎我們使用 Spark 和 
Doris:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Spark:離線任務"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Doris:實時任務"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存儲層:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mysql:主要用來存用戶分層的一些用戶信息"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis:主要用作緩存"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Doris:主要存儲畫像數據和行爲數據"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AFS:主要是存儲產出的用戶包的一些信息"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"調度平臺:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"主要是離線任務的調度"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"監控平臺:"}]},{"type":"bu
lletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"整個服務穩定性的監控"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"離線部分:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"離線部分的話主要是對需要的數據源(比如說畫像、關注、行爲等數據源)做 ETL 清洗,清洗完之後會做一個全局字典並寫入 Doris。任務最終會產出用戶包,並會分發給小程序 B 端跟百度統計:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"小程序 B 端:推送給手機端用戶"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度統計:拿這些用戶包做一次羣體分析"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上就是一個整體的架構。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖中大家可以看到有幾個標紅的地方,同時也用數字 1、2、3 做了標記,這幾個標紅是重點模塊,就是針對於上面提到的四個難點做的重點模塊改造,接下來會針對這三個重點模塊一一展開進行講解。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 
全局字典"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/74\/74a2e573630d3527ec1cebca8e5f84a7.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先講解全局字典這個模塊,全局字典的目的主要是爲了解決難點一:數據量大,需要壓縮存儲,同時壓縮存儲之後還要保證查詢性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"爲啥要用全局字典:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏大家可能會有一個疑問,就是說我用 BitMap 存儲爲啥還要做全局字典?這個主要是因爲 Doris 的 BitMap 功能是基於 RoaringBitmap 實現的,因此假如說用戶 ID 過於離散的時候,RoaringBitmap 底層存儲結構用的是 Array Container 而不是 BitMap Container,Array Container 性能遠遠差於 BitMap Container。因此我們要使用全局字典將用戶 ID 映射成連續遞增的 ID,這就是使用全局字典的目的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"全局字典的更新邏輯概況:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏是使用 Spark 程序來實現的,首先加載經過 ETL 清洗之後的各個數據源(畫像、關注、行爲這些數據源)和全局字典歷史表(用來維護用戶 ID 跟自增 ID 的映射關係),加載完之後會判斷 ETL 裏面的用戶 ID 是否已經存在字典表裏面,如果有的話,就直接把 ETL 的數據寫回 Doris 就行了,如果沒有就說明這是一個新用戶,然後會用 row_number 方法生成一個自增 ID,跟這個新用戶做一次映射,映射完之後更新到全局字典並寫入 Doris。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
Doris"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來介紹第二個重點模塊 Doris。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.1 Doris 分桶策略"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7d\/7d8eb0925a5a6e21600660bbb1a26bc2.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分桶策略的目的是爲了解決難點二:查詢頻響要求高。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"爲啥要做分桶策略:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們之前使用了全局字典保證用戶的連續遞增,但是我們發現用了全局字典之後,BitMap 的查詢性能其實並沒有達到我們預期的那樣絲滑般柔順的感覺,哈哈哈。。。對,還是特別慢,然後我們就特別鬱悶了,開始懷疑 BitMap 並不像傳說中的那麼快,難道童話都是騙人的嗎?我們就在想怎麼解決這個問題,後來我們發現 Doris 其實是分佈式的一個集羣,它會按照某些 Key 進行分桶,也就是分桶之後用戶ID 在桶內就不連續,又變成零散的了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"舉個例子,如圖中左側原始數據,可以看到 appkey 有A1、A2,channel 有C1、C2、C3,然後 userid 是0、1、2、3、4、5 六個連續的 userid 。我們按照 appkey 和 channel 進行分桶,這樣的話分完桶之後的結果就是右邊這張圖:桶一 key 是A1、C1,userid 就是0、2;桶二 key 是 A1、C2,userid 就是 1、4;桶三 key 是A2、C3,userid 是3、5;大家能比較直觀地看到在桶中 userid 已經不連續了,不連續的話,BitMap  的性能就沒法發揮出來的,它會走 Array Container 
去存儲,它的性能會比較差。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"解決方式:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏,其實我想問一下大家,這種怎麼去保證桶內的連續?大家如果有想法,可以私下一起討論下,現在大家不要給自己加戲啊, 今天的star是我啊, 大家要focus on me 身上啊。開個玩笑啊, 活躍下直播間氣氛。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來我會跟大家分享一下我們的一個方案,對了,給大家五秒鐘的時間,大家可以現在記筆記了,真的,這個方案是我們經歷了無數個日日夜夜跟無數根頭髮總結出來的,非常有實戰意義。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/73\/7313ce9de7838e96f8192616e7738a88.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"好,現在講一下我們的方案,我們的方案是在表裏面增加了一個 hid 的字段,然後讓 Doris 按照 hid 字段進行分桶,這裏 hid 生成算法是:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"hid  = V\/(M\/N) 然後取整"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"V:全局字典的用戶ID 
對應的整數"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"M:預估的用戶總數"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"N:分桶數"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"還是結合上面的例子,大家可以看一下:userid 是六個即 0~5,所以 M = 6;分爲三個桶,N = 3;因此 M 除以 N 就等於二。這樣的話我就要用 userid 去除以二,然後取整作爲 hid。可以看一下,比如說 userid 是零,0÷2 取整爲 0;userid 是一的話,hid 還是 0,因爲 1÷2 的整數部分是零;同理 2÷2、3÷2 取整是一,4÷2、5÷2 取整是二,這樣的話就把 userid 跟 hid 做對應,然後再根據 hid 做分桶。大家可以看到分桶結果,hid = 0 時 userid 是 0、1,hid = 1 時 userid 是 2、3,hid = 2 時 userid 是 4、5,這樣就保證了桶內連續。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2 Doris 之用戶畫像標籤優化"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e0\/e0f9c7a1cec5ae290aac56f6be078941.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面給大家講了分桶與全局字典這兩個通用的策略,就是說大家要做 BitMap 
的話,這兩個東西肯定是要考慮的,但是隻考慮這兩個東西,還並不能說達到性能的最優,還要結合自己的實際業務去做針對性的優化,這樣才能達到一個性能的最優,接下來我會給大家介紹我們的具體業務優化:畫像標籤的優化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"畫像標籤優化解決的難點也是難點二:查詢頻響要求高。這個問題當時是有兩個方案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"方案一:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"tag_type, tag_value 。tag_type 是用來記錄標籤的類型,tag_value  是用來記錄標籤的內容。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如圖所示:比如說 tag_type  是性別,tag_value  可能是男或女,bitmap 這裏就是存儲所有性別是男的用戶 id 列表。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同樣對於 tag_type 是地域、tag_value   是北京,bitmap 存儲的是所有地域在北京的用戶 id 列表。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"方案二:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大寬表,使用大寬表在一行記錄了所有的標籤,然後使用 bitmap 記錄這個標籤的用戶 id 列表。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終我們選擇了方案二,爲什麼沒有選方案一呢 ?因爲方案一它是一個標籤對應一個用戶 bitmap,當我想查一個聯合的結果就比較耗時,比如我想查詢性別是男且區域是北京的所有用戶,這樣的話我需要取出 “男” 的用戶和 “北京“ 的用戶,兩者之間做一個交集,對吧?這種的話肯定會有計算量會有更多的時間消耗,但是如果用大寬表去做存儲的話,就可以根據用戶常用的查詢去構建一個物化視圖,當用戶的查詢(比如在北京的男性)命中了物化視圖,就可以直接去取結果,而不用再去做計算,從而降低耗時。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏還有一個知識點跟大家分享一下:在使用 Doris 
的時候,一定要儘量去命中它的前綴索引跟物化視圖,這樣會大大的提升查詢效率。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3 Doris 之動靜組合查詢"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c1\/c1a46d7343be93721c18670adcfb17d2.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"好,用戶標籤講完之後繼續講下一個難點的解決方案:動靜組合查詢,對應的難點是難點三:計算複雜。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先介紹一下什麼叫動靜組合查詢:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"靜態查詢:我們定義爲用戶維度是固定的,就是可以進行預聚合的查詢爲靜態查詢。比如說男性用戶,男性用戶就是一個固定的羣體,不管怎麼查用戶肯定不會變,就可以提前進行預聚合。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"動態查詢:主要偏向於一些行爲,就是那種查詢跟着用戶的不同而不同。比如說查近30天收藏超過三次的用戶,或者還有可能是近30天收藏超過四次的用戶,這種的話就很隨意,用戶可能會查詢的維度會特別的多,而且也沒辦法進行預聚合,所以我們稱之爲動態查詢。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後小程序用戶分層,相比於同類型的用戶分層功能增加了用戶行爲篩選,這也是小程序產品的特點之一。比如說我們可以查近 30 天用戶支付訂單超過 30 元的男性,這種 “近 30 天用戶支付訂單超過 30 元” 的查詢是沒辦法用 bitmap 做記錄的,也沒辦法說提前計算好,只能在線去算。這種就是一個難點,就是說我怎麼用非 bitmap 表和 bitmap 做交併補集的運算,爲了解決這個問題,我們結合上面的例子把查詢拆分爲四步:我要查近30天用戶支付訂單超過30元的男性,且年齡在 20 ~30 歲的用戶(具體查詢語句參考 PPT 
圖片)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"第一步我先查 20~30歲的男性用戶。"},{"type":"text","text":"因爲是比較固定,這裏可以直接查 bitmap 表。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"第二步我要查近30天用戶支付訂單超過30元的用戶。"},{"type":"text","text":"這種的話就沒辦法去查 bitmap 表了,因爲 bitmap 沒有辦法做這種聚合,只能去查行爲表。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"第三步就是要做用戶ID 跟在 線 bitmap 的一個轉化。"},{"type":"text","text":"Doris 其實已經提供了這樣的功能函數:to_bitmap,可以在線將用戶 id 轉換成 bitmap。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"第四步是求交集。"},{"type":"text","text":"就是第一步和第四步的結果求交集。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後,請大家要注意一下,整篇的核心其實是在第三步:Doris 提供了 to_bitmap 的功能,它幫我們解決了非 bitmap 表和  bitmap 聯合查詢的問題。講到這裏,我其實想給大家表現出那種Doris特別驚豔、特別帥那種感覺,但是我線下練習了好多遍都無法表演出來, 你們看我現在的表演 有點太浮誇了, 用力過猛了。誰讓我是個程序員,不是個演員呢,沒有辦法演出那種感覺、那種感情。所以我只能把我想表達的感情跟大家說出來,大家一定要懂我,對,就是那種很驚豔的感覺,大家一定要懂我~"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上是我們基於 Doris 
用戶分層方案的一個講解,基於上述方案整體的性能收益是:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"95分位耗時小於一秒"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存儲耗降低了9.67倍"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"行數優化了八倍"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"說明了基於 Doris 的用戶存儲方案還是特別有效果的, 希望我們的經驗能給大家能有所幫助。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 如何快速產出用戶包"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4a\/4ad3b3783ceaa3392e215e12ca4e3969.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在講一下第三部分:用戶包。這部分主要是用來解決難點四:產出用戶包要求時效性高。這個其實我們也有兩個方案:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"方案一:調度平臺 + spark。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個其實比較容易理解,因爲你要跑離線任務很容易就想到了 spark。在這個調度平臺裏面用了 DAG 圖,分三步:先產出用戶的 cuid,然後再產出用戶的 
uid,最後是回調一下做一次更新。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"方案二:調度平臺 + solo。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"執行的 DAG 圖的話就是:solo 去產出 cuid,uid,還有回調。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"solo:是百度雲提供的 Pingo 單機執行引擎,大家可以理解爲是一個類似於虛擬機的產品,這個其實是公有云:《百度智能雲》裏面已經有的功能,大家感興趣的可以去登錄百度智能雲官網 去看一下。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終的方案選型我們是選用了 Doris,因爲 Doris 比 Spark 更快,爲啥快?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先介紹下方案一,方案一使用的是 Spark ,它存在幾個問題:比如 Yarn 調度比較耗時,有時候也會因爲隊列的資源緊張而會有延遲,所以有時候會出現一個很極端的情況就是:我產出零個用戶,也要30分鐘才能跑完,這種對用戶的體驗度非常不好。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"方案二的話就是我們利用了 Doris 的 SELECT INTO OUTFILE 產出結果導出功能,就是你查出的結果可以直接導出到 AFS,這樣的效果就是最快不到三分鐘就可以產出百萬級用戶,所以 Doris 性能在某些場景下比 Spark 要好很多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後大家可以看到其實我這裏敘述的時候語氣依然是比較平淡的,是吧?沒有帶什麼感情,但是我其實還想表達出那種驚歎和喜悅的感情,就是 Doris 性能在某些場景下比 Spark 
還要好,但是大家要懂我,我畢竟是個程序員不是一個演員,沒法表達出那種感覺,但是你們一定要懂我啊,哈哈,開個玩笑,活躍一下直播間氣氛。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"未來規劃"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2c\/2c1e77829160c2609494817fe84e9288.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"未來的規劃:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先在產品上我們會繼續的豐富分層的應用場景,拓展關係維度豐富觸達的形式,然後探索分層和商業的結合模式。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在技術上我們會從時效性豐富性跟通用性上做文章:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時效性:我們會把交易,訂單,關注等行爲時實化"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"豐富性:我們會接入更多的用戶畫像,標籤和行爲"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通用性:我們會把全局字典插件化,然後通用到各個業務上"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這就是我們的未來規劃。後面有機會再根大家從 Doris 的架構方面跟大家介紹下 Doris 
的性能爲何如此強悍。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"分享嘉賓:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"趙煜楊"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度 | 資深研發工程師"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"負責手百小程序數據產品的工程架構工作,從0到1主持設計了精細化用戶分層系統,實現了百億級TB量級小程序用戶畫像、行爲數據秒級預估,保障了小程序私域運營的落地。具有超過6年在高可用、大數據方向的工作經驗,一直專注在數據工程架構、個性化推薦工程等工作上,對技術團隊管理也比較有經驗,目前個人專注於大數據、個性化推薦、高可用架構等技術方向。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:DataFunTalk(ID:dataFunTalk)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/_Tar1Wb0X_UxRQ452zuWiA","title":"xxx","type":null},"content":[{"type":"text","text":"基於Doris的小程序用戶增長實踐"}]}]}]}