特徵類型 定義 特徵舉例 |
5年迭代5次,抖音推薦特徵體系演進歷程
{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2021年,字節跳動旗下產品總 MAU 已超過19億。在以抖音、今日頭條、西瓜視頻等爲代表的產品業務背景下,強大的推薦系統顯得尤爲重要。Flink 提供了非常強大的 SQL 模塊和有狀態計算模塊。目前在字節推薦場景,實時簡單計數特徵、窗口計數特徵、序列特徵已經完全遷移到 Flink SQL 方案上。結合 Flink SQL 和 Flink 有狀態計算能力,我們正在構建下一代通用的基礎特徵計算統一架構,期望可以高效支持常用有狀態、無狀態基礎特徵的生產。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"業務背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/54\/54e74d54df4a52d8b09d247246256b44.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"對於今日頭條、抖音、西瓜視頻等字節跳動旗下產品,基於 Feed 流和短時效的推薦是核心業務場景。而推薦系統最基礎的燃料是特徵,高效生產基礎特徵對業務推薦系統的迭代至關重要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"主要業務場景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/83\/831d192142aa8810e8d1eff597720a02.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"抖音、火山短視頻等爲代表的短視頻應用推薦場景,例如 Feed 流推薦、關注、社交、同城等各個場景,整體在國內大概有6億+規模 DAU;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"頭條、西瓜等爲代表的 Feed 信息流推薦場景,例如 Feed 流、關注、子頻道等各個場景,整體在國內大概有1.5億+規模 DAU;"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"業務痛點和挑戰"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/25\/2550175b5c2e7057750a74f7ae7504e0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"目前字節跳動推薦場景基礎特徵的生產現狀是“"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"百花齊放"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"”。離線特徵計算的基本模式都是通過消費 Kafka、BMQ、Hive、HDFS、Abase、RPC 等數據源,基於 Spark、Flink 計算引擎實現特徵的計算,而後把特徵的結果寫入在線、離線存儲。各種不同類型的基礎特徵計算散落在不同的服務中,缺乏業務抽象,帶來了較大的運維成本和穩定性問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"而更重要的是,缺乏統一的基礎特徵生產平臺,使業務特徵開發迭代速度和維護存在諸多不便。如業務方需自行維護大量離線任務、特徵生產鏈路缺乏監控、無法滿足不斷髮展的業務需求等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6d\/6dd6cad8922304cfab6555531c87c3ee.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 在字節的業務規模下,構建統一的實時特徵生產系統面臨着較大挑戰,主要來自四個方面:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"巨大的業務規模"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":抖音、頭條、西瓜、火山等產品的數據規模可達到日均 PB 級別。例如在抖音場景下,晚高峯 Feed 播放量達數百萬 QPS,客戶端上報用戶行爲數據"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"高達數千萬 IOPS。"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"業務方期望在任何時候,特徵任務都可以做到不斷流、消費沒有 lag 等,這就要求特徵生產具備非常高的穩定性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"較高的特徵實時化要求"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":在以直播、電商、短視頻爲代表的推薦場景下,爲保證推薦效果,實時特徵離線生產的時效性需實現常態穩定於分鐘級別。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"更好的擴展性和靈活性"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":隨着業務場景不斷複雜,特徵需求更爲靈活多變。從統計、序列、屬性類型的特徵生產,到需要靈活支持窗口特徵、多維特徵等,業務方需要特徵中臺能夠支持逐漸衍生而來的新特徵類型和需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"業務迭代速度快"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":特徵中臺提供的面向業務的DSL需要足夠場景,特徵生產鏈路儘量讓業務少寫代碼,底層的計算引擎、存儲引擎對業務完全透明,徹底釋放業務計算、存儲選型、調優的負擔,徹底實現實時基礎特徵的規模化生產,不斷提升特徵生產力;"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"迭代演進過程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在字節業務爆發式增長的過程中,爲了滿足各式各樣的業務特徵的需求,推薦場景衍生出了衆多特徵服務。這些服務在特定的業務場景和歷史條件下較好支持了業務快速發展,大體的歷程如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/aa\/aaaf625e86241ce935ea3b630544f6fb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"推薦場景特徵服務演進歷程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"在這其中2020年初是一個重要節點,我們開始在特徵生產中引入 Flink SQL、Flink State 技術體系,逐步在計數特徵系統、模型訓練的樣本拼接、窗口特徵等場景進行落地,探索出新一代特徵生產方案的思路。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"新一代系統架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"結合上述業務背景,我們基於 Flink SQL 和 Flink 有狀態計算能力重新設計了新一代實時特徵計算方案。"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"新方案的定位是:解決基礎特徵的計算和在線 Serving,提供更加抽象的基礎特徵業務層 DSL。"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在計算層,我們基於 Flink SQL 靈活的數據處理表達能力,以及 Flink State 狀態存儲和計算能力等技術,支持各種複雜的窗口計算。極大地縮短業務基礎特徵的生產週期,提升特徵產出鏈路的穩定性。新的架構裏,我們將"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"特徵生產的鏈路分爲數據源抽取\/拼接、狀態存儲、計算三個階段,"},{"type":"text","marks":[{"type":"strong"}],"text":"Flink SQL完成特徵數據的抽取和流式拼接,Flink State 完成特徵計算的中間狀態存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"有狀態特徵是非常重要的一類特徵,其中最常用的就是帶有各種窗口的特徵,例如統計最近5分鐘視頻的播放 VV 等。對於窗口類型的特徵在字節內部有一些基於存儲引擎的方案,整體思路是“"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"輕離線重在線"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"”,即把窗口狀態存儲、特徵聚合計算全部放在存儲層和在線完成。離線數據流負責基本數據過濾和寫入,離線明細數據按照時間切分聚合存儲(類似於 micro batch),底層的存儲大部分是 KV 存儲、或者專門優化的存儲引擎,在線層完成複雜的窗口聚合計算邏輯,每個請求來了之後在線層拉取存儲層的明細數據做聚合計算。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們新的解決思路是“"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"輕在線重離線"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"”,即把比較重的"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"時間切片明細數據"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"狀態存儲和窗口聚合計算全部放在離線層。窗口結果聚合通過"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"離線窗口觸發機制"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"完成,把特徵結果"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"推到"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在線 KV 存儲。在線模塊非常輕量級,只負責簡單的在線 serving,極大地簡化了在線層的架構複雜度。在離線狀態存儲層。我們主要依賴 Flink 提供的"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"原生狀態存儲引擎 RocksDB"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",充分利用離線計算集羣本地的 SSD 磁盤資源,極大減輕在線 KV 存儲的資源壓力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"對於長窗口的特徵(7天以上窗口特徵),由於涉及 Flink 狀態層明細數據的回溯過程,Flink Embedded 狀態存儲引擎沒有提供特別好的外部數據回灌機制(或者說不適合做)。因此對於這種“"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"狀態冷啓動"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"”場景,我們引入了中心化存儲作爲底層狀態存儲層的存儲介質,整體是 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Hybrid "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"架構。例如7天以內的狀態存儲在本地 SSD,7~30天狀態存儲到中心化的存儲引擎,離線數據回溯可以非常方便的寫入中心化存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"除窗口特徵外,這套機制同樣適用於其他類型的有狀態特徵(如序列類型的特徵)。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"實時特徵分類體系"}]},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.