A Thousand Users, Ten Thousand Faces, Wonders Revealed: The Evolution of Multi-Interest Recall in iQIYI Short-Video Recommendation

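{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At its core, a recommender system is an information filter: a series of funnels progressively narrows the catalog down to the content a user is most interested in, as shown in Figure 1 ("},{"type":"link","attrs":{"href":"http:\/\/mp.weixin.qq.com\/s?__biz=MzI0MjczMjM2NA==&mid=2247492847&idx=1&sn=d8e8a853fb1f1dee938ea8dbdf05897f&chksm=e97578ccde02f1da57b19dbe327d32c14578f8457559e5aff91536a898008a3f74d5f3e1abf7&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"“The Evolution of Coarse-Ranking Model Optimization in iQIYI Short-Video Recommendation”"}]},{"type":"text","text":"). The recall stage is the first funnel: it filters, along multiple dimensions, the content a user may like out of a massive video pool and hands it to the downstream ranking stages, so it directly sets the upper bound on the quality of the final recommendations. This article describes how multi-interest recall has evolved in iQIYI's Suike (隨刻) recommendation team. Compared with other recall techniques, multi-interest recall can mine several latent interests of a user at once, moving personalized recommendation beyond the traditional “a thousand faces for a thousand users” toward “ten thousand faces for a thousand users.”"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/35\/35515a1d93477c952f10904a791f44c8.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 1 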
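Main pipeline of a video recommendation system [1]"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Technical Background: Recalling “Promising Talent” and Breaking the Filter Bubble"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A good video recommender system distributes each video precisely to the users whose interests match it. The process resembles an athlete passing round after round of selection before finally winning a world championship, and the recall stage corresponds to the city-team tryout that the athlete faces first, in their youth."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A national-team coach may be highly skilled, yet without exceptionally gifted young athletes even that coach cannot produce a world champion. Likewise, ranking models can lift performance with rich features and carefully designed networks, but if every recalled video is of poor quality, the ceiling on ranking performance is locked in early. A national-team coach therefore needs talent from many provinces and cities to select from, and ranking needs multiple recall sources to supply candidates."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When it comes to recall, readers familiar with recommendation can name many strategies and algorithms. Strategies include frequent-itemset mining such as Apriori, which exploits associations between items; recall based on user-item relevance such as itemCF; and collaborative-filtering recall such as SVD. Algorithms include item2vec and node2vec, which turn items into embeddings for nearest-neighbor retrieval; CDML recall, which applies content understanding; and the GNN-based recall that has emerged in recent years."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/82\/82f4a23138d3dce7b01d1a6297b84afc.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 2 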
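Main pipeline of multi-interest recall [2]"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Figure 2 shows, multi-interest recall, like other recall techniques, relies on a user's historical behavior. The difference is that it learns several interest representations per user, upgrading personalized recommendation from “a thousand faces for a thousand users” to “ten thousand faces for a thousand users”: each interest representation retrieves its own videos by nearest-neighbor search and serves as a recall source. On one hand, multi-interest recall matches the reality that most users have several different tastes and hobbies, so recommendations are both precise and varied, and users are spared the fatigue of homogeneous content. On the other hand, beyond mining a user's existing interests, it keeps surfacing latent interests that users have not discovered themselves, which counters the “filter bubble” effect of traditional recommendation algorithms and exposes users to the vast cultural catalog on iQIYI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Moreover, because iQIYI has a rich product matrix, a single user often uses several products at once, including the main iQIYI app, Suike, and Qiyiguo. Training on behavior mixed from multiple apps makes it possible to extract a user's distinct interests in different apps as well as interests shared by users across apps. These interests help users find the communities and circles they enjoy, and they support cross-product penetration and the composite ecosystem of iQIYI's product matrix. iQIYI short-video recommendation currently uses three multi-interest recall techniques: clustering multi-interest recall, MOE multi-interest recall, and single-activation multi-interest recall. This article introduces them in turn."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Clustering Multi-Interest Recall"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The main advantage of clustering multi-interest recall is that it requires no complex neural network training: embeddings already produced by other deep learning models online (for example, mature video embedding spaces such as node2vec and item2vec) are enough to form multiple interest vectors, so both time and space costs are low. Its main theoretical basis is PinnerSage [2], an interest-clustering method presented at KDD 2020. (The name resembles PinSage, but PinnerSage has little to do with graph neural networks.)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PinnerSage clustering multi-interest recall is a newer strategy that combines traditional item-to-item (ii) recall with clustering. Traditional ii recall usually takes one of two forms. (1) Take every video in the user's short-term history and run one ANN lookup per video to fetch its neighbors; this is slow and produces highly homogeneous results. (2) Pool the embeddings of all videos in the user's short-term history into a single user embedding and run one ANN lookup; this fuses information and cuts time and space costs to some extent, but it easily loses information: 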
the pooled embedding can end up wildly off target, as Figure 3 shows."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/70\/70c0bf7f0ecc5de75113f8437263171f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 3"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PinnerSage takes the best of both: it clusters the videos in the user's history into groups and condenses each group into one interest vector. Clustering avoids the load of many ANN lookups and also limits information loss to some extent. PinnerSage clustering multi-interest recall proceeds in two steps:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a. Clustering. "},{"type":"text","text":"As shown in Figure 4, all videos the user has watched are clustered. PinnerSage uses hierarchical clustering, which, unlike K-Means, does not require the number of clusters to be set in advance. Every video starts as its own cluster; at each step the two clusters whose merge increases the within-cluster variance the least are merged, and the process stops once the variance would exceed a threshold."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/36\/36828249315338fbb0a438ec8e75c0e9.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 4"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b. 
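Extracting the embeddings. "},{"type":"text","text":"PinnerSage again avoids averaging the video embeddings within a cluster. Instead it picks one video embedding in the cluster to represent the cluster (the interest cluster): the one whose summed distance to all other video embeddings in the cluster is smallest, that is, the medoid. These embeddings, each representing one user interest, are then used for ANN retrieval."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7b\/7b1f0ae228961a78b0529caefa7c7532.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Clustering multi-interest recall forms multiple user interests with a simple strategy at low time cost. However, because it depends on an embedding space produced by other algorithms, the learned interest embeddings are easily biased, and the recalled content skews toward highly popular videos, which makes personalization hard to achieve. The team therefore moved on to multi-interest networks in deep learning."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"MOE Multi-Interest Recall"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The two-tower model is a mainstream recall model in industry, but its effect in our real scenarios was limited. The team therefore modified the user-side tower, introducing a structure similar to MOE [3] that extracts several vectors to represent the user's latent interests, which brought a substantial improvement. MOE is a classic structure widely used in multi-task learning that trains several expert models, each specializing on part of the data. We take the outputs of the experts as the user's interest vectors, compute the inner product of each with the vector extracted on the video side, and let only the most similar user vector take part in the loss computation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/61\/615ca1203c829c31d7d52ac3116e5aa3.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 5"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 5 shows the MOE multi-tower structure: the user-side MOE towers are on the left and the video-side single tower is on the right. The implementation details are as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"a. 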
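The user-side input consists mainly of the user's preference sequences: the sequence of preferred video ids, the sequence of uploader ids, and the sequence of content tags. Each sequence feature goes through an embedding layer and average pooling to yield a vector; the vectors are concatenated to form the input of the MOE towers, which output several vectors representing the user's latent interests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"b. The video side is a single tower. Its input is the video id, uploader id, and content tag features of videos the user has interacted with; after embedding lookup and concatenation, the single tower extracts the video representation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"c. For the loss, recall must pick a few hundred videos a user may like out of a library of tens of millions, so the space of negative samples is enormous. To strengthen the model's ability to discriminate negatives and to make negative sampling more efficient, we use in-batch negative sampling, treating the other samples in a batch as negatives for the current sample, and we use focal loss to improve the model's ability to recognize hard samples."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the modified MOE multi-tower model went online, the click-through rate and per-user watch time of this single recall source improved substantially (CTR across all clients rose by 0.64%, the CTR of videos from this recall source was 28% higher than the all-client level, and play time per impression was 45% higher than the all-client level)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, the MOE towers share the same bottom-level input and use only simple DNNs to extract the different vectors, so the towers are hard to tell apart and the vectors contain a lot of redundancy that is difficult to optimize away. In addition, the positional information carried by the user's sequence features matters for modeling the user, yet the current model can hardly exploit it, so we looked to other networks to make use of it."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4. Single-Activation Multi-Interest Recall"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Single-activation multi-interest recall has been used in industry since 2019. The work that cannot be overlooked is MIND [5], proposed by Alibaba, which applies dynamic routing in a capsule network to the user's behavior sequence to gather multiple interests. Its strong results on test sets sparked industry-wide enthusiasm for multi-interest networks, and the Suike recommendation team explored the direction as well."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.1 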
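First Version of Single-Activation Multi-Interest Recall"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Inspired by MIND and similar networks, the team built a first version of the single-activation multi-interest network; its structure is shown in Figure 6. MIND uses a capsule network to capture user interests. A capsule network captures both the order of the watch sequence and the correlations between videos well, but its structure is complex and computationally expensive. Moreover, watch order can be expressed along a single dimension and does not require a network that is overly sensitive to position. The team therefore replaced it with a transformer structure to keep training fast."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0b\/0b62dac4261421b1439e9a9fd6145e29.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 6"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The rough procedure is:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"a. Take a segment of the user's watched-video id sequence {V1,…VN} as the sample and feed the (N+1)-th video into the network as the target. The video embedding layer turns the sequence into the embedding sequence E={E1,E2,..EN}."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"b. E passes through the transformer-based interest extraction layer to produce multiple interest vectors M. The interest vector with the largest |Mi| is paired with the target video's embedding for a sampled softmax loss with negative sampling, so each training step actually activates only one interest channel."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"c. 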
Once the model is trained, at inference time all of the user's interest vectors are extracted and an ANN search is run for each one to obtain the recall results."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although the first version is structurally simple, it performed well online, substantially improving consumption metrics, video coverage, and diversity. It also had problems: the recall results of different interest vectors overlapped heavily, it used few features, and it was not timely. These led to several later versions."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.2 Disagreement-Regularization Multi-Interest Recall"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 4.1 the interest vectors are not constrained in any way, so they easily become too similar to one another; a regularization term therefore has to be added to the loss function. Since the core of the first version is a transformer, the team explored three regularization functions without changing the network structure [6]."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/11\/11f5c4806766798b1f1953d9c2d197af.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/24\/24eb554c57248c0811e1304ebec1daf1.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a6\/a66c84e7afe3f9f9076a43ac80be4be0.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 7"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Figure 7 shows, the learned video embeddings (Equation 1), the attention (Equation 2), and the interest vectors (Equation 3) are each given a 
regularization constraint. In production we found that regularizing the interest vectors directly gives the best results."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.3 Dynamic-Capacity Multi-Interest Recall"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Different users show different degrees of interest divergence, so the number of interest vectors should be elastic rather than a fixed hyperparameter. On top of 4.1 and 4.2, an interest activation log is introduced into the network structure, as shown in Figure 8."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0f\/0f3ca6a470cbde3de458d2bd0be0a6e4.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 8"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"During training, every time any of a user's interest vectors is activated, the log records the activation. At inference time the log is consulted and the interest vectors that were never or rarely activated are dropped, which makes the number of interests dynamic and matches the fact that users differ in how widely their interests spread."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.4 Multi-Modal Feature Multi-Interest Recall"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 4.1-4.3, multi-interest recall uses only the video id feature, which limits what the model can learn. The main direction for later versions was therefore to bring uploaders and content tags into training. Figure 9 shows the main network structure."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The transformer part is largely the same as in 4.1-4.3. The difference is that the uploader and content tag features added to the training samples pass through embedding and pooling layers before entering the transformer. Two points are worth noting:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":" 
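The loss still performs negative sampling only over video id embeddings (unlike MIND and similar structures). The aim is to let all video id embeddings take part in negative sampling instead of falling back on in-batch negative sampling, so that the video id embeddings, which are what inference mainly relies on, become more accurate (the ANN step at inference time does not use tags or uploaders)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"A video usually has several content tags, so the tag feature is built by embedding every tag of the video and then pooling the results once."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/bb\/bb782287228f8a1de69d3ffeccd9c591.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 9"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.5 Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As 4.1-4.4 show, the single-activation multi-interest network went through several iterations, and each improvement brought clear gains once deployed: all-client CTR rose by 2%, all-client watch time rose by 1.5%, and plays per user rose by 1.5%. The diversity of recommended videos in particular improved by more than 4%."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, iQIYI is a content platform for all ages, and households have long shared a single account among users of different age groups. The history under one account therefore often comes from several age groups, and this complexity of user history makes recommendation difficult. The interest vectors of the single-activation multi-interest network are sampled with randomness during learning and exhibit orthogonality mathematically, so their search ranges can recall the large volume of videos favored by different age groups."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Single-activation multi-interest networks are also an active research topic today, and we hope more researchers will propose new ideas that keep recommendation technology flourishing."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary and Outlook"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[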
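{"type":"text","text":"This article has outlined how multi-interest recall has developed within iQIYI short-video recommendation. Its biggest strength is that it extracts several interests for a single user, moving user profiling from “a thousand faces for a thousand users” to “ten thousand faces for a thousand users,” improving both the precision and the richness of recommendations while also probing for new interests so that users do not fall into a filter bubble. "},{"type":"text","marks":[{"type":"strong"}],"text":"The technique also continues to be explored as part of building the composite ecosystem of iQIYI's product matrix and of addressing the complexity of user history."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also see several directions in which multi-interest recall can still be improved:"}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"For behavior sequence selection, most multi-interest strategies and networks still consider only the user's watch history. If an event knowledge graph were used to bring searches, subscriptions, and other on-platform behavior into the training data, more of the user's interests and inclinations could probably be captured."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"For negative feedback, multi-interest recall has no answer yet. Thumbs-downs, negative comments, “dislike” feedback, unfollows, and similar behavior have not been incorporated, although this information is also essential for guiding the interest network. This direction will be a focus of later work."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"For integrating users' static information and preference features, there is also a lot of room for application. Combining these features aligns recall well with the ranking objective and raises both the quality of the recall sources and the ceiling on ranking performance."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"References"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[1] Issue of 2021-2-26. How to Improve Pipeline Objective Consistency? The Evolution of Coarse-Ranking Model Optimization in iQIYI Short-Video Recommendation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[2] Aditya Pal, et al. PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest. 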
KDD 2020."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[3] Jiaqi Ma, et al. Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. KDD 2018."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[4] Yukuo Cen, et al. Controllable Multi-Interest Framework for Recommendation. KDD 2020."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[5] Chao Li, et al. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall. CIKM 2019."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[6] Jian Li, et al. Multi-Head Attention with Disagreement Regularization. EMNLP 2018."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reposted from: iQIYI Technology Product Team (ID: iQIYI-TP)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/T2G8L820haEbvXgryvEWHg","title":"xxx","type":null},"content":[{"type":"text","text":"A Thousand Users, Ten Thousand Faces, Wonders Revealed: The Evolution of Multi-Interest Recall in iQIYI Short-Video Recommendation"}]}]}]}
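Supplementary sketch for the clustering multi-interest recall section. The two PinnerSage-style steps described there (merge clusters while the variance increase stays small, then represent each cluster by its medoid) can be illustrated in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's production implementation: the function names, the `threshold` parameter, the use of the Ward merge cost as the "variance increase", and the toy 2-D embeddings are all hypothetical.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def merge_cost(a, b):
    """Ward criterion: growth in within-cluster squared error if a and b merge."""
    gap = dist(centroid(a), centroid(b))
    return len(a) * len(b) / (len(a) + len(b)) * gap ** 2


def cluster_history(embeddings, threshold):
    """Start with one cluster per video and repeatedly merge the cheapest pair.

    Stops when the cheapest merge would cost more than `threshold`
    (an assumed tuning knob), so the number of clusters is not fixed upfront.
    """
    clusters = [[e] for e in embeddings]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        if merge_cost(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def medoid(cluster):
    """Member whose summed distance to all members of the cluster is smallest."""
    return min(cluster, key=lambda c: sum(dist(c, o) for o in cluster))


def interest_vectors(embeddings, threshold):
    """One representative (medoid) embedding per interest cluster."""
    return [medoid(c) for c in cluster_history(embeddings, threshold)]


if __name__ == "__main__":
    # Toy watch history: three videos near the origin, two near (5, 5).
    history = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0)]
    print(interest_vectors(history, threshold=1.0))
```

Each returned vector is an embedding of a video the user actually watched, so it can be passed to an ANN index directly, whereas mean pooling over the whole toy history would give a point between the two groups that matches neither.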