Applying Angel Recommendation Algorithms to Game Recommendation

{"type":"doc","content":[
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Overview: ","attrs":{}},{"type":"text","text":"Angel is a distributed, high-performance machine learning platform developed in-house at Tencent. It supports machine learning, deep learning, graph computing, and federated learning, and its deep learning platform is already used in many Tencent scenarios. This talk introduces how Angel's recommendation algorithms are applied to game recommendation. It covers: game recommendation on game platforms, recommendation algorithms on the Tesla platform, the linear characteristics of classical algorithms, the nonlinear characteristics of DeepFM, and how DeepFM is applied in practice.","attrs":{}}]}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Game recommendation on game platforms","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/26/26b940d43271615781f27467e968c2ab.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This figure shows a game recommendation feature on the Steam platform. Steam mainly recommends by tags, and its tags are built mostly from information collected through users' own choices.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/53/53e24436b47a12a3719a9d10d78871e5.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Steam is characterized by a large catalog of games, many of them with considerable depth.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When there are many items and many users, letting users pick tags can abstract out the latent feature vectors that would otherwise have to be computed with collaborative filtering via ALS.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In other words, Steam relies on manual selection, a form of collective intelligence, to abstract the feature vectors.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b3/b33da44a5bd61b5b1fda47436dec59a7.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is a campaign on the WeGame platform. Its recommendation does not rely on manually extracted tags. Instead it uses CF, together with DeepFM trained on user behavior data.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Recommendation algorithms on the Tesla platform","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/54/543d2c8c63865c0be8350d1b41660a73.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These are the recommendation algorithms on the Tesla platform. To try them from the public internet, use the address below:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https://cloud.tencent.com","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"During the trial, follow the wiki documentation to create the model and define its parameters, and it is ready to use.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d9/d9f92e430aaad420d234ee49cd468f13.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Tesla platform also offers traditional algorithms such as CF-ALS. Its key hyperparameters, Rank, Lambda, and Alpha, can be chosen by iterating over candidate values until a suitable combination is found.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Linear characteristics of classical algorithms","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/78/780f9cbd530396e91d916bf5e515ea99.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Content-based tag recommendation that does not draw on collective intelligence is, in many cases, just a subjective judgment made up front.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The other approach is collaborative filtering, whether item-based, user-based, or a combination of the two. It suffers from sparse matrices and from the long-tail recommendation problem.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ac/ac7111110e5f4c1d5eb3bcb11c8471c4.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This raises a problem: collaborative filtering of this kind favors items with high click-through rates, but most of the games to be recommended are not popular items. If we want to build on the classical algorithms and still recommend rarely clicked items, that is, long-tail items, how can that be done?","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One option is to cluster first and then apply collaborative filtering. Even so, the classical algorithms leave one core question open: how can user profiles and item profiles be combined and brought into model building? That is the question DeepFM addresses.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Nonlinear characteristics of DeepFM","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f3fdcaf8685c556f8b69c14ce1a022e5.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CF takes only the Item ID and User ID as input. DeepFM can take user or item features in addition to the User ID and Item ID. Features can also be crossed at second order. Second-order feature combinations, however, still depend heavily on the features that were originally collected, and they express feature interactions only up to second order.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c6/c6404900f4e8c94d72660f1c5b82a5f7.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DeepFM's nonlinear features are quite similar to the weight decomposition a CNN performs during recognition and classification. The figure above is a heat map produced by a CNN that classifies images. The classification is based on the weight distribution in front of the class output, and part of the DeepFM model works in a comparable way.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2a/2a3cbf4caabd533443fba10f3564cc49.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The random part of the DeepFM model holds many categorical features. Categorical features are features selected by hand. While grouping and segmenting automatically, DeepFM also embeds the discrete dimensions automatically as part of recommendation. The combination of feature dimensions is adjusted automatically according to the error, much like automatic clustering and segmentation during recommendation, and this solves the problem of the traditional algorithms: with only a User ID or an Item ID as input, user profiles and item profiles cannot be used to segment the information. DeepFM thus becomes a recommendation tool that adjusts itself according to the error and clusters and segments automatically.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Applying DeepFM","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/105ac1f58bb5a13ba09f3fc01a54c475.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Another case is encoding the input for DeepFM: given a single column of categorical values, how should it be converted into a sparse matrix? Manual encoding works but is inefficient. A Feature Hasher can instead convert several categorical columns into a sparse matrix automatically. The sparse matrix, the vector of continuous values, and the vector of hashed features are then merged into a single vector, which is fed to DeepFM as input and makes the computation much more convenient.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ae/aefec1bd3642564ff230489462b8cdf9.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A few lessons learned:","attrs":{}}]},
{"type":"bulletedlist","content":[
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Validation AUC corresponds to click-through rate: training outputs a Validation AUC, and in most cases this AUC tracks the click-through rate. When the AUC during training is low, the click-through rate may still be high, but when the AUC is high, the click-through rate is generally high as well.","attrs":{}}]}],"attrs":{}},
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A model trained on target data from a specific scenario can only be used to estimate click-through rate in that scenario: suppose you collect data at three placements. The target data for each placement is the users' clicks on the items shown there. Training three models, one per placement, should give better results than merging the target data from the three placements into one. This follows from DeepFM's strong feature extraction ability: the target data propagates back into the learned features, so the model depends heavily on the scenario in which the features were collected.","attrs":{}}]}],"attrs":{}},
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DeepFM is fairly easy to tune, because once training converges, the AUC gain depends most closely on the size of the training set: adding more training data improves AUC noticeably.","attrs":{}}]}],"attrs":{}},
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The fine-ranking stage needs to filter out games the user already owns or has already played, and to re-rank results according to business needs, for example to highlight new releases and best sellers.","attrs":{}}]}],"attrs":{}},
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FM requires user_id as training input: DeepFM evolved from FM, and FM is normally trained and served with the User ID as an input. With large data volumes, the User ID in FM becomes a very large set of labels.","attrs":{}}]}],"attrs":{}},
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In a big-data setting, the user_id in FM produces a very large amount of label data. For example, with 10 million users each fed in one by one, the feature vector becomes very wide.","attrs":{}}]}],"attrs":{}},
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The user_id in FM creates a cold-start bottleneck and limits how often the model can be updated: traditional FM and CF require the User ID as input, which causes a cold-start problem. A user who has not taken part in the daily activity cannot be used as input at prediction time. DeepFM is different. If you are confident in your features, that is, if the input carries many features besides the User ID, you can rely on DeepFM's strong feature extraction, meaning its automatic feature combination. You can then leave out the User ID and feed only user features or item features, which avoids the cold-start problem and allows more frequent updates.","attrs":{}}]}],"attrs":{}},
{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Feature input to DeepFM can replace the user_id input: for example, shortly after a campaign starts, once a certain amount of data has been collected and the features are rich enough, predictions can be made for users, User IDs, and Item IDs that the model has never seen.","attrs":{}}]}],"attrs":{}}
],"attrs":{}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4b/4baa92592d220e5cf3de32ae55947894.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"That concludes today's talk. Thank you.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wang Peijun (王培軍), senior engineer at Tencent, is mainly responsible for the advertising system of the WeGame platform and for exploring how deep learning can be combined with that system.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/jtUh-f9zrTtLdW4zc6w8FQ","title":""},"content":[{"type":"text","text":"Applying Angel Recommendation Algorithms to Game Recommendation","attrs":{}}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}
]}
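The input-encoding step described in the DeepFM application section (hash several categorical columns into a sparse block, then merge it with the continuous values into one input vector) can be sketched as follows. This is a minimal illustration in plain Python, not the Feature Hasher implementation used on the platform. The function names, the column names, the bucket count, and the choice of MD5 as the hash are all assumptions made for the example.

```python
import hashlib


def hash_bucket(column: str, value: str, n_buckets: int) -> int:
    """Map a (column, value) pair to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.md5(f"{column}={value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_categoricals(row: dict, columns: list, n_buckets: int) -> dict:
    """Hash several categorical columns into one sparse block {index: count}."""
    sparse = {}
    for col in columns:
        idx = hash_bucket(col, str(row[col]), n_buckets)
        # Two values landing in the same bucket simply add up (a hash collision).
        sparse[idx] = sparse.get(idx, 0.0) + 1.0
    return sparse


def assemble(row: dict, categorical: list, continuous: list, n_buckets: int):
    """Merge continuous values and the hashed block into one sparse vector.

    Layout (an assumption of this sketch): indices [0, len(continuous)) hold
    the continuous values, the next n_buckets indices hold the hashed block.
    Returns the sparse vector and its total dimension.
    """
    vector = {i: float(row[name]) for i, name in enumerate(continuous)}
    offset = len(continuous)
    for idx, val in hash_categoricals(row, categorical, n_buckets).items():
        vector[offset + idx] = val
    return vector, offset + n_buckets


# Hypothetical sample row: two categorical columns, two continuous columns.
row = {"genre": "moba", "platform": "pc", "play_hours": 12.5, "age": 24}
vector, dim = assemble(row, ["genre", "platform"], ["play_hours", "age"], n_buckets=16)
print(dim, sorted(vector.items()))
```

Because the vector dimension is fixed by the bucket count rather than by the number of distinct values, new category values (for example a newly released game genre) need no re-encoding, at the cost of occasional collisions when the bucket count is small.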