矩陣分解推薦算法(十八)

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"寫在前面:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大家好,我是強哥,一個熱愛分享的技術狂。目前已有 12 年大數據與AI相關項目經驗, 10 年推薦系統研究及實踐經驗。平時喜歡讀書、暴走和寫作。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業餘時間專注於輸出大數據、AI等相關文章,目前已經輸出了40萬字的推薦系統系列精品文章,今年 6 月底會出版「構建企業級推薦系統:算法、工程實現與案例分析」一書。如果這些文章能夠幫助你快速入門,實現職場升職加薪,我將不勝歡喜。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"想要獲得更多免費學習資料或內推信息,一定要看到文章最後喔。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"內推信息","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你正在看相關的招聘信息,請加我微信:liuq4360,我這裏有很多內推資源等着你,歡迎投遞簡歷。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"免費學習資料","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你想獲得更多免費的學習資料,請關注同名公衆號【數據與智能】,輸入“資料”即可!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"學習交流羣","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你想找到組織,和大家一起學習成長,交流經驗,也可以加入我們的學習成長羣。羣裏有老司機帶你飛,另有小哥哥、小姐姐等你來勾搭!加小姐姐微信:epsila,她會帶你入羣。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作者在《協同過濾推薦算法》中介紹了user-based和item-based協同過濾算法,這類協同過濾算法是基於鄰域的算法(也稱爲基於內存的協同過濾算法),該算法不需要模型訓練,基於非常樸素的“物以類聚”、“人以羣分”的思想就可以爲用戶生成推薦結果。還有一類基於隱因子(模型)的協同過濾算法也非常重要,這類算法中最重要的代表就是本章我們要講的矩陣分解算法。矩陣分解算法是2006年Netflix推薦大賽獲獎的核心算法,在整個推薦系統發展史上具有舉足輕重的地位,對促進推薦系統的大規模發展及工業應用功不可沒。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本章我們會詳細介紹矩陣分解算法的方方面面。我們會從矩陣分解算法的核心思想、矩陣分解的算法原理、矩陣分解算法的求解方法、矩陣分解算法的拓展與優化、近實時矩陣分解算法、矩陣分解算法的應用場景、矩陣分解算法的優缺點等7個方面來講解矩陣分解算法。希望通過本章的學習,讀者可以很好地瞭解矩陣分解的算法原理與工程實現,並且具備自己動手實踐矩陣分解算法的能力,可以嘗試將矩陣分解算法應用到推薦業務中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.1 矩陣分解推薦算法的核心思想","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在《協同過濾推薦算法》這一章中講過,用戶操作行爲可以轉化爲如下的用戶行爲矩陣。其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/13/1312e517bedf6d1405acbb2feb9c5383.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶i 對標的物 j 的評分,如果是隱式反饋,值爲0或者1(隱式反饋可以通過一定的策略轉化爲得分,具體參考《協同過濾推薦算法》這一章中的介紹),本文我們主要用顯示反饋(用戶的真實評分)來講解矩陣分解算法,對於隱式反饋,我們會在8.4.5中專門講解和說明。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/39/39412800c70010ff6bd83f9f772e08dd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖1:用戶對標的物的操作行爲矩陣","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/1148257bfcad58080e27656dd43acc80.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"矩陣分解算法是將用戶評分矩陣","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分解爲兩個矩陣","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3f/3fbd292b5cac64fc8bd79ce194f5d66a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8b/8b798e3723ba8003d05c551e21f79596.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的乘積。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/5580842bb96c8e60046d4767f58d7f7e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3f/3fbd292b5cac64fc8bd79ce194f5d66a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"代表的用戶特徵矩陣,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8b/8b798e3723ba8003d05c551e21f79596.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"代表標的物特徵矩陣。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"某個用戶對某個標的物的評分,就可以採用矩陣","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3f/3fbd292b5cac64fc8bd79ce194f5d66a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對應的行(該用戶的特徵向量)與矩陣","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8b/8b798e3723ba8003d05c551e21f79596.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對應的列(該標的物的特徵向量)的乘積。有了用戶對標的物的評分就很容易爲用戶做推薦了。具體,可以採用如下方式爲用戶做推薦:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先可以將用戶特徵向量","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a5/a53cd447583a2b4024907fca558ab1d7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"乘以標的物特徵矩陣","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8b/8b798e3723ba8003d05c551e21f79596.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",最終得到用戶對每個標的物的評分","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/dc/dcc52ec55042c11127e72fa8503f0ad3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0e/0ee07a6d19bb3bf1c29ebcc7dfb926a9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖2:爲用戶計算所有標的物評分","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"得到用戶對標的物的評分","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/dc/dcc52ec55042c11127e72fa8503f0ad3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"後,從該評分中過濾掉用戶已經操作過的標的物,針對剩下的標的物得分做降序排列取topN推薦給用戶。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"矩陣分解算法的核心思想是將用戶行爲矩陣分解爲兩個低秩矩陣的乘積,通過分解,我們分別將用戶和標的物嵌入到了同一個k維的向量空間(k一般很小,幾十到上百),用戶向量和標的物向量的內積代表了用戶對標的物的偏好度。所以,矩陣分解算法本質上也是一種","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"嵌入方法","attrs":{}},{"type":"text","text":"(我們會在第11章中介紹嵌入方法在推薦系統中的應用)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面提到的k維向量空間的每一個維度是","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"隱因子","attrs":{}},{"type":"text","text":"(","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"latent factor","attrs":{}},{"type":"text","text":"),之所以叫隱因子,是因爲每個維度不具備與現實場景對應的具體的可解釋的含義,所以矩陣分解算法也是一類隱因子算法。這k個維度代表的是某種行爲特性,但是這個行爲特性又是無法用具體的特徵解釋的,從這點也可以看出,矩陣分解算法的可解釋性不強,我們比較難以解釋矩陣分解算法爲什麼這麼推薦。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"矩陣分解的目的是通過機器學習的手段將用戶行爲矩陣中缺失的數據(用戶沒有評分的元素)填補完整,最終達到可以爲用戶做推薦的目的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"講完了矩陣分解算法的核心思路,那麼我們怎麼利用機器學習算法來對矩陣進行分解呢?這就是下節要講的主要內容。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.2 矩陣分解推薦算法的算法原理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面只是很形式化地描述了矩陣分解算法的核心思想,本節我們來詳細講解怎麼將矩陣分解問題轉化爲一個機器學習問題,從而方便我們訓練機器學習模型、求解該模型,具備最終爲用戶做推薦的能力。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假設所有用戶有評分的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/71/71c8b22e040c900777cd79d4e87b7f3e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對(","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"代表用戶,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/111d380bcea903d6659fe965616f0437.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"代表標的物)組成的集合爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f5/f5fde0280c0e5db9eadae1c57b8a4807.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a0/a0accb4e8eb47dc9b3fb9b69db38ace7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",通過矩陣分解將用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/111d380bcea903d6659fe965616f0437.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"嵌入k維隱式特徵空間的向量分別爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/70/708d6ea4c79db612d65761165916cc88.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4c/4c584b78affde91796a557a760dd2f6d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/111d380bcea903d6659fe965616f0437.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的預測評分爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/22/22bf162dda4db318e968faf18bf1db49.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",真實值與預測值之間的誤差爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ee/ee6a862518c0a9f708f7cf6b6ac17533.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。如果預測得越準,那麼","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/21/21d7abd3eb9216d398a7c436fd127227.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"越小,針對所有用戶評分過的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/71/71c8b22e040c900777cd79d4e87b7f3e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對,如果我們可以保證這些誤差之和儘量小,那麼有理由認爲我們的預測是精準的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了上面的分析,我們就可以將矩陣分解轉化爲一個機器學習問題。具體地說,我們可以將矩陣分解轉化爲如下等價的求最小值的最優化問題。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/50/50d4841f42c82189713d3521f9621507.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"公式1:矩陣分解等價的最優化問題","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f8/f852a6f3bdc350feb7b1f90acdc20830.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是超參數,可以通過交叉驗證等方式來確定,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fa/fa455ea8a1b497ebeffd8a7a7383a47e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是正則項,避免模型過擬合。通過求解該最優化問題,我們就可以獲得用戶和標的物的特徵嵌入(用戶的特徵嵌入","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",就是上一節中用戶特徵矩陣","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3f/3fbd292b5cac64fc8bd79ce194f5d66a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的行向量,同理,標的物的特徵嵌入","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"就是標的物特徵矩陣","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8b/8b798e3723ba8003d05c551e21f79596.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的列向量),有了特徵嵌入,就可以爲用戶做個性化推薦了。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼剩下的問題是怎麼求解上述最優化問題了,這是下一節主要講解的內容。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.3 矩陣分解推薦算法的求解方法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於上一節講到的最優化問題,在工程上一般有兩種求解方法,SGD(","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"S","attrs":{}},{"type":"text","text":"tochastic ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"G","attrs":{}},{"type":"text","text":"radient ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"D","attrs":{}},{"type":"text","text":"escent)和ALS(","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"A","attrs":{}},{"type":"text","text":"lternating ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"L","attrs":{}},{"type":"text","text":"east ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"S","attrs":{}},{"type":"text","text":"quares)。下面我們分別講解這兩種方法的實現原理。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.3.1 利用SGD來求解矩陣分解","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假設用戶","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/111d380bcea903d6659fe965616f0437.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的評分爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a2/a25d338e029e8eef5d6725a8c4dbba31.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"嵌入k維隱因子空間的向量分別爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/38f352da8d474db2ccdf03db6f792fd6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",我們定義真實評分和預測評分的誤差爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/95/95cf07ef33da381ecf38c036e77d4826.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",公式如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0a/0a55153c9e0889135a715e431d5114f7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可將公式1寫爲如下函數","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1a/1af54aa1d57b99079976e935403dc852.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ca/ca481d452af2d8295d2e88ddd30e28cf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/38f352da8d474db2ccdf03db6f792fd6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"求偏導數,具體計算如下:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7a/7ae4162bfaec8fb0cb25c35012c5046c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/76/76de5d9af1233d9e5cf68b54f26b0a89.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了偏導數,我們沿着導數(梯度)相反的方向更新","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/38f352da8d474db2ccdf03db6f792fd6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",最終我們可以採用如下公式來更新","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/38f352da8d474db2ccdf03db6f792fd6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9b/9b3a78b3b7de657deb6a79445ec5ea2d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ee/ee9feb4c02c7e3cdb03d530816ae422b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上式中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d1/d1d5672a1e62c62c007b9fcb4c7a3e1c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲步長超參數,也稱爲學習率(導數前面的係數2可以吸收到參數","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d1/d1d5672a1e62c62c007b9fcb4c7a3e1c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中),取大於零的較小值。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/18/18e9e81eb3976c3e3a9af372bb341d9b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"先可以隨機取值,通過上述公式不斷更新","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/38f352da8d474db2ccdf03db6f792fd6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",直到收斂到最小值(一般是局部最小值),最終求得所有的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/38f352da8d474db2ccdf03db6f792fd6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SGD方法一般可以快速收斂,但是對於海量數據的情況,單機無法承載這麼大的數據量,所以在單機上是無法或者在較短的時間內無法完成上述迭代計算的,這時我們可以採用下面的ALS方法來求解,該方法可以非常容易地進行分佈式拓展。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.3.2 利用ALS來求解矩陣分解","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ALS(","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"A","attrs":{}},{"type":"text","text":"lternating ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"L","attrs":{}},{"type":"text","text":"east ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"S","attrs":{}},{"type":"text","text":"quares)方法是一個高效的求解矩陣分解的算法,目前Spark Mllib中的協同過濾算法就是基於ALS求解的矩陣分解算法,它可以很好地拓展到分佈式計算場景,輕鬆應對大規模訓練數據的情況(參考文獻6中有ALS分佈式實現的詳細說明)。下面對ALS算法原理及特點做一個簡單介紹。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ALS算法的原理基本就是它的名字表達的意思,通過交替優化求得極小值。一般過程是先固定","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",那麼公式1就變成了一個關於","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的二次函數,可以作爲最小二乘問題來解決,求出最優的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/02488dc915824ed459b6222446831c8d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"後,固定","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/02488dc915824ed459b6222446831c8d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",再解關於","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的最小二乘問題,交替進行直到收斂。對工程實現有興趣的讀者可以參考Spark ALS算法的源碼。相比SGD算法,ALS算法有如下兩個優勢。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(1) 可以並行化處理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從上面","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的更新公式中可以看到,當固定","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"後,迭代更新","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時每個","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"只依賴自己,不依賴於其他的標的物的特徵向量,所以可以將不同的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的更新放到不同的服務器上執行。同理,當","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"固定後,迭代更新","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時每個","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"只依賴自己,不依賴於其他用戶的特徵向量,一樣可以將不同用戶的更新公式放到不同的服務器上執行。Spark的ALS算法就是採用這樣的方式做到並行化的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(2) 對於隱式特徵問題比較合適","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶真正的評分是很稀少的,所以利用隱式行爲是更好的選擇(其實也是不得已的選擇)。當利用了隱式行爲,那麼用戶行爲矩陣就不會那麼稀疏了,即有非常多的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/71/71c8b22e040c900777cd79d4e87b7f3e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對是非空的,計算量會更大,這時採用ALS算法是更合適的,因爲固定","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"或者","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",讓整個計算問題更加簡單,容易求目標函數的極值。讀者可以閱讀參考文獻5,進一步瞭解隱式反饋利用ALS算法實現的原因及細節(Spark MLlib中的ALS算法即是參考該論文來實現的)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.4 矩陣分解推薦算法的拓展與優化","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面幾節對矩陣分解的原理及求解方法進行了介紹,我們知道矩陣分解算法是一個非常容易理解並易於分佈式實現的算法。不光如此,矩陣分解算法的框架還是一個非常容易拓展的框架,可以整合非常多的其他信息和特性到該框架之下,從而豐富模型的表達空間,提升預測的準確度。本節我們就來總結和梳理一下矩陣分解算法可以進行哪些拓展與優化。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.4.1 整合偏差(bias)項","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在8.2節,用戶u對標的物v的評分採用公式","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/22/22bf162dda4db318e968faf18bf1db49.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來預測,但是不同的人對標的物的評價可能是不一樣的,有的人傾向於給更高的評分,而有的人傾向於給更低的評分。對於同一個標的物,也會受到外界其他信息的干擾,影響人們對它的評價(比如視頻,可能由於主演的熱點事件導致該視頻突然變火),這兩種情況是由於用戶和標的物引起的偏差。我們可以在這裏引入Bias項,將評分表中觀察到的值分解爲4個部分:全局均值(global average),標的物偏差(item bias),用戶偏差(user bias)和用戶標的物交叉項(user-item interaction)。這時,我們可以用如下公式來預測用戶u對標的物v的評分:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f3df7ca5509e187b43ed1442e91674d9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c8/c83d3e5ce9c166cb7946b7651ed619cf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是全局均值,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d7/d70e58c5f7a9f52ee3dc67b20d86a180.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是標的物偏差,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d2/d2bd218f32640258d8107642145a5b16.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶偏差,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2e/2ec85113132e8135363b89d7d49306d3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶與標的物交叉項。那麼最終的優化問題就轉化爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8f/8fc8110c3ce582c44cbf4a7eb171dced.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該優化問題同樣可以採用SGD或者ALS算法來優化,該方法在開放數據集及工業實踐上都被驗證比不整合Bias的方法有更好的預測效果(見參考文獻8)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.4.2 增加更多的用戶信息輸入","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於用戶一般只對很少的標的物評分,導致評分過少,可能無法給該用戶做出較好的推薦,這時可以通過引入更多的信息來緩解評分過少的問題。具體來說,我們可以整合用戶隱式反饋(收藏、點贊、分享等)和用戶人口統計學信息(年齡、性別、地域、收入等)到矩陣分解模型中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於隱式反饋信息,我們用","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3b/3b9639a4b8a957022fe576cf1c2176a4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來表示用戶有過隱式反饋的標的物集合。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2f/2fd842e15ef23d3aab20a7efc046553e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/27/27704b869954ba14a10ee2629be6a0a6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶對標的物v的隱式反饋的嵌入特徵向量(這裏爲了簡單起見,我們不區分用戶的各種隱式反饋,只要用戶做了一次隱式反饋,認爲有隱式反饋,即是採用布爾代數的方式來處理隱式反饋)。那麼對用戶所有的隱式反饋","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3b/3b9639a4b8a957022fe576cf1c2176a4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",累計的特徵貢獻爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e7/e768781c179336042788b0f81063b0ab.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可以對上式進行如下的歸一化處理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/08/08005bd628387ba5f2721fd0b63b9f38.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於用戶人口統計學信息,假設","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/56/5690eb4c4e91ddb9491a3b7eea907112.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶的所有人口統計學屬性構成的集合,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e7/e738e49ad0fd632a1463f884f9c885dc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0c/0c5245dd3a20e9f196f576cd8a2f4e1a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是屬性","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e3/e3462e37dca54767653c18038dcb59c5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在嵌入特徵向量空間的表示。那麼用戶u所有的人口統計學信息可以綜合表示爲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0b/0b94e65e0d5487af5da8b34308365038.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終整合了用戶隱式反饋和人口統計學信息後(包括偏差項)的用戶預測公式可以表示爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9e/9eb4348f73f44a6361d82ea7ec331b78.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同樣地,我們可以寫出最終的優化目標函數。由於公式太長,這裏不寫出來了。該模型也可以用SGD和ALS算法來求解。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.4.3 整合時間因素","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"到目前爲止,我們的模型都是靜態的。實際上,用戶的偏好、用戶對標的物的評分趨勢、以及標的物的受歡迎程度都是隨着時間變化的(讀者可以閱讀參考文獻11,對怎麼在協同過濾中整合時間因素有更深入的瞭解)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"拿電影來說,用戶可能原來喜歡愛情類的電影,後面可能會轉而喜歡科幻喜劇類電影,所以我們用包含時間的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb06c73496049866143c9634277374f4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來表示用戶的偏好特性向量。用戶開始對某個視頻偏向於打高分,經過一段時間後,用戶看的電影多了起來,用戶的審美越來越挑剔,所以一般不會再對一個電影打很高的分數了,除非他覺得真的特別好,因此,我們可以用包含時間的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e2/e2bab94947c73018cad117c5b7a1f097.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來表示用戶的偏差隨着時間而變化。對於標的物偏差也一樣,一個電影可能開始不是很火,但是如果它的主演後面演了一部非常火的電影,也會將將原來的電影熱度帶到一個新的高度。比如,去年比較火的李現演的《親愛的,熱愛的》,導致李現人氣高漲,他原來演的《南方有喬木》的百度搜索指數在《親愛的,熱愛的》播出期間高漲(見下面圖3)。因此,我們可以用包含時間的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6f/6ffa6d118a8305334f2a0ef887cdbc75.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來表示標的物偏差隨着時間的變化而變化的趨勢。標的物本身的特徵","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",我們可以認爲是穩定的,它代表的是標的物本身的固有屬性或者品質,所以不會隨着時間而變化。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1c/1c4533cb6ddea80806b3b83f82072fef.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"圖3:《南方有喬木》在《親愛的,熱愛的》播出期間(2019.07.09播出)百度搜索指數高漲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於上面的分析,我們最終的預測用戶評分的公式整合時間因素後可以表達爲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9c/9cb0c2f41c5fd0c1157088666a290814.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"整合時間因素的模型效果是非常好的,具體可以閱讀參考文獻8進一步瞭解。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.4.4 整合用戶對評分的置信度","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一般來說,用戶對不同標的物的評分不是完全一樣可信的,可能會受到外界其他因素的影響,比如某個視頻播出後,主播發生了熱點事件,肯定會影響用戶對該視頻的評價,節假日,特殊事件也會影響用戶的評價。對於隱式反饋,一般我們用0和1來表示用戶是否喜歡該標的物,多少有點絕對,更好的方式是引入一個喜歡的概率/置信度,用戶對該標的物操作次數越多、時間越長、付出越大,相應的置信度也越大。因此,我們可以在用戶對標的物的評分中增加一個置信度的因子","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/022037b1f701d22ed1a38a8ae7626922.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",那麼最終的優化公式就變爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c8/c8543a749ed9daa57e724eb83f1668c9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.4.5 隱式反饋","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參考文獻5中有對隱式反饋矩陣分解算法的詳細介紹,這裏對一些核心點做講解,讓讀者對隱式反饋有一個比較明確的認知。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用二元變量","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/10a10151ea9c4855e7148fdb24e8ef96.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示用戶u對標的物的偏好,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bc/bcc9c60235f0915d1502422fda3fec5c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=1表示用戶u對標的物v有興趣,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/10a10151ea9c4855e7148fdb24e8ef96.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=0表示對標的物v無興趣。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶u對標的物的隱式反饋,如觀看視頻的時長,點擊次數等等。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8e/8e6a9abb7e789298c73e5d266a716be3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的關係見下面公式。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/48/486271f24f45c257b3e211b2886ec835.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"越大,有理由認爲用戶對標的物興趣的置信度越高,比如一個文章讀者看了好幾篇,肯定比看一遍更能反映出讀者對這篇文章的喜愛。具體可以用下面的公式來衡量用戶u對標的物v的置信度","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/022037b1f701d22ed1a38a8ae7626922.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/45/4562ac8aae995aa6e1070959720053d6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上式中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/022037b1f701d22ed1a38a8ae7626922.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"代表用戶u對標的物v的偏好置信度,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f4/f431eacbfb294ca5ab9d703449e34c5c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是一個超參數,論文中作者建議取","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fba143833ed59bf9896b5dc1a926b378.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",效果比較好。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於隱式反饋,求解矩陣分解可以採用如下的公式,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/022037b1f701d22ed1a38a8ae7626922.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即是置信度,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/10a10151ea9c4855e7148fdb24e8ef96.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"定義如上面的公式。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c5/c5a8f5b55f95ab8a7b9c5192b9cf83a6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上述隱式反饋算法邏輯將用戶的操作","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分解爲置信度","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/022037b1f701d22ed1a38a8ae7626922.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和偏好","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/10a10151ea9c4855e7148fdb24e8ef96.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"能夠更好地反映隱式行爲的特徵,並且從實踐上可以大幅提升預測的準確度。同時,通過該分解,利用代數上的一些技巧及該模型的巧妙設計,該算法的時間複雜度與用戶操作行爲總次數線性相關,不依賴於用戶數和標的物數,因此非常容易並行化(讀者可以閱讀參考文獻5瞭解更多技術實現細節)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隱式反饋也有一些缺點,不像明確的用戶評分,無法很好地表達負向反饋,用戶購買一個商品可能是作爲禮物送給別人的,他自己可能不喜歡這個商品,用戶觀看了某個視頻,有可能是進入視頻詳情頁時是自動起播的(產品故意這樣設計的,提升用戶體驗,同時也增加廣告曝光的可能),這些行爲是包含很多噪音的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.4.6 整合用戶和標的物metadata信息","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參考文獻9給出了一類整合用戶和標的物metadata信息的矩陣分解算法,該算法可以很好地處理用戶和標的物冷啓動問題,在同等條件下會比單獨的內容推薦或者矩陣分解算法效果要更好,該算法在全球時尚搜索引擎Lyst真實推薦場景下得到驗證。我們在下面簡單介紹一下該算法的思路。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6e/6e03dd0e44787128dcd8397e39e9503f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示所有用戶的集合,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/03/0390893260f86938090dbec37c92463a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示所有標的物的集合。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a5/a54f6e03f048be84429136b0881e223e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示用戶特徵集合(年齡、性別、收入等),","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4a/4a4c5c8906fc7c5fc88661e308fca9a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示標的物特徵集合(產地、價格等)。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/35c24ee8b11e0dc2349f88a901a41a40.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/41/41986ed228e72de4479ee8a1b6f52d21.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分別表示用戶對標的物的正負反饋集合。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/04/04483a6b0068a08e9ee6abbb0e8dbe59.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶u的特徵表示(每個用戶用一系列特徵來表示)。同理,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/95/95d1e9c7cda19e49d11b98cf546e28b7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示標的物i的特徵集合。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於每個特徵","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/06c62746879c594d10d02dcdc5efe8e7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",我們用","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/04/0415501ec524ded8cd776bf1331c9afd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e4/e44b4d09507031144bd12845e26ebfb4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分別表示用戶和標的物嵌入到d維的特徵空間的特徵向量。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/72/72893f162891f0f0bed62b5e17450910.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b80fe47de5aa872b17c3eea35865414f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分別表示用戶和標的物的bias項。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼用戶u的隱因子表示,可以用該用戶的所有特徵的嵌入表示之和,具體來說,可以表示爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/44/4437fb3239fb5e84e0ddb4c54f67f6b0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同理,標的物i的隱因子也可以用該標的物所有特徵的嵌入向量的和來表示,具體如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fd/fda55396551154fd8f499e03e9e28e7d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們分別用","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d5/d55726979cb9b1746c7208c80effe6fd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f3d87468d2e01a9f3bc55ed827252004.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示用戶u和標的物i的bias向量表示。那麼,用戶u對標的物i的預測評分可以用如下公式表示","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d6/d614d25b8c42641734bc707e4977530a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/4207b52875a73d052ffab06d4ee319c6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以採用如下的函數形式,","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fc/fcdb33cd014bfd9df2ec1adea95cb251.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了上面這些基礎介紹,最終可以用如下的似然函數來定義問題的目標函數,通過最大化似然函數,求得","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/04/0415501ec524ded8cd776bf1331c9afd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e4/e44b4d09507031144bd12845e26ebfb4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/72/72893f162891f0f0bed62b5e17450910.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b80fe47de5aa872b17c3eea35865414f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這些嵌入的特徵向量。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2d/2dd300e3bd0356f4d2d9864c128f5468.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面利用特徵的嵌入向量之和來表示用戶或者標的物向量,這就很好地將metadata信息整合到了用戶和標的物向量中了,再利用用戶向量","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和標的物向量","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/37/377968bbe97ef8be93e0cb04e135c881.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的內積加上bias項,通過一個logistic函數來獲得用戶u對標的物i的偏好概率/得分,從這裏的介紹可以看到,該模型很好地將矩陣分解和metadata信息整合到了一個框架之下。感興趣的讀者可以詳細閱讀原文,對該方法做進一步瞭解(該文章給出了具體的代碼實現,是一個非常好的學習資源,代碼見","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/lyst/lightfm","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"https://github.com/lyst/lightfm","attrs":{}}]},{"type":"text","text":")。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.5 近實時矩陣分解算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面三節對矩陣分解的算法原理、求解方法、拓展進行了詳細介紹。前面介紹的方法基本上是適合做批處理的,通過離線訓練模型,再爲用戶推薦。批處理比較適合對時效性要求不太高、消費標的物需要時間比較長的產品,比如電商推薦、長視頻推薦等。而有些產品,比如今日頭條、快手、網易雲音樂,這類產品的標的物要麼用戶消費時間短要麼單位時間內會產生大量標的物(用戶很短的時間就聽完了一首歌,每天有大量用戶上傳短時視頻到快手平臺)。針對這類產品,用離線模型不能很好的捕獲用戶的實時興趣變化,另外,因爲有很多新標的物加入進來,批處理也無法及時將新標的物整合到推薦系統中,解決這兩個問題的方法之一是做近實時的矩陣分解,這樣可以實時反映用戶興趣變化及更快地整合新標的物進推薦系統。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼可以做到對矩陣分解進行實時化改造嗎?答案是肯定的,業內有很多這方面的研究成果及工業應用實踐,有興趣的讀者可以詳細閱讀參考文獻1、2、14、15、16、17,這6篇文章有對近實時矩陣分解算法的介紹及工程實踐經驗的案例分享。在本節我們講解一種實時矩陣分解的技術方案,該方案是騰訊在2016年發的一篇文章上提供的(見參考文獻1),並且在騰訊視頻上進行了實際檢驗,效果相當不錯,該算法的實現方案簡單易懂,非常值得借鑑。下面我們從算法原理和工程實現兩個維度來詳細講解該算法的實現細節。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.5.1 算法原理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該實時矩陣分解算法也是採用8.4.1中整合偏差項的方法來預測用戶u對標的物v的評分,具體預測公式如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f3df7ca5509e187b43ed1442e91674d9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c8/c83d3e5ce9c166cb7946b7651ed619cf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是全局均值,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d7/d70e58c5f7a9f52ee3dc67b20d86a180.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是標的物偏差,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d2/d2bd218f32640258d8107642145a5b16.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶偏差,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2e/2ec85113132e8135363b89d7d49306d3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶標的物交叉項。那麼最終的優化問題就轉化爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8f/8fc8110c3ce582c44cbf4a7eb171dced.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們定義","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a1/a14364ab595dae93fd6edd23916e470f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於上面的最優化問題,我們可以得到如下的SGD迭代更新公式(上式對各個參數求偏導數,並且沿着導數相反方向更新各個參數就得到如下的公式,感興趣的讀者可以自行推導一下):","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8a/8a46ceb1b0c65e4bc19571a1ffd0d72f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b9/b9adb8438a254fd7c02ff69822b665d3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bb/bbc223fb44fbafee7cec8901edc35b46.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2c/2c1967e8fc762d37e0e87c46778f58d1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該論文采用跟8.4.5中隱式反饋中一樣的思路,用","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/ebde5b1fe44b61857aa22abc041e3421.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示用戶u對標的物v的偏好置信度,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/ebde5b1fe44b61857aa22abc041e3421.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的計算公式如下,其中a、b是超參數,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ae/aee1607e8605e8eed1d7910a554ad52c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/78/7869e74881442dc1849333a7aa0d7bc7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分別是用戶播放視頻v的播放時長及視頻v總時長。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/17/1732996b6ba9caf35f92956d0f593937.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該論文中也嘗試過採用","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/88/8828e2cf203d22ed85c741e08a440748.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這類線性公式,但是經過線上驗證,上述對數函數的公式效果更好。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作者公司也是做視頻的,作者曾經用公式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/67/67f4943c93cc6dea7df6bb2a27b316af.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來計算視頻的得分,其中ratio就是視頻播放時長與視頻總時長的比例,等價於上面的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/cf/cf3e20f04e06beebf5809a81a7553f5a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",log是對數函數,上式中乘以10是爲了將視頻評分統一到0到10之間,並且當ratio=0時,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1a/1a60679f33cc32ea0b0d23f5ec9e2b19.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=0,當ratio=1時,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1a/1a60679f33cc32ea0b0d23f5ec9e2b19.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=10。這個公式跟騰訊論文中的公式本質上是一致的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"採用對數函數是有一定的經濟學道理在裏面的,因爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/46/46e85cbf6c8340490a872b9436e45a69.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是自變量x的遞減函數,即導數(斜率)是單調遞減函數,當自變量x越大時,函數值增長越慢,因此基於該公式的數學解釋可以說明對數函數是滿足經濟學上的“邊際效應遞減”這一原則的。針對視頻來說,意思就是你在看前面十分鐘代表的興趣程度是大於在後面看十分鐘的,這就像你在很餓的時候,喫前面一個饅頭的滿足感是遠大於吃了四個饅頭之後再喫一個饅頭的滿足感的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶u對視頻v的偏好","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲二元變量,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=1表示用戶喜歡視頻v(用戶播放、收藏、評論等隱式行爲),","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=0表示不喜歡(視頻曝光給用戶而用戶未產生行爲或者視頻根本沒有曝光給用戶),具體公式如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/72/72bd71cd3cca3961b2eb0426580bcb62.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在近實時訓練矩陣分解模型時,只有當","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=1的(隱式)用戶行爲才更新模型,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=0時直接將該行爲丟棄,而更新時,學習率跟置信度成正比,用戶越喜歡該視頻,該用戶行爲對訓練模型的影響越大,具體採用如下公式來定義學習率,它是","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/ebde5b1fe44b61857aa22abc041e3421.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的線性函數。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/70/7079044709797cd47b8bec95766889bc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"具體的實時訓練採用SGD算法,算法邏輯如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"算法1:利用SGD實時訓練矩陣分解算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Input:用戶操作行爲,(用戶id,視頻id,具體隱式操作行爲)三元組","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)  計算","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/ebde5b1fe44b61857aa22abc041e3421.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)  if","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a3/a3d6b263791dde5f87d45593791dab5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" =1 then","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3)        if u is new, then","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(4)                Initialize  ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e84edc0aa2115b30465d7834b69447a3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(5)        end","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(6)         if v is new, then","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(7)               Initialize  ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b874bdcff1e57191e9eed95f7ae1b917.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(8)         end","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(9)         Compute","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/92/92304521bdafb3825bac8c454fc26431.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" by","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/70/7079044709797cd47b8bec95766889bc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(10)        Compute","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/95/95cf07ef33da381ecf38c036e77d4826.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  by  ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a1/a14364ab595dae93fd6edd23916e470f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(11)        ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a8/a8d519e475b9cd460199fb054b4cc7b6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(12)        ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e7/e7afab1f9c7567732ed79ec3833c82d4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(13)        ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/33/3360a8bc2f8ab7861b8a95553775e3f0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(14)        ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ec/ecb7ef0f98cacab96e247aada378f682.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(15)  end","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.5.2 工程實現","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面講完了實時訓練矩陣分解算法的原理,下面我們來講講怎麼爲用戶做推薦,以及在工程上怎麼實現實時爲用戶做推薦。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,我們簡單描述一下怎麼爲用戶進行推薦。具體爲用戶生成推薦的流程如下圖,一共包含5個步驟。下面分別對每個步驟的作用加以說明。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/10e96cecd4ec9fd4e28031f37d597897.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"圖4:爲用戶實時生成推薦的流程","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"步驟1:獲取種子視頻集","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"種子是用戶最近偏好(有過隱式反饋的)的視頻,或者用戶歷史上有偏好的視頻,分別代表了用戶的短期和長期興趣,這些行爲直接從用戶的隱式反饋行爲中獲取,只要前端對用戶的操作行爲進行了埋點,通過實時日誌收集就可以獲取。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"步驟2:獲取候選推薦視頻集","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一般來說,全量視頻是非常巨大的,實時爲用戶推薦不可能對全部視頻分別計算用戶對每個視頻的偏好預測,因此會選擇一個較小的子集作爲候選集(這即是召回的過程),我們需要確保該子集很大概率上是用戶可能喜歡的,我們先計算出用戶對該子集中每個視頻的偏好評分,最終將預測評分最高的topN推薦給用戶。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該論文通過選取步驟1中種子集的視頻的相似視頻作爲候選集,因爲種子視頻是用戶喜歡的,它的相似視頻用戶喜歡的概率也相對較大,所以這種方式的召回是有理論依據的。視頻相似度會在後面介紹。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"步驟3:獲取特徵向量","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從分佈式存儲中獲取用戶和候選集視頻的特徵向量,該向量會用於計算用戶對候選集中視頻的偏好。特徵向量的計算會在後面介紹。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"步驟4:預測評分偏好","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了步驟3中的用戶和視頻特徵向量,就可以用公式","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f3df7ca5509e187b43ed1442e91674d9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來計算該用戶對每個候選集中視頻的偏好度。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"步驟5:候選集排序","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"步驟4計算出了用戶對候選集中每個視頻的偏好度,那麼按照偏好度降序排列,就可以將topN的視頻推薦給用戶了。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當用戶在前端進行隱式反饋操作時,用戶的行爲通過實時日誌流到實時推薦系統,該系統根據上面5個步驟爲每個用戶生成推薦結果,用戶最近及歷史行爲、視頻相似度、用戶特徵向量、視頻特徵向量都存儲在高效的分佈式存儲中(如Redis、HBase、CouchBase等分佈式NoSQL),這5個步驟都是非常簡單的計算,因此可以在幾十毫秒之內爲用戶生成個性化推薦。最終的推薦結果可以插入(更新)到分佈式存儲引擎中,當用戶在前端請求推薦結果時,推薦web服務器從分佈式存儲引擎中將該用戶的推薦結果取出,並組裝成合適的數據格式,返回到前端展示給用戶。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶最近及歷史行爲、視頻相似度、用戶特徵向量、視頻特徵向量這些數據是通過另外一個後臺實時程序來計算及訓練的,跟推薦過程解耦,互不影響。在騰訊的文章中,是利用Storm來實現的,除了用Storm外,用Spark Streaming或者Flink等流式計算引擎都是可以的,只是具體的實現細節不一樣。爲了不拘泥於一種計算平臺,下面結合作者個人的經驗及理解,來講解怎麼生成這些在推薦過程中依賴的數據。這裏我們抽象出實現的一般邏輯,具體生成數據的流程見下圖,通過4個算子來完成數據生成過程,下面我們分別對這4個算子的功能加以說明。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b3/b37f19f68641f5d5671c3283d57a684d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖5:計算個性化推薦依賴的用戶播放歷史、視頻相似度、用戶&視頻特徵向量","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"算子1:提取用戶隱式反饋行爲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於用戶在前端對視頻的隱式反饋,通過收集用戶行爲日誌,做ETL,將(用戶id,視頻id,隱式操作,操作時間) 四元組插入消息隊列(如kafka),供後面的算子使用。作者在第17章會專門介紹怎麼收集用戶行爲數據,想提前瞭解的讀者可以查閱。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"算子2:生成用戶反饋歷史並存於分佈式key-value存儲中","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於消息隊列中的用戶行爲四元組,將用戶隱式反饋行爲存於分佈式存儲中。注意,這裏可以保留用戶最近的操作行爲及過去的操作行爲,並且也可以給不同時間點的行爲不同的權重,以體現用戶的興趣是隨着時間衰減的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"算子3:計算視頻之間的相似度","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏先說一下計算兩個視頻相似度的計算公式,騰訊這篇文章是通過如下3類因子的融合來計算兩個視頻最終的相似性的:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1) 基於視頻特徵向量的相似性:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fc/fc2386f908afddd916aa89432bed41fc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) 基於視頻類型的相似性,其中type(i)是視頻i的標籤類型(如搞笑、時政等)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/16/163a0d13887d385d6602cd56866c6d48.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果兩個視頻類型一樣,類型相似性爲1,否則爲0。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3) 時間衰減因子","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"   ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/01/01b12b3f49a077de09d613576b9db6ea.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/59/59712775bfe552740c28823a5f362bbd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是sim(i,j)最近一次更新時間與當前時間之差,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ac/acf1d6d2b25b1ad6ecefc5f45a611377.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是控制衰減速率的超參數。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了上面3類因子,通過如下融合公式來最終得到i和j的相似度,具體如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ff/ff6a54c76065149cfe471b9cc20e596a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了上面計算兩個視頻相似度的公式,當新事件發生時(即有新用戶行爲產生),基於該事件中的視頻 i,在分佈式存儲中根據上面的公式結合用戶行爲歷史更新該視頻與其他視頻構成的視頻對的相似度(i:),同時更新視頻i最相似的topN的視頻,具體計算實時topN相似度的方法可以參考第6章《協同過濾推薦算法》中的一種實時計算topN相似度的方案,在騰訊這篇論文中沒有細講具體實現細節。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"算子4:生成用戶和視頻的特性向量","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這一步是整個算法的核心,這一步利用算法1中的實時更新邏輯來訓練用戶和視頻的特性向量。具體訓練過程是,當用戶有新的操作行爲從管道中流過來時,從分佈式存儲中將該用戶和對應的視頻的特徵向量取出來,採用算法1中的公式更新特徵向量,更新完成後再插入分佈式存儲中。如果該用戶是新用戶或者操作的視頻是新入庫的視頻,那麼就可以隨機初始化用戶或視頻特徵向量,並插入分佈式存儲中,待後續該用戶或者包含該視頻新的操作流進來時繼續更新。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過上面的介紹,我們大致知道怎麼利用流式計算引擎來做實時的矩陣分解並給用戶做個性化推薦了。到此,矩陣分解的離線及實時算法實現方案都講完了,下面我們來梳理一下矩陣分解算法可行的應用場景。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.6 矩陣分解算法的應用場景","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面幾節對矩陣分解的算法原理、工程實踐及拓展做了詳細介紹,我們知道了矩陣分解算法的特性,那麼矩陣分解算法可以用於哪些應用場景呢?在本節我們會詳細介紹幾類可行的應用場景。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.6.1 應用於完全個性化推薦場景(完全個性化推薦範式)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"完全個性化推薦是爲每個用戶生成不一樣的推薦結果,我們通常指的推薦一般是指完全個性化推薦,上面8.1、8.2節介紹的爲每個用戶生成推薦即是完全個性化推薦。下圖就是電視貓首頁的興趣推薦,爲每個用戶推薦感興趣的長視頻,是完全個性化的,其中也採用了矩陣分解算法。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b4/b4b2225296cbcdebb547742db47a409b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"圖6:電視貓首頁興趣推薦:完全個性化推薦,每個用戶推薦不一樣的視頻集","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.6.2 應用於標的物關聯標的物場景(標的物關聯標的物範式)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"標的物關聯標的物推薦就是爲每個標的物關聯一組相似/相關的標的物作爲推薦。前面我們沒有直接講到怎麼做標的物關聯推薦。但是,如果我們得到了每個標的物的特徵向量(通過矩陣分解,獲得的標的物嵌入k維特徵空間的向量),那麼我們可以通過向量的cosine餘弦相似度來獲得與某個標的物最相似的K個標的物作爲關聯推薦結果。有了標的物特徵向量後具體怎麼工程實現,計算topK相似度,讀者可以參考第5章《協同過濾推薦算法》6.3.1節中計算topK相似度的方法來實現,原理是完全一樣的,這裏不再贅述。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下圖是電視貓視頻的相似推薦,它就是一種關聯推薦,關聯推薦大量用於電商、視頻、新聞等產品中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/68/6832852f6326b0a1e78d48d43fae6e07.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"圖7:電視貓詳情頁相似影片:視頻關聯視頻的推薦,可以基於標的物特徵向量來計算關聯推薦","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.6.3 用於用戶及標的物聚類","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過矩陣分解我們獲得了用戶及標的物的k維特徵向量,有了特徵向量,我們就可以對用戶和標的物聚類了。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對用戶聚類後,我們可以將同一類的其他用戶操作過的標的物推薦給該用戶,這也是一種可行的推薦策略。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對標的物聚類後,我們可以將同一類的標的物作爲該標的物的關聯推薦。另外,聚類好的標的物可以作爲專題/專輯等供編輯、運營人員用於營銷推廣。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.6.4 應用於羣組個性化場景(羣組個性化推薦範式)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在8.6.3中我們講了用戶聚類,當用戶聚類後,我們可以對同一類用戶提供相同的推薦服務,這時就是羣組個性化推薦。羣組個性化推薦相當於將有相同興趣偏好的個體看成一個等價類,統一爲他們提供推薦,它是介於完全個性化推薦(每個人推薦的都不一樣)和完全非個性化推薦(所有人推薦的都一樣)之間的一種推薦形態。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們對電視貓的站點樹內容做的個性化重排序就是採用的基於矩陣分解獲取用戶特徵向量再對用戶聚類的技術。首先對用戶聚類,同一類的用戶通過特徵向量平均獲得該類的中心特徵向量,該中心特徵向量代表了該羣組的特徵,我們再用該向量跟標的物特徵向量求cosine餘弦,最終獲得該中心向量跟所有標的物特徵向量的相似度。在站點樹重排中,對站點樹的所有節目,可以獲得中心向量與站點樹節目特徵向量的相似度,按照該相似度降序排列就獲得了該羣組的重排序結果。下圖就是電視貓電視劇頻道“","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"戰爭風雲","attrs":{}},{"type":"text","text":"”站點樹重排序的產品形態(每個用戶看到的戰爭風雲總節目量是一樣的,只是排序不一樣,會按照用戶的興趣[其實是用戶所在羣組的平均興趣]將喜歡的排在前面)。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4d/4d358b6217a78447e0db59d232555dba.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"圖8:電視貓站點樹個性化排序:基於羣組個性化爲用戶做推薦","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了上面的應用場景外,由矩陣分解獲得的用戶和標的物特徵向量,可以作爲其他模型(如深度學習模型)的特徵輸入,進一步訓練更復雜的模型。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.7 矩陣分解算法的優缺點","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過前面的介紹,我們知道矩陣分解算法原理相對簡單,也易於分佈式實現,並且可以用於很多真實業務場景和產品形態,那麼在本節我們來總結一下矩陣分解算法的優缺點,方便讀者在實際應用矩陣分解算法時更好地理解和運用。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.7.1 優點","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"矩陣分解算法作爲一類特殊的協同過濾算法,具備協同過濾算法的所有優點,具體表現在:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(1) 不依賴用戶和標的物的其他信息,只需要用戶行爲就可以爲用戶做推薦","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"矩陣分解算法也是一類協同過濾算法,它只需要用戶行爲就可以爲用戶生成推薦結果,而不需要用戶或者標的物的其他信息,而這類其他信息往往是半結構化或者非結構化的信息,不易處理,有時也較難獲得。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"矩陣分解是領域無關的一類算法,因此,該優點可以讓矩陣分解算法基本可以應用於所有推薦場景中,這也是矩陣分解算法在工業界大受歡迎的重要原因。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(2) 推薦精準度不錯","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"矩陣分解算法是Netflix推薦大賽中獲獎算法中非常重要的一類算法,準確度是得到業界一致認可和驗證的,作者所在公司的推薦業務中也大量利用矩陣分解算法,效果也是非常不錯的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(3) 可以爲用戶推薦驚喜的標的物","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"協同過濾算法利用羣體的智慧來爲用戶推薦,具備爲用戶推薦差異化、有驚喜度的標的物的能力,矩陣分解算法作爲協同過濾算法中一類基於隱因子的算法,當然也具備這個優點,甚至比user-based和item-based協同過濾算法有更好的效果。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(4) 易於並行化處理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過第三節矩陣分解的ALS求解過程,我們可以知道矩陣分解是非常容易並行化的,Spark MLlib庫中就是採用ALS算法進行分佈式矩陣分解的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8.7.2 缺點","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面講了這麼多矩陣分解算法的優點,除了這些優點外,矩陣分解在下面這兩點上是有缺陷的,需要採用其他的算法和策略來彌補或者避免。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(1) 存在冷啓動問題","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當某個用戶的操作行爲很少時,我們基本無法利用矩陣分解獲得該用戶比較精確的特徵向量表示,因此無法爲該用戶生成推薦結果。這時可以藉助內容推薦算法來爲該用戶生成推薦。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於新入庫的標的物也一樣,可以採用人工編排的方式將標的物做適當的曝光獲得更多用戶對標的物的操作行爲,從而方便算法將該標的物推薦出去。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參考文獻9中提供了一種解決矩陣分解冷啓動問題的有效算法lightFM,通過將metadata信息整合到矩陣分解中,可以有效解決冷啓動問題,對於操作行爲不多的用戶以及新上線不久還未收集到足夠多用戶行爲的標的物都有比較好的推薦效果。該論文的lightFM算法在github上有相應的python代碼實現(參見","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/lyst/lightfm","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"https://github.com/lyst/lightfm","attrs":{}}]},{"type":"text","text":"),可以作爲很好的學習材料。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(2) 可解釋性不強","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"矩陣分解算法通過矩陣分解獲得用戶和標的物的(嵌入)特徵表示,這些特徵是隱式的,無法用現實中的顯示特徵進行解釋,因此利用矩陣分解算法做出的推薦,我們無法對推薦結果進行解釋,只能通過離線或者在線評估來評價算法的效果。不像user-based和item-based協同過濾算法基於非常樸素的”物以類聚、人以羣分“的思想,可以非常容易地進行解釋。但也不是絕對的,其中參考文獻5中的實現方法就提供了一個爲矩陣分解做推薦解釋的非常有建設性的思路。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"總結","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文對矩陣分解算法原理、工程實踐、應用場景、優缺點等進行了比較全面的總結。矩陣分解算法是真正意義上的基於模型的協同過濾算法。通過將用戶和標的物嵌入到低維隱式特徵空間,獲得用戶和標的物的特徵向量表示,再通過向量的內積來量化用戶對標的物的興趣偏好,思路非常簡單、清晰,也易於工程實現,效果也相當不錯,所以在工業界有非常廣泛的應用。矩陣分解算法算是開啓了嵌入類方法的先河,在NLP領域非常出名的Word2Vec也是嵌入方法的代表,深度學習興起後,各類嵌入方法在大量的業務場景中得到了大規模的採用。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"參考文獻","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Real-time Video Recommendation Exploration","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Real-Time Top-N Recommendation in Social Streams","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Online-updating  regularized kernel matrix factorization models for large-scale recommender systems","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Factorization Meets the Item Embedding- Regularizing Matrix Factorization with Item Co-occurrence","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. Collaborative Filtering for Implicit Feedback Datasets","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6. Large-Scale Parallel Collaborative Filtering for the Netflix Prize","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"7. Content-boosted Matrix Factorization Techniques for Recommender Systems","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"8. Matrix Factorization Techniques for Recommender Systems","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"9. Metadata Embeddings for User and Item Cold-start Recommendations","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"10. Neural Word Embedding as Implicit Matrix Factorization","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"11. Collaborative Filtering with Temporal Dynamics","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"12. Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"13. Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"14. Online collaborative flitering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"15. Online-updating regularized kernel matrix factorization models for large-scale recommender systems","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"16. Fast incremental matrix factorization for recommendation with positive-only feedback","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"17. Online personalized recommendation based on streaming implicit user feedback","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"18. Multiverse recommendation: N-dimensional tensor factorization for context-aware collaborative filtering    ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"19. Matrix co-factorization for recommendation with rich side information and implicit feedback ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章