Overview of Recommendation Algorithms (Part 15)

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Foreword:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hello everyone, I am 強哥 (Qiang Ge), a technologist who loves to share. I have 12 years of experience on big data and AI projects and 10 years of research and practice in recommender systems. I enjoy reading, long-distance walking, and writing.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In my spare time I write about big data and AI. So far I have published about 400,000 characters of articles in this recommender-system series, and the book 「構建企業級推薦系統:算法、工程實現與案例分析」 (Building Enterprise-Grade Recommender Systems: Algorithms, Engineering Implementation, and Case Studies) is due out at the end of June this year. I would be delighted if these articles help you get started quickly.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The recommendation algorithm is the core module of a recommender system, and researching and tuning it is the main day-to-day work of a recommendation algorithm engineer. How well the algorithm performs directly determines how much value the system delivers. In this chapter the author draws on his own practice to abstract and categorize recommendation algorithms into a set of general paradigms, so that readers can see at a high level how the algorithms are applied. The chapter only outlines the idea behind each algorithm and does not explain its inner workings; later chapters in the series analyze the most commonly used algorithms in depth.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This chapter has five parts: recommender system paradigms, the three-stage pipeline architecture of recommendation algorithms, an overview of recall algorithms, an overview of ranking algorithms, and several issues to watch when putting recommendation algorithms into production. The fully personalized paradigm and the item-to-item paradigm are the most widely used, with many real applications in Internet products, and they are the focus of this chapter.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After this chapter, readers should know which algorithms are commonly used under each paradigm, the idea behind each, and its typical application scenarios. The chapter lays the groundwork for the detailed treatment of individual algorithms later, and the points covered here can serve as a reference guide for deploying recommendation algorithms in real scenarios.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.1 Recommender System Paradigms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The goal of a recommender system is to recommend items that a user is likely to enjoy. This process involves two key elements, users and items, and different combinations of the two yield different forms of recommendation, which we call ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"paradigms","attrs":{}},{"type":"text","text":" (readers with a mathematics background will find the notion familiar; otherwise, think of a paradigm as a set of objects that share some property). From the author's experience building recommender systems, they can be summarized into the following five paradigms, which cover the recommendation scenarios found in a product. A video app is used below to illustrate where each paradigm applies.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Paradigm 1: Fully personalized","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The fully personalized paradigm produces recommendations for each individual user, so every user sees a different result. It is the finest-grained paradigm, resolved down to the single user. The familiar “Guess You Like” module belongs here; it can serve as a general feed on the home page or as a feed within each channel page (such as Movies). The figure below shows the interest-based recommendations on the home page of Dianshimao (電視貓), which differ for every user.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ca/cab218bdae983b1cc0f30b41ff364bd7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 1: Interest-based recommendations on the Dianshimao home page","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Paradigm 2: Group personalized","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The group personalized paradigm first divides users into groups with similar interests and then builds one recommendation list per group. Users in the same group see the same list; users in different groups see different lists.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is an example of Paradigm 2 from the author's company. In the third-level lists of a channel page, the list is re-ranked according to user interest so that programs that better match the user move to the front, which improves conversion. To save storage, users are first clustered so that the users in a cluster have similar interests. Every user in a cluster sees the same ordering, while different clusters see different orderings. In the 戰爭風雲 tab shown below, the set of programs displayed on the right is the same for everyone and only the order differs between groups: programs are sorted in descending order of how well they match the group's interests.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5b/5b3826ce27caf196fefdbb92d8438978.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 2: Personalized re-ranking of a channel-page list in Dianshimao","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Paradigm 3: Non-personalized","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The non-personalized paradigm gives every user exactly the same recommendations, with no personalization at all. Leaderboards of all kinds are the typical case. A leaderboard can be a standalone module on the home page that helps users discover new and popular content, or the default content of the “Guess You Like” module for a new user with no history (cold start). The figure below shows the popular content offered by the search module before the user types a query, which is another example of this paradigm. Manually curated editorial recommendations are also non-personalized; the only difference is that they are produced by hand, whereas leaderboards are generated automatically by an algorithm.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f1/f1b613225c0149814a40d9b5aeaa7410.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 3: Leaderboard recommendations on the Dianshimao search page when the user has entered nothing","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Paradigm 4: Item-to-item","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The item-to-item paradigm associates each item with a set of related or similar items, which are shown when a user visits the item's detail page. Every user sees the same associated items for a given item.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a user browses a movie, similar movies can be shown to widen the choice (the figure below shows the similar films on a Dianshimao movie detail page). Likewise, when a user exits a program, other programs the user may like can be recommended. For short videos, similar programs can be assembled into an autoplay list so that playback continues from the current program into similar ones, which improves both content distribution and user experience.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/72/72af2c66eb0c59a6a18598f128c92b38.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 4: Similar-movie recommendations in Dianshimao, an example of the item-to-item paradigm","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Paradigm 5: Cartesian product","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Under the Cartesian product paradigm, every combination of user and item yields a different recommendation. Taking Figure 4 as an example, different users would see different results on the detail page of the same video. This paradigm resembles Paradigm 4, except that the items associated with a program also depend on the user's interests, so the associations match each user better.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The numbers of users and items are usually both very large. Because every user-item combination has its own result, there are rarely enough resources to precompute and store all of them. The results are generally computed in real time when the user triggers the recommendation, so the computation must be simple enough to finish in well under a second. The simplest example is to take the item-to-item list and filter out the programs the user has already watched, based on the user's play history.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The diagram below illustrates the five paradigms.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ab/ab142f4c7eda75c7232707d56d0a1f9a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 5: The five paradigms of recommendation algorithms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In short, a recommender system does not exist in isolation. It must be integrated into the business and reach users at suitable points in the product's interaction flow, where user actions trigger recommendations. It therefore needs to be embedded in the flows (pages) that users pass through. When a user visits the home page, a general feed (Paradigm 1) provides personalized recommendations; on a detail page, similar films (Paradigm 4) provide related items; on the search page, before any query is entered, popular recommendations (Paradigm 3) surface new and trending programs. With recommendations on every page the user browses, the product feels more intelligent. Essentially all of these product forms are covered by the five paradigms above.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.2 
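Three-Stage Pipeline Architecture of Recommendation Algorithms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The recommendation workflow of an industrial recommender system is usually divided into three stages: recall, ranking, and business adjustment. Recall uses algorithms to pull the items a user may be interested in out of the full item catalog; several algorithms are normally used together, such as popularity recall, collaborative filtering recall, and tag recall. Ranking orders the recalled items by the estimated probability that the user will click them (so-called CTR prediction). In practice, a layer of adjustment logic follows ranking: it supplements and fine-tunes the ranked list according to business rules and operations strategy to meet specific operational needs. Figure 6 below shows the workflow of the recommender system in Dianshimao (電視貓), a video app for OTT devices (smart TVs and set-top boxes). It contains the three algorithm and strategy modules of recall, ranking, and business adjustment, and can serve as a reference for designing the algorithm modules of a recommender system. This chapter covers only the algorithms used in recall and ranking. Business adjustment depends heavily on the specific business and the company's operations strategy, so it is not discussed here and is covered in detail in another chapter.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/48/48124379b3ee56fcf1242188d9fe2fac.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 6: Workflow of the Dianshimao recommender system","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Having introduced the common paradigms and the pipeline architecture used in industry, the next section surveys the recall algorithms available under each paradigm, so that readers gain a first understanding of these algorithms and of the scenarios they suit.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3 Overview of Recall Algorithms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This section goes through the five paradigms and describes the algorithmic strategies commonly used for recall under each, to give readers an overall picture.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.1 Non-personalized paradigm","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The non-personalized paradigm recommends the same item list to all users. Leaderboards such as newest and most popular are typical. A leaderboard sorts items in descending order by some rule and recommends the top N. For example, a newest list sorts items by release time, most recent first, and takes the top N; a most-popular list sorts by play count (or click count) in descending order.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In practice, leaderboards may need to account for multiple item categories, and even for dimensions such as region, time, and price. Implementation can therefore become fairly involved and has to be designed around the specific product and business scenario.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Non-personalized recommendations can be generated from simple counting statistics and rarely require sophisticated machine learning. That said, the scoring formula behind a top-N leaderboard may combine several kinds of user behavior data and become fairly complex (the Douban rating formula is one such case).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Leaderboard-style algorithms are simple to implement and highly explainable. Although every user gets the same content, people tend to follow the crowd (arguably a product of evolution), so an item that most people like has a good chance of appealing to any given user, and these recommendations often perform well. They can also serve as the cold-start or default recommendation algorithm.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.2 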
Fully personalized paradigm","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The fully personalized paradigm is the most commonly used, and many methods are available. The common algorithms and recent developments are outlined below.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.2.1 Content-based personalized recommendation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These algorithms rely only on a user's own history and need no knowledge of other users' behavior. The core idea is that items have descriptive attributes, and a user's actions on items imprint those attributes on the user as interest tags, from which a recommendation list can be generated. Taking video as an example, if a user has watched science-fiction and horror films, then science fiction and horror become the user's preference tags, and other films in those genres can be recommended. There are two concrete approaches.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Recommendation based on user feature representation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Items carry many textual features, such as tags, descriptions, and other metadata. This text can be converted into feature vectors with algorithms such as TF-IDF or LDA. If items are described by tags, a feature vector with one dimension per tag can be built.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With item vectors in hand, a user's feature vector can be taken as the (time-weighted) average of the vectors of all items the user has interacted with. The inner product of the user vector and an item vector then gives the user-item similarity, from which the user's recommendation list is computed.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"
Recommendation based on inverted-index lookup","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If item attributes are represented by tags, a user's interest profile can be built from the user's history. The profile records the user's preference for each tag together with a preference weight.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once profiles exist, an inverted index from tags to items can be built (readers familiar with search will recognize it). With the inverted index and the interest profile, personalized recommendations can be produced. This is, in effect, tag-based recall.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The process works as follows (see Figure 7 below): fetch the user's interest tags from the profile, then look up the programs for each tag in the inverted index, which links the user to programs. Both the user's interest tags and the items associated with each tag carry weights.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/82/82d6b06ec3d22f00d9a203f8f8b2aece.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 7: Movie recommendation based on an inverted index","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Content-based algorithms are natural, intuitive, and explainable. They also handle user cold start reasonably well: a single user action is enough to recommend from. However, they tend to lack novelty. Recommendations stay within a narrow range, and unless users broaden their interests on their own, the method struggles to surface novel content. Chapter 5 covers content-based recommendation in depth.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.2.2 Collaborative filtering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The core idea of collaborative filtering is the simple notion that like items cluster together and like people group together (物以類聚、人以羣分). For items, we compute the list of most similar items for each item and recommend items similar to those a user already likes; this is item-based collaborative filtering. For people, we recommend to a user the items that similar users have liked but that the user has not yet interacted with; this is user-based collaborative filtering. Figure 8 below illustrates the idea.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4b/4b615a298edf643be7be2410943b9581.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 8: Simple collaborative filtering based on “like items cluster, like people group”","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The heart of collaborative filtering is how to compute similarity between items and between users, and a very simple approach works. Arrange users' ratings of items (or implicit feedback such as clicks) in a matrix (see Figure 9 below). Each entry is one user's rating of one item (1 for implicit feedback), or 0 if the user has no interaction with the item. A row vector holds one user's ratings of all items; a column vector holds all users' ratings of one item. Similarity between rows is then similarity between users, and similarity between columns is similarity between items. Cosine similarity can be used as the measure.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bb/bb2bd04c6f8894ad801ed96ad7a908ed.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 9: User-item interaction matrix","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Internet products generally use item-based collaborative filtering, because users change more than items do: the user base grows quickly while the catalog grows relatively slowly, so item-based results are more stable. This is not absolute; in news and short-video apps, for example, the number of items also grows quickly.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Collaborative filtering is intuitive, relatively cheap to compute, easy to distribute, and independent of any side information about users or items. It performs well in practice and can surface novel content, which is why it is so widely used in industry. Chapter 6 covers collaborative filtering in depth.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.2.3 
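Model-based recommendation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are many kinds of model-based algorithms; the most common are matrix factorization and factorization machines. Deep learning is now widely used in recommender systems, and reinforcement learning and transfer learning are also being adopted.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A model-based algorithm builds a machine learning model from users' behavior history, item metadata, user profile data, and so on. The model is trained on this data to fit its parameters and is then used to predict a user's preference for items the user has not yet seen. Figure 10 below shows the training and prediction flow.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/90/90345e2111f6acf15b5e9192af642f45.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 10: A model-based recommender system","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Model-based algorithms predict in one of three ways. The first predicts a rating for each item, with the rating expressing the degree of preference. The second is probabilistic: it predicts the probability that the user likes an item and uses that probability as the preference. The third treats recommendation as classification: each item is a class, and the model predicts the class of the next item (or next few items) the user will interact with. Matrix factorization predicts ratings, logistic regression predicts probabilities, and the deep learning recommender published by YouTube follows the classification approach (see reference 10).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Matrix factorization decomposes the user rating matrix M into the product of two matrices U and V, where U is the user feature matrix and V is the item feature matrix. A user's predicted rating of an item is the inner product of the corresponding row of U (the user's feature vector) and the corresponding column of V (the item's feature vector). Factorization machines generalize matrix factorization and are not described here. Chapters 8 and 9 cover matrix factorization and factorization machines in detail.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Following the success of deep learning in image and speech recognition in recent years, many researchers and practitioners have applied it to recommender systems with very good results. Companies such as YouTube, Netflix, Alibaba, JD.com, NetEase, and Ctrip have deployed deep learning in production recommendation services and reported strong conversion gains (see the corresponding papers in the references). Chapter 12 covers deep learning recommender systems in detail.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Newer methods such as reinforcement learning and transfer learning are also beginning to appear in recommendation; interested readers can consult the corresponding references at the end of the article.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.3 Group personalized paradigm","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The group personalized paradigm requires grouping users first, so the grouping principle and method matter a great deal. There are generally two kinds of grouping schemes.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.3.1 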
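Recommendation by profile-based user segmentation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First build user profiles from demographic or behavioral data. Profiles are generally used for targeted operations: a set of users is selected by explicit features and then addressed with targeted campaigns. This was introduced earlier and is not repeated here.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.3.2 Recommendation by clustering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Clustering is a very intuitive approach: users with similar behavioral preferences are placed in one cluster on the assumption that they share interests. Two clustering strategies are common.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Embedding users in a high-dimensional vector space and clustering on the vectors","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are many ways to embed user features in a vector space. The following are all mainstream.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Following the content-based approach, build a feature vector for each user (from TF-IDF, LDA, tags, and so on, as described earlier) and cluster users on these vectors. The weighted average of the user vectors in a cluster is the group's feature vector, and its inner product with an item vector gives the group-item similarity used to recommend to the group.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Following the user-based collaborative filtering approach, build the user-item interaction matrix, whose entries are users' ratings of items. Each row is a vector characterizing one user, and users can be clustered on these vectors. To obtain the group's feature vector, average the vectors of all users in the group; optionally keep only the k largest components of the mean and set the rest to 0. A recommendation list for the group can then be computed as in user-based collaborative filtering.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Matrix factorization yields a feature vector for every user, and the mean of the user vectors in a group can serve as the group's feature vector. The inner product of the group vector and an item vector gives the group's preference for that item; once all preferences are computed, sorting them in descending order gives the top-N list for the group. The Dianshimao re-ranking algorithm mentioned earlier is implemented this way.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Word embedding is another option. Treat each user's full sequence of actions on items (purchases, views, and so on) as a document and each item's sid (unique identifier) as a word. A word2vec-style method then yields a vector for each item (see reference 9). A user's vector is the mean of the vectors of the items the user has interacted with, optionally time-weighted so that the oldest interactions get the lowest weight, and a group's vector is the mean of its users' vectors. As with matrix factorization above, the inner product of the group vector and the item vectors is used to recommend to the group.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides the methods above, there is a more direct approach based on counting. After clustering, count how many times each item has been acted on by the users in a group, summing the counts per item, and sort the items by count in descending order. The top N are recommended to the group; they are the items that most of the group likes.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"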
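Graph-based clustering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Build a user relationship graph whose vertices are users and whose edges are relationships between users. Graph partitioning can split the graph into several connected subgraphs, each of which is a user cluster. Alternatively, the graph can be embedded in a high-dimensional vector space and clustered with k-means. Given the clusters, recommendations can be produced with the counting method above or with a more sophisticated scheme.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How is the user relationship graph built? There are generally two ways. In a social product, it can be built from social ties, with an edge representing a friendship. In a non-social product, an edge can connect two users who have both acted on the same item.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The advantage of group personalization is that each group gets one list, which reduces computation and storage. Its biggest drawback is that a recommended item may already have been seen by some users in the group but not by others, so it cannot be filtered out for the whole group, and the experience suffers for those users. In addition, the users in a group always differ somewhat in their interests, so the approach cannot cater precisely to each individual.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The idea behind group personalization also carries over to the fully personalized paradigm. Users are first grouped, each group is treated as an equivalence class (readers less familiar with mathematics can think of an interest group), and each group is handled as if it were a single user, to which the algorithms of the fully personalized paradigm are then applied. A paper published by Google in 2007 (reference 17) implements collaborative filtering along these lines. Grouping users reduces computation and supports large-scale parallel processing.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.4 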
Item-to-item paradigm","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The item-to-item paradigm recommends a set of items for each item. The central question is how to get from one item to a set of associated items. The association can be similarity or some other kind of relation; similarity is the most common. Four common strategies for generating associated recommendations follow.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.4.1 Content-based recommendation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These methods use known data about items to describe each item as a vector. Once every item is vectorized, the similarity between two items is the similarity between their vectors.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the items are text, such as news articles, TF-IDF can map each item to a term-weight vector, and item similarity is computed from these vectors.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The same method applies to non-text items as long as they have textual metadata. Many Internet products support user comments, and the comment text can be treated as a description of the item.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"LDA is also well suited to text recommendation. It represents each article (document) as probabilities over topics and their associated words. Two documents can be compared as follows: first compute the similarity of the two documents on each topic, then take the weighted average over all topics as the document similarity. The per-topic similarity can be measured by the cosine similarity of the topic's word vectors.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.4.2 Behavior-based recommendation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A mature product records many kinds of user behavior, such as favorites, likes, purchases, plays, views, and searches, each of which expresses some preference for an item. Associated recommendations can be derived from this behavior using the following four strategies.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Matrix factorization. The user behavior matrix is factored into a user feature matrix and an item feature matrix. Each item's vector in the item feature matrix characterizes that item, and the similarity between two items is computed from their vectors.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Embedding. Treat all of a user's actions as a document and each item as a word. A word2vec-style method trains a vector for each word (that is, each item), and item similarity is computed from these vectors.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Interaction matrix. Project all user actions on items onto a two-dimensional table (a matrix) whose rows are users, whose columns are items, and whose entries record the user's action on the item (a rating, or implicit feedback such as a click). Each column vector represents an item, so item similarity can be computed as vector similarity.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Market basket. This approach suits books, e-commerce, and similar domains. Items that are often bought (or viewed) together form a list, or basket, and all baskets from a recent period are collected. 
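For any item, we can find the items that appear in the same basket with it and count how often each one co-occurs. Sorting by this count in descending order gives a list that can serve as the item's associated recommendations. The approach is intuitive and highly explainable. Figure 11 below shows two kinds of associated recommendation on the Amazon website that use this idea.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6a/6a8f52977dce4faaa737455bb83d4592.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 11: Associated recommendations based on the shopping-basket idea","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.4.3 Tag-based recommendation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If items carry tags, as in video recommendation, vectors can be built from the tags, with each tag corresponding to one dimension. The total number of tags is the dimension of the vector, so every item can be represented by a tag vector. An item usually has far fewer tags than the total number of tags, so the vector is sparse. Item similarity can then be computed on this sparse representation.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.4.4 Recommendation based on item clustering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Items can be clustered along some dimension (clustering is straightforward if items can be embedded in a vector space). Items in the same cluster share some similarity, so the other items in the cluster can be used as associated recommendations. One problem to solve is that some clusters may be too small to fill a recommendation list; in that case a fallback strategy, such as adding popular items, can make up the shortfall.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.3.5 Cartesian product paradigm","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A recommendation algorithm for the Cartesian product paradigm can usually first compute the candidate item list with the item-to-item paradigm, and then adjust the list according to the user's interests: re-ranking (changing the order of the items), supplementing (adding items that match the user's personal interests) and deleting (for example, filtering out items the user has already seen). Because this paradigm is rarely used in real business scenarios, it is not covered in detail here.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This completes the discussion of common recall strategies. Beyond the algorithmic strategies above, recall also depends on the specific business and product form, and can draw on many other dimensions such as time, location, user attributes, income and occupation.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For recommendation on smart TVs, what is recommended differs between morning, daytime and evening, and between holidays and ordinary days. Office workers have little time in the morning before work, so short videos or news may be more suitable. During the day, elderly family members are usually at home, so traditional opera or war dramas can be recommended. In the evening, when the working members of the household return, movies and TV series can be recommended.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Location-based recall recommends different items in different places. A typical application is Meituan Waimai: wherever you are, it recommends restaurants within a few kilometers of your location.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.4 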
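Overview of ranking algorithms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The ranking module of a recommender system takes the item list produced by the recall module (typically a few hundred items) and re-ranks it with a ranking algorithm so that it better reflects the user's click preferences. The items the user is most likely to click (typically a few dozen) are selected and recommended, which ultimately improves the user experience.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The ranking module uses many features, and the ranking model is built on them. Features play a key role in ranking quality. Commonly used features fall into the following five categories:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• User-side features, such as gender, age, region, purchasing power and family structure.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• Item-side features, such as the item description, price and tags.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• Context and scenario features, such as location, page, and whether it is a weekend or holiday.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• Cross features, such as crosses between user-side and item-side features.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• User behavior features, such as clicks, favorites, purchases and views.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The ranking framework needs to make full use of these five categories of features to predict the user's click behavior. Learning to rank is an important research area in machine learning and is widely used for ranking tasks in information retrieval, search engines, recommender systems and computational advertising. Interested readers can refer to the monograph by Dr. Tie-Yan Liu of Microsoft Research Asia, 《Learning to Rank for Information 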
Retrieval》. Common learning-to-rank frameworks fall into three types: pointwise, pairwise and listwise, as shown in Figure 12 below.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/41/41adc646d007aa802eedca2680519583.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 12: Three types of learning-to-rank frameworks","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the figure above, x1, x2, ... are the features of training samples 1, 2, ..., and y1, y12, s1, ... are the labels (target values) of the training set. Pointwise methods learn from single samples: if the prediction target is a real value, the task is a regression problem; if the target is a probability, it is a classification problem, as in CTR prediction. Pairwise and listwise methods learn from ordered pairs and ordered lists of samples respectively, and so model the ordering more finely. Recommender systems commonly use pointwise methods for ranking because they are simpler, more intuitive and easier to understand.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Common learning-to-rank algorithms include logistic regression, GBDT and Wide & Deep. Their principles are briefly described here.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.4.1 
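Logistic regression model","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Logistic regression is a relatively simple linear model that builds a CTR predictor by learning from user click behavior. When it is used as a recommendation model, the model takes the form shown in the formula below, where p is the probability that the user likes an item, the weight of each feature is a model parameter to be learned, and the feature value of feature i is drawn from the five categories of features described above. The formula gives the p value of each candidate item, and the recalled item list is then sorted in descending order of p.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e7/e767daa9ab93e9c155aefda1a7f19452.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In industry, many companies have extended the logistic regression model to apply it better to real business scenarios. For ranking in online real-time recommendation, examples include FTRL, which Google popularized in 2013 (see reference 14), and the piece-wise linear model promoted by Alibaba (see reference 13).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.4.2 GBDT model","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GBDT (Gradient Boosting Decision Tree) is a decision tree algorithm built iteratively (see reference 15). It generates multiple decision trees and aggregates the results of all trees to obtain the final prediction. The algorithm effectively combines decision trees with ensemble learning, improving prediction accuracy by boosting weak learners into a strong learner. GBDT is a class of learning algorithms with strong generalization ability.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2014 Facebook published a paper on using a GBDT+LR (Logistic Regression) model for CTR prediction in its advertising system (reference 16), which pioneered the use of GBDT models in search, recommendation and advertising. As a common tree model, GBDT naturally performs feature splitting, feature combination and feature selection on the raw features, yielding higher-order features and non-linear mappings. GBDT can therefore be abstracted as a feature processor: it transforms the raw features into new features that are better suited to LR. This is the core idea of the GBDT+LR model: train the LR model on the new features constructed by GBDT.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.4.3 Wide & Deep model","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Wide & Deep model was first proposed by Google and used to rank app recommendations in the Android app store. It has since been adopted at scale by many internet companies in China with good results. The model combines a traditional model with a deep learning model. The wide part (a traditional model such as logistic regression) handles memorization: it discovers correlations between items (the recommended content) or features from historical data. The deep part (a deep learning model) handles generalization: it propagates correlations and discovers new feature combinations that rarely or never appear in the historical data, uncovering new user preferences. Combining the two strikes a better balance between the user's historical interests and the exploration of new ones. Interested readers can read reference 12; Chapter 12 also covers this model in more detail.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.5 Issues to consider when deploying recommendation algorithms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The previous sections gave a preliminary description of recommendation algorithms, and readers should now have an intuitive sense of how common algorithms work and how they are used in real products. This section discusses several important issues in deployment, so that you can put recommendation algorithms into production in real business scenarios more effectively.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.5.1 Is a ranking module always required?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Industrial recommendation algorithms are generally divided into a recall module and a ranking module. Recall picks out the items a user may like (a few hundred) from the full item set (tens of thousands up to hundreds of millions), and the ranking stage re-sorts the recalled items by the likelihood that the user will click them. The ranking stage is not mandatory, however. For products with a small item pool or teams with limited resources, there is no need to build a ranking framework at the start. Recall algorithms generally also order the items: a rating prediction model such as matrix factorization can sort by predicted rating, and a probabilistic model can sort by the predicted preference probability. A recommender system without a ranking module may be less accurate, but it is simpler to implement and can be launched quickly. Teams that are new to recommender systems in particular can get a system online quickly and then iterate, filling in the missing modules later.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A ranking module is added to a recommender system mainly for two reasons. First, when the item pool is too large, recall makes an initial selection so that the candidate set is much smaller and can be refined with a complex ranking algorithm such as a deep learning model. Second, splitting the recommendation process into two stages makes the engineering more modular and controllable, and allows a finer division of labor.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.5.2 Two ways for recommendation algorithms to serve users","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Recommendation results computed by the algorithm can be written directly to a database (such as Redis) and served to users as they are. Alternatively, the core features can be computed and stored in advance; when a user requests recommendations, the recommendation web service converts the features into the final results with a simple computation and returns them. The first approach precomputes everything so the results are ready to use; the second prepares the core data and computes the final results in real time at request time.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A restaurant's takeout service is a useful analogy. In the first approach, the restaurant cooks many portions of every dish in advance and sends out a finished portion as soon as an order arrives. In the second, all the ingredients are prepared in advance; when an order arrives, the ingredients are seasoned, cooked and then delivered. As long as the ingredients are well prepared and the cooking time is short and predictable, this also serves customers well. Dishes cooked in advance cannot meet individual requests, and any portion that nobody orders is wasted. Cooking to order meets individual requests better: a customer who asks for no coriander and extra chili can be accommodated.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second approach places higher demands on the whole recommender system and provides finer-grained service. The first is simpler but requires more storage, because the recommendation results of all users must be stored in advance. The first approach is recommended in the early stages of building a recommender system.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For some recommendation services the first approach is infeasible. One example is the Cartesian product paradigm described above: the number of users multiplied by the number of items is astronomically large, and no company has the resources to precompute and store the recommendation results for every item associated with every user. Chapter 21 describes both serving approaches in detail.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.5.3 Evaluating recommender systems","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A recommender system serves the company's business goals (profit; improving user experience, usage time, DAU and so on ultimately also serves profit). When a recommender system is deployed in a real business scenario, its optimization objectives must therefore be defined. Only when the objectives are specific, clear and quantifiable can the recommendation quality be improved through continuous iteration. Readers can refer to the article 《推薦系統的商業價值》 (The Business Value of Recommender Systems) to learn which dimensions can be used to define the business metrics of a recommender system.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Summary","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article gave an overview of the five paradigms of industrial recommender systems, the data that recommendation algorithms depend on, the algorithm workflow, and specific recall and ranking algorithms. Readers should now know which algorithms are available for each paradigm, the scenarios each paradigm applies to, and the ideas behind the algorithms. Later chapters explain the implementation details of the mainstream core algorithms.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"References","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Deep Session Interest Network for Click-Through Rate Prediction","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. Personal Recommendation Using Deep Recurrent Neural Networks in NetEase","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6. 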
Deep Reinforcement Learning for List-wise Recommendations","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"7. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"8. Learning Tree-based Deep Model for Recommender Systems","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"9. Item2Vec: Neural Item Embedding for Collaborative Filtering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"10. Deep Neural Networks for YouTube Recommendations","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"11. Deep Learning based Recommender System: A Survey and New Perspectives","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"12. Wide & Deep Learning for Recommender Systems","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"13. Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"14. Ad Click Prediction: a View from the Trenches","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"15. Greedy function approximation: a gradient boosting machine","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"16. 
Practical Lessons from Predicting Clicks on Ads at Facebook","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"17. Google News Personalization: Scalable Online Collaborative Filtering","attrs":{}}]}]}