基於樸素ML思想的協同過濾算法(十七)

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"寫在前面:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大家好,我是強哥,一個熱愛分享的技術狂。目前已有 12 年大數據與AI相關項目經驗, 10 年推薦系統研究及實踐經驗。平時喜歡讀書、暴走和寫作。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業餘時間專注於輸出大數據、AI等相關文章,目前已經輸出了40萬字的推薦系統系列精品文章,今年 6 月底會出版「構建企業級推薦系統:算法、工程實現與案例分析」一書。如果這些文章能夠幫助你快速入門,實現職場升職加薪,我將不勝歡喜。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"想要獲得更多免費學習資料或內推信息,一定要看到文章最後喔。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"內推信息","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你正在看相關的招聘信息,請加我微信:liuq4360,我這裏有很多內推資源等着你,歡迎投遞簡歷。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"免費學習資料","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你想獲得更多免費的學習資料,請關注同名公衆號【數據與智能】,輸入“資料”即可!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"學習交流羣","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你想找到組織,和大家一起學習成長,交流經驗,也可以加入我們的學習成長羣。羣裏有老司機帶你飛,另有小哥哥、小姐姐等你來勾搭!加小姐姐微信:epsila,她會帶你入羣。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在第十六章協同過濾推薦算法中介紹了經典的item-based協同過濾推薦算法。在本章中我們會繼續介紹三種思路非常簡單樸素的協同過濾算法,這幾個算法的原理簡單,容易理解,也易於工程實現,非常適合我們快速搭建推薦算法原型,並快速上線到真實業務場景中,作爲其他更復雜算法的baseline,提供效果對比參考。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"具體來說,我們在本章中會介紹利用關聯規則、樸素貝葉斯(naive bayes)、聚類三類機器學習算法來做協同過濾推薦的方法。並且還會介紹3個核心思想基於這三類機器學習算法的工業級推薦系統,這3個推薦系統被YouTube和Google分別用於視頻和新聞推薦中(其中會介紹Google News的兩個推薦算法),這些算法在YouTube和Google News早期產品中得到採用,並且在當時情況下效果是非常不錯的,值得我們深入瞭解和學習。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.1 基於關聯規則的推薦算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"關聯規則是數據挖掘領域非常經典的算法,該算法來源於一個真實的案例:“啤酒與尿布”的故事。該故事發生在20世紀90年代的美國沃爾瑪超市中,沃爾瑪的超市管理人員分析銷售數據時發現了一個令人難以置信的現象:在某些特定的情況下,“啤酒”與“尿布”兩件看上去毫無關係的商品會經常出現在同一個購物籃(用戶一次購物所買的所有商品形象地稱爲一個購物籃)中,這種獨特的銷售現象引起了管理人員的注意,經過後續調查發現,這種現象出現在年輕的父親身上。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在美國有嬰兒的家庭中,一般是母親在家中照看嬰兒,年輕的父親前去超市購買尿布。父親在購買尿布的同時,往往會順便爲自己購買啤酒,這樣就會出現啤酒與尿布這兩件看上去毫不相干的商品經常會出現在同一個購物籃的現象。沃爾瑪發現了這一獨特的現象,開始在賣場嘗試將啤酒與尿布擺放在相同的區域,讓年輕的父親可以方便地同時找到這兩件商品,並很快地完成購物。這樣做沃爾瑪超市就讓這些客戶一次購買了兩件商品、而不是一件,從而獲得了很好的商品銷售收入,這就是“啤酒與尿布”故事的由來。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們給出關聯規則的定義,假設","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/10f575e56de5600a4ae2d36db284fb2d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是所有標的物的集合(對於沃爾瑪超市來說,就是所有的商品集合)。關聯規則一般表示爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/3566e0fc5849bf365b05677630af9eb5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 的形式,其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4c/4cb50d0ce11a7a10e9e835d0bf016326.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0b/0b5e39df54538b7d3ff7ce20e1a254f8.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的子集,並且","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/22/22079c0bfee2a2a89dd8dc1dd28fb877.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。關聯規則","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/3566e0fc5849bf365b05677630af9eb5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示如果","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b1/b1433559f8344f0ea6e284db52323d90.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在用戶的購物籃中,那麼用戶有很大概率同時購買了","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/ef9bee4525663d975427bff2f1baa2ae.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。通過定義關聯規則的度量指標,一些常用的關聯規則算法(如Apriori)能夠自動地發現所有關聯規則。關聯規則的度量指標主要有","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"支持度(support)","attrs":{}},{"type":"text","text":"和","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"置信度(confidence)","attrs":{}},{"type":"text","text":"兩個,支持度是指所有的購物籃中包含","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/33/33db192393da378bb603f67324899e66.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的購物籃的比例(即","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4c/4cb50d0ce11a7a10e9e835d0bf016326.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同時出現在一次交易中的概率),而置信度是指包含","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b1/b1433559f8344f0ea6e284db52323d90.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的購物籃中同時也包含","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/ef9bee4525663d975427bff2f1baa2ae.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的比例(即在","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b1/b1433559f8344f0ea6e284db52323d90.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"給定的情況下,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/ef9bee4525663d975427bff2f1baa2ae.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"出現的條件概率)。它們的定義如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/78/78de2f5125aec1dbe8288636c7a6e8e1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5e/5e31009ad08de2759fefb954d787c148.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"支持度越大,包含","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/33/33db192393da378bb603f67324899e66.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的交易樣本越多,說明關聯規則","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/3566e0fc5849bf365b05677630af9eb5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有更多的樣本來支撐,“證據”更加充分。置信度越大,我們更有把握從包含","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b1/b1433559f8344f0ea6e284db52323d90.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的交易中推斷出該交易也包含","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/ef9bee4525663d975427bff2f1baa2ae.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。關聯規則挖掘中,我們需要挖掘出支持度和置信度大於某個閾值的關聯規則,這樣的關聯規則才更可信,更有說服力,泛化能力也更強。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了關聯規則的定義,下面我們來講解怎麼將關聯規則用於個性化推薦中。對於推薦系統來說,一個購物籃即是用戶操作過的所有標的物的集合。關聯規則","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/3566e0fc5849bf365b05677630af9eb5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示的意思是:如果用戶操作過","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b1/b1433559f8344f0ea6e284db52323d90.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中的所有標的物,那麼用戶很可能喜歡","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/ef9bee4525663d975427bff2f1baa2ae.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中的標的物。有了這些說明,那麼利用關聯規則爲用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"生成推薦的算法流程如下(假設","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所有操作過的標的物集合爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f5/f5fde0280c0e5db9eadae1c57b8a4807.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"):","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. 挖掘出所有滿足一定支持度和置信度(支持度和置信度大於某個常數)的關聯規則","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/3566e0fc5849bf365b05677630af9eb5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":";","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. 從1中所有的關聯規則中篩選出所有滿足","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9d/9d52f7f4f737c9f9057b3d90c47d6fd7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的關聯規則","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/3566e0fc5849bf365b05677630af9eb5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":";","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. 爲用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"生成推薦候選集,具體計算如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"      ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ff/ffa3896025e7ce0a4a1b3dd814e1ade7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即將所有滿足2的關聯規則","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/3566e0fc5849bf365b05677630af9eb5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/ef9bee4525663d975427bff2f1baa2ae.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"合併,並剔除掉用戶已經操作過的標的物,這些標的物就是待推薦給用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的。對於3中的候選推薦集","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/42f8e3d68f11b2c7306caa8e49b72efe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",可以按照該標的物所在關聯規則的置信度的大小降序排列,對於多個關聯規則生成同樣的候選推薦標的物的,可以用置信度最大的那個關聯規則的置信度。除了可以採用置信度外,也可以用支持度和置信度的乘積作爲排序依據。對於4中排序好的標的物,可以取topN作爲推薦給用戶的推薦結果。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於關聯規則的推薦算法思路非常簡單樸素,算法也易於實現,Spark Mllib中有關聯規則的兩種分佈式實現","attrs":{}},{"type":"link","attrs":{"href":"http://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html#fp-growth","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"FP-Growth","attrs":{}}]},{"type":"text","text":"和","attrs":{}},{"type":"link","attrs":{"href":"http://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html#prefixspan","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"PrefixSpan","attrs":{}}]},{"type":"text","text":",大家可以直接拿來使用(關於這兩個關聯規則挖掘算法的實現細節,讀者可以閱讀參考文獻10、11、12)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"關於關聯規則算法的介紹及怎麼利用關聯規則進行個性化推薦,讀者還可以閱讀參考文獻4、5、6、7、8、9。利用關聯規則做推薦,是從用戶的過往行爲中挖掘用戶的行爲模式,並用於推薦,只用到了用戶的行爲數據,因此利用關聯規則做推薦也是一種協同過濾算法。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.2 基於naive bayes的推薦算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用概率方法來構建算法模型爲用戶做推薦,可以將預測評分問題看成一個分類問題,將可能的評分離散化爲有限個離散值(比如1、2、3、4、5一共5個可行的分值),那麼預測用戶對某個標的物的評分,就轉化爲用戶在該標的物上的分類了(比如分爲1、2、3、4、5個類別,這裏不考慮不同類之間的有序關係)。在本節我們就利用最簡單的貝葉斯分類器來進行個性化推薦。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假設一共有k個不同的預測評分,我們記爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d0/d029b3077eaeaa602828adf1bdcd4fe4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",所有用戶對標的物的評分構成用戶行爲矩陣","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/28/28c0efcd8cb2b85b45a681ad5deddcb1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",該矩陣的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0b/0bca50881ea2309ffeb904896030f691.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"-元素記爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/556ff0654442219f13cff874fc33f776.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",即是用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/27/2747df64c9962eabc30e110288bc1af5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的評分,取值爲評分集合","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/42f8e3d68f11b2c7306caa8e49b72efe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中的某個元素。下面我們來講解怎麼利用貝葉斯公式來爲用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"做推薦。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假設用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有過評分的所有標的物記爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1e/1e2e97763facf19ed7c34f0f821d0317.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e4/e4ba2d2682f317d2142fe6060f88eaac.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。現在我們需要預測用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對未評分的標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81832e39a2aa788a8c71f4cc5687d095.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的評分","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d8/d8c301b61fcad4679dc43b1b4b4b0ce3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/77/77e88c3b80690657e3f48a5e21d1575c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":")。我們可以將這個過程理解爲在用戶已經有評分記錄","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1e/1e2e97763facf19ed7c34f0f821d0317.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的條件下,用戶對新標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81832e39a2aa788a8c71f4cc5687d095.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的評分","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d8/d8c301b61fcad4679dc43b1b4b4b0ce3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"取集合","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/42f8e3d68f11b2c7306caa8e49b72efe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中某個值的條件概率:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a939807f1bf2d8b0622af90df682e120.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"條件概率","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/06eb881946e17f1c4eaa4285dd13215c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",表示的是在事件","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6a/6a7860bd927513eb3e3e59b48de9ee2d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"發生的情況下事件","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f5/f5fde0280c0e5db9eadae1c57b8a4807.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"發生的概率,由著名的貝葉斯定理,條件概率可以通過如下公式來計算:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/3898ed74b47c0813d9bb195f6cf62990.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,回到我們的推薦問題,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce13daa025667649f6edda548ccd06d1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",我們有","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/62/622a18b91784d956935c3742052c3fde.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們需要確定具體的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"值,讓上式左邊的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a939807f1bf2d8b0622af90df682e120.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的值最大,這個最大的值","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8f/8f2ebdc872d90ebd7810f31ebdd451a5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"就可以作爲用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對未評分的標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81832e39a2aa788a8c71f4cc5687d095.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的評分(","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d8/d8c301b61fcad4679dc43b1b4b4b0ce3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8f/8f2ebdc872d90ebd7810f31ebdd451a5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":")。我們注意到上式中右邊的分母的值與具體的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"無關,因此右邊分子的值的大小才最終決定公式左邊的值的相對大小,基於該觀察,我們可以將上式記爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3e/3efdf0f90bea89611d96e597acfa4559.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在的問題就是怎麼估計上式右邊項的值,實際上基於用戶評分矩陣,這些項的值是比較容易估計出來的,下面我們就來估計這些值。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. 估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7d75c3000a6dde158794734702e96854.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7d75c3000a6dde158794734702e96854.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其實是","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1d/1de4ba7abadcf5360e6fc1db76f482c3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的先驗概率,我們可以用對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/54/544a2986a44e691e0a184a1119e9702e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"評分爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/49/49b92b1a44586741899b90b9f37d2c46.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的用戶的比例來估計該值,即","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/61/61098f16b42a2980e5aa589873d89ed5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏分母是所有對標的物","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81832e39a2aa788a8c71f4cc5687d095.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有過評分的用戶,而分子是對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81832e39a2aa788a8c71f4cc5687d095.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"評分爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/49/49b92b1a44586741899b90b9f37d2c46.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的用戶。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/3885c3602fd3e8ce2081110eff2cf288.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"要估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/3885c3602fd3e8ce2081110eff2cf288.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",我們需要做一個樸素的假設,即條件無關性假設:用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所有的評分","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1e/1e2e97763facf19ed7c34f0f821d0317.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是獨立無關的,也就是不同的評分之間是沒有關聯的,互不影響(該假設就是該算法叫做naive bayes的由來)。實際上,同一用戶對不同標的物評分可能是有一定關聯的,在這裏做這個假設是爲了計算方便,在實際使用naive bayes做推薦時效果還是很不錯的,泛化能力也可以。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了條件無關性假設,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/3885c3602fd3e8ce2081110eff2cf288.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"就可以用如下公式來估計了:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2b/2b23cb81a4f7e39ab34e3290dff9caf8.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/69/69c43e39d282f5bacccdd2c71f1273f5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可用所有對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/54/544a2986a44e691e0a184a1119e9702e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"評分爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/49/49b92b1a44586741899b90b9f37d2c46.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的用戶中對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/62/625fe11fcdfe277ff5ee1e0b82e51ae3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"評分爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ec/ecc054779d194a4f806b7a33a42ac988.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的比例來估計。即","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ca/ca01a319b0ef24ec70f044226b65f439.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了上面的兩個估計,那麼我們利用naive bayes計算用戶對標的物的評分概率問題最終可以表示爲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ec/ec24965db859ecf9a5f18c5f1c7f783e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"公式1:用戶對標的物評分的概率估計","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了上式,一般來說,我們可以採用兩種方法來估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8d/8dd0a304c30df6acb81f0e3e787506e2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的值。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. 類似極大似然的思路估計","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該方法就是用","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2e/2ed42750f5567ea3d4a7cabe24ded447.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",使得","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a939807f1bf2d8b0622af90df682e120.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"取值最大的p對應的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/49/49b92b1a44586741899b90b9f37d2c46.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1d/1de4ba7abadcf5360e6fc1db76f482c3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的估計值,即","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/31/31a5aa93a34cf906d94faac1349d0f90.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該方法僅僅將用戶對標的物的評分看爲類別變量而忽略具體評分的數值,而下面的方法則利用了評分的具體數值。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"採用加權平均來估計","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/54/544a2986a44e691e0a184a1119e9702e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/ef3aefc8da69c0f667da6d02060ffd86.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以取","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ca/cabe5d856a72918812ec4217c941a41f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中的任一值,基於上面的公式1,取每一個值都有一個概率估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a939807f1bf2d8b0622af90df682e120.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",那麼最自然的方式是可以用這個概率作爲權重,利用加權平均來估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c8/c8330daacddda7697f407fa297ba1b34.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",具體的估計公式如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4a/4af511d78ae35989431d639c6a41bc1a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了用戶對標的物評分的估計,那麼推薦就是順其自然的事情了。具體來說,我們可以計算出用戶對所有未評分標的物的估計值,再按照估計值的大小降序取topK作爲給用戶的推薦。這裏說明一下,對於採用極大似然思路的估計(即上面的第一種估計方法),因爲該方法是將評分看成類別變量,那麼肯定有很多標的物的估計值是一樣的,這時如果我們需要再區別這些評分一樣的標的物的話,可以採用估計該值時的概率大小再進行二次排序。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"採用貝葉斯方法來做推薦會存在一些問題,具體來說,我們在估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7d75c3000a6dde158794734702e96854.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/3885c3602fd3e8ce2081110eff2cf288.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時,由於樣本數據稀疏,導致無法進行估計或者估計值不夠魯棒性的問題。比如在估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7d75c3000a6dde158794734702e96854.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時,我們用公式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/61/61098f16b42a2980e5aa589873d89ed5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來估計,從該式可以看到,如果無用戶或者很少用戶對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81832e39a2aa788a8c71f4cc5687d095.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有評分(這種情況是存在的,如","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81832e39a2aa788a8c71f4cc5687d095.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是新加入的標的物或者是冷門標的物),這時可能會出現用  ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/dd/dd878cc6534016e300a2742223eef033.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 來估計的情況,即使不是","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/dd/dd878cc6534016e300a2742223eef033.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",當分子分母都很小時,估計值波動會很大,不夠魯棒,對估計結果影響很大。一般我們可以採用拉普拉斯平滑(Laplacian smoothing)的方法來處理,得到更加穩定的估計值。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對上面這個例子,我們來說下怎麼利用拉普拉斯平滑來處理:假設","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1d/1da0a3e6e3bfe1044f786ab028e19ced.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81832e39a2aa788a8c71f4cc5687d095.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"評分分別爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e8c3e3f48b063dc75a8f817d23f49ad1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的用戶數,那麼上面的估計就是","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/36/36f57954ec3eac5a20523f56ce7646d3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"增加拉普拉斯平滑後的估計公式就是","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b1/b17af01eba0bb80f0e645ae8e969b7dc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從這個公式可以看出,當沒有用戶對標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81832e39a2aa788a8c71f4cc5687d095.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"評過分時,就用","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3e/3ed596c8138fe9b87243fdbd1e4d74b7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來估計,這是在沒有已知信息的情況下比較合理的估計。上式中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/07/075b484a80fde9e9a75b3cdf599af16d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是光滑化因子,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/07/075b484a80fde9e9a75b3cdf599af16d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"值越大,估計越光滑(魯棒性越好),這時公式對數據就不夠敏感。對於","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/3885c3602fd3e8ce2081110eff2cf288.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的估計,也可以採用一樣的方法,這裏不再贅述。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"到此爲止,我們講完了怎麼利用naive bayes來爲用戶做推薦的方法,該方法也是隻利用了用戶的操作行爲矩陣,所以也是一種協同過濾算法。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"naive bayes方法是一個非常簡單直觀的方法,工程實現也非常容易,也易於並行化。它對噪音有一定的“免疫力”,不太會受到個別評分不準的影響,並且也不易於過擬合(個人覺得前面介紹的條件無關性假設是泛化能力強的主要原因),一般情況下預測效果還不錯,並且當用戶行爲不多時,也可以使用(需要利用拉普拉斯平滑來處理),而不像矩陣分解等算法,需要大量的用戶行爲才能進行推薦。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讀者可以從從參考文獻13、14、15、16中瞭解更多關於怎麼利用貝葉斯及其他概率方法來做推薦的方案。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.3 基於聚類的推薦算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於聚類來做推薦有兩種可行的方案,一種是將用戶聚類,另外一種是將標的物聚類。下面來簡單描述一下怎麼基於這兩種聚類來做推薦。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.3.1 基於用戶聚類的推薦","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果我們將所有用戶聚類了,就可以將該用戶所在類別的其他用戶操作過的標的物(但是該用戶沒有操作行爲)推薦給該用戶。具體計算公式如下,其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f8/f8a96beda26f47dfeeeeda787343463e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是給用戶u的推薦,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c6/c6bb184a9ca6777b008ecb594f385adb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶所在的聚類,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ac/ac34fa6802ecef1ad881eb76e49141fc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4b/4b49d6e8dfe81bc88d07f5c21ee003e7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分別是用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a7/a782d6c46919465c474437c36468ccbe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的操作歷史集合。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a4/a4ea8cc28805ac4e769fdf17a77d0402.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼怎麼對用戶聚類呢?可行的方案主要有如下幾類:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(1) 基於用戶的人口統計學特徵對用戶聚類","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶的年齡、性別、地域、家庭組成、學歷、收入等信息都可以作爲一個特徵,類別特徵可以採用one-hot編碼,所有特徵最終都可以轉化爲數值的,最終獲得用戶特徵的向量表示,通過Kmeans聚類算法來對用戶聚類。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(2) 基於用戶行爲對用戶聚類","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如採用矩陣分解就可以獲得用戶的嵌入表示,用戶操作行爲矩陣的行向量也是用戶的一種向量表示,再利用Kmeans對用戶進行聚類。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(3) 基於社交關係對用戶聚類","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果是社交產品,用戶之間的社交鏈條可以構成一個用戶關係圖,該社交圖中所有的聯通區域就形成了用戶的一種聚類。這種推薦其實就是將你的好友喜歡的標的物推薦給你。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.3.2 基於標的物聚類的推薦","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果有了標的物的聚類,推薦就好辦了。從用戶歷史行爲中的標的物所在的類別挑選用戶沒有操作行爲的標的物推薦給該用戶,這種推薦方式是非常直觀自然的。具體計算公式如下,其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f8/f8a96beda26f47dfeeeeda787343463e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是給用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的推薦,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7b/7b407c68ca521579f54061f76b81cf0e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶的歷史操作行爲集合,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6a/6a0e7d428c6c5baa7549f6f597f2426f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是標的物","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85d7ed0729c0a85d691aad61038d8f7d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所在的聚類。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8e/8ea259bd8eb4f2246a4e99ecf0165195.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同時,有了標的物聚類,我們還可以做標的物關聯標的物的關聯推薦,具體做法是將標的物A所在類別中的其他標的物作爲關聯推薦結果。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼怎麼對標的物聚類呢?可行的方法有利用標的物的metadata信息,採用TF-IDF、LDA、Word2vec等方式獲得標的物的向量表示,再利用Kmeans聚類。具體的實現細節這裏不介紹,感興趣的讀者可以自行搜索相關材料做深入學習,作者在第五章《基於內容的推薦算法》中也做了比較詳細的介紹。另外,也可以基於用戶的歷史操作行爲,獲得標的物的嵌入表示(矩陣分解、item2vec等算法),用戶行爲矩陣的列向量也是標的物的一種向量表示。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參考文獻17、18、19、20有更多關於怎麼用聚類來做推薦的算法,感興趣的讀者可以參考學習。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"到此爲止,我們講完了基於關聯規則、naive bayes、聚類做個性化推薦的方法。下面我們就基於這幾個方法的思想來介紹三個工業級的推薦引擎,供大家學習參考,同時也希望藉助這幾個工業級推薦系統的介紹讓大家加深對這三個方法的思路的理解。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.4 YouTube基於關聯規則思路的視頻推薦算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"YouTube基於關聯規則思路的視頻推薦算法(見參考文獻1)建立在一個基本假設基礎之上:如果用戶喜歡種子視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",那麼用戶喜歡與種子視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"相似的候選視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的概率一定很大,候選視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"越相似,那麼用戶喜歡候選視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的概率也越大。那麼剩下就是怎麼解決這兩個問題了:一是怎麼計算兩個視頻的相似度,二是怎麼選擇種子視頻,下面我們來分別介紹。同時,我們也會介紹最終怎麼給用戶生成個性化推薦。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.4.1 計算兩個視頻的相似度(關聯度)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該算法是利用關聯規則的思路,在一定時間內(比如24小時內)統計兩兩被用戶同時播放過的視頻對","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/be/bea12e797c1b852eb00b88566fcf8da1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",將播放次數計爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/aa/aa34d8d2b59d55b38278e7e65e5474d0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",那麼候選視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的相似度可以表示如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e7/e7042221e3714874fc5823af015ea028.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f6/f6b74fcb97d71ceab7c15fcb8ef198e8.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是一個歸一化常數,會綜合考慮種子視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與候選視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的“全局流行度”,如果我們分別記","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ea/eaac4ca97c4e94546d139327bd5b557e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"、","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在上面的一段時間內總的播放次數。那麼我們可以定義","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/61/6197d9d468abc64e08c41b3b1a31f7b5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該歸一化函數是非常直觀簡單的,用其他歸一化函數也是可以的。如果用該歸一化函數的話,對所有候選視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來說,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是一樣的,所以可以忽略,其實我們是用候選視頻的“全局流行度”","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ea/eaac4ca97c4e94546d139327bd5b557e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來歸一化。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ea/eaac4ca97c4e94546d139327bd5b557e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在分母中,這說明","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ea/eaac4ca97c4e94546d139327bd5b557e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"越大的視頻,與種子視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的相似度會更小,該歸一化方法更加偏向於偏冷門的候選視頻。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面只是一個非常簡單的描述和計算公式,我們也可以將視頻的metadata、觀看時間等信息整合進來計算相似度。另外,還需要處理髒的播放行爲數據。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.4.2 基於單個種子視頻生成候選視頻集","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於相似度計算公式","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c11ca2ccbf9c889ba0fe64dd0c5fe4d8.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",我們可以選擇出種子視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的最相似的topN候選集","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a2/a22a20d87fbb62ee9267b46cb32b9dbf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",一般我們會確定一個最小的閾值,需要相似度大於該閾值纔會選出來,這也是爲了避免選擇很多隻有很少播放量的視頻(這時種子視頻和候選視頻被同時播放的次數也很小)導致推薦結果太差。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於上述從種子視頻選擇候選視頻的規則,我們可以構建一個有向圖。將所有視頻集合構成一個有向圖,對於任何一個視頻對","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/be/bea12e797c1b852eb00b88566fcf8da1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",如果","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a9a82d14a60740e96d8daffbf93430be.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(候選視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在種子視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的相似視頻列表集中),那麼存在一條從","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"到","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的邊,邊的權重爲這兩個視頻的相似度","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c11ca2ccbf9c889ba0fe64dd0c5fe4d8.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用該方式爲種子視頻生成的視頻候選集,可以作爲視頻的關聯推薦,爲種子視頻推薦相關的視頻。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.4.3 基於用戶行爲爲用戶生成推薦候選集","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面講到了單個視頻怎麼生成候選集,那麼對於單個用戶也是很容易採用上面的方式生成推薦候選集的。我們可以將用戶播放過的視頻或者明確表示過喜歡的視頻作爲初始種子視頻集。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於初始種子集","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/42f8e3d68f11b2c7306caa8e49b72efe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",我們可以採用如下方式來生成候選集:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於上面視頻的有向圖解釋,我們可以沿着集合","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/42f8e3d68f11b2c7306caa8e49b72efe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的有向邊向外拓展,對於任意的種子視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2f/2fd82b8b38d3a5bffa54ce6929085c8d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"),我們考慮它的相關視頻集","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a2/a22a20d87fbb62ee9267b46cb32b9dbf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。我們將所有通過這種方式拓展出的視頻集記爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/063f17cebc4cb7b182738b10e232ebac.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",那麼我們有","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/49/49fab68263c5ad4a9db908b14cb62abb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一般情況下,我們計算出","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/063f17cebc4cb7b182738b10e232ebac.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"就足夠獲得比較多的、效果還可以的、有一定多樣性的推薦候選集了。但實際上,通過這種方式生成的推薦候選集種類比較狹窄,跟用戶的興趣太相似。這種方式雖然生成了用戶可能感興趣的視頻,但是可能用戶沒有太多驚喜,會讓用戶沉浸在比較小的視頻範圍內,就像進入了一個漩渦中,無法發現更大更精彩的世界。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了拓展用戶的候選推薦集的空間,解決上述越推越窄的問題,我們可以沿着種子視頻集","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/42f8e3d68f11b2c7306caa8e49b72efe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所在的視頻有向圖進行n次向外拓展(用圖論的術語,就是n次傳遞閉包),我們記","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/01/018e89b840196237917dd30cee1df40a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲通過種子集","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/42f8e3d68f11b2c7306caa8e49b72efe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中的某個種子視頻通過不超過n次路由可達的所有視頻組成的集合,那麼我們有","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c2/c2b222ec07f1ec1777b7c95e978fe48e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"注意,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/26/2610cbb553a884e23fca6e4ab0b60808.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",上面的公式也跟前面提到的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/063f17cebc4cb7b182738b10e232ebac.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的公式是兼容的。那麼我們最終生成的推薦候選集就是","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2a/2a27fba5c3fc7920462fc728ac918662.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一般N很小(拓展很少的幾步)時就可以獲得非常多具備多樣性的推薦結果,即使對種子集很小的用戶也是如此。通過上述路徑拓展,我們可以爲每個候選的推薦視頻關聯一個種子視頻(通過上述拓展,從某個種子視頻到候選視頻的路徑就將候選視頻關聯到了種子視頻),該種子視頻既可用於後面的推薦結果排序,還可以作爲推薦解釋(例如如果候選視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是通過種子視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"獲得的,我們可以用“因爲你喜歡/看過","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"”來作爲推薦解釋語,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dae360e30c2ae88e8361579fd5c5def.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c16b999355de919ec6c68f83f14be424.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是通過多步相似鏈接在一起的,它們多少是有一些相似性的,用戶從這兩個視頻中是可以直接感知到這種相似的,所以該推薦理由是有一定說服力的)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.4.4 推薦結果排序","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過上面7.4.3中的介紹,我們可以爲用戶生成推薦候選集了,那麼這些候選集中的視頻怎麼推薦給用戶呢?也即我們怎麼對這些候選視頻排序呢?我們需要將用戶更願意點擊的排在前面。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可以從視頻質量、用戶對視頻的偏好度、多樣性等幾個維度來考慮。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"視頻質量是指視頻本身的吸引力,比如視頻的海報清晰度、視頻播放次數、視頻被點讚的次數、視頻被轉載的次數、視頻被收藏的次數等。這些不同維度的視頻質量,我們可以通過打分獲得一個固定的質量得分,具體打分方式可以多種多樣,這裏不再細說。我們這裏記視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/111d380bcea903d6659fe965616f0437.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的質量爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/70/707b45b3dc6a33ba782451de2d519dab.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過前面的介紹我們知道每個候選視頻是通過某個種子沿着有向圖通過若干步的拓展獲得的。如下圖,假設候選視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/39/3904351a36189360ba84adf76d22ec27.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是通過種子視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/79/79df767e5ca7d65ab529e4f68bf49dbf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"經過k步獲得,下圖中箭頭上方的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9b/9bb641d69bf9e8227258e729b2a1da04.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是相鄰兩個視頻的相似度。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/07/077b1cf3171fba200dfcfb395aa0364a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼用戶對視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/39/3904351a36189360ba84adf76d22ec27.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的偏好度可以用如下公式來計算","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b6/b69f24666a4b38ced3309c25546cbf94.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7c/7c188113708f2be571f40f1437fa899b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶對視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/79/79df767e5ca7d65ab529e4f68bf49dbf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的偏好度及它自身的受歡迎程度,我們可以通過用戶播放","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/79/79df767e5ca7d65ab529e4f68bf49dbf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的時長或者","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/79/79df767e5ca7d65ab529e4f68bf49dbf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總播放量等數值來度量。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果從","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/79/79df767e5ca7d65ab529e4f68bf49dbf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"到","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/39/3904351a36189360ba84adf76d22ec27.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有多條路徑,可以選擇最短的路徑,這是因爲我們是一步一步拓展獲取候選集的,當某個視頻被某一步拓展到了,後面再拓展到它時,前面已經計算了,所以在這一步就忽略掉。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了上面介紹的候選視頻得分及用戶對視頻的偏好度,我們可以用下面公式來計算用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對視頻","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/39/3904351a36189360ba84adf76d22ec27.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終的評分:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0b/0bc0ba7fe13c692e042dd73de61f8bc4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過將所有候選視頻按照上述公式降序排列就可以得到候選視頻的排序了。利用上面公式,距離用戶種子視頻集","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/42f8e3d68f11b2c7306caa8e49b72efe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"越遠,通過公式","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3e/3e93f2d033b09d3e1e66b8f6eec41281.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 的相似性連乘得到的值也越小,那麼就有可能排在後面了。那麼我們怎樣解決這個問題獲得多樣性呢?一般我們可以通過限制由同一種子視頻生成的候選視頻的數量來獲得多樣性(因爲同一種子視頻生成的視頻多少是有一些相似度的),或者限制由同一渠道產生視頻數量(比如由同一個用戶上傳的視頻)。通過在上述排好序的推薦列表中,剔除掉部分由同一種子視頻生成的視頻或者同一渠道產生的視頻,就可以獲得最終的推薦結果。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面介紹的就是YouTube基於視頻被用戶共同觀看的次數獲得視頻之間的相似度,進而通過視頻相似度傳遞獲得最終推薦的方法。該方法本質上就是關聯規則的思路,只用到了用戶的播放行爲信息,因此也是一種協同過濾算法。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.5 Google News基於貝葉斯框架的推薦算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面7.2節簡單講了怎麼利用naive bayes算法來爲用戶生成個性化推薦的一般思路,在這一節我們講解Google News利用貝葉斯框架來做推薦的具體方法。Google的這篇文章(參考文獻2)採用另外的思路,基於用戶過去看新聞的歷史行爲利用貝葉斯框架來預測用戶當前對新聞的興趣,再結合協同過濾來做推薦。下面我們來講解該篇文章的核心思想。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.5.1 基於用戶過去的行爲來分析用戶的興趣點","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先將所有新聞按照事先確定好的類別分成若干類(主題):","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/ef9bc778a25c30535de42dc87abe5ebe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",如“世界”、“體育”、“娛樂”等類別。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先計算某個用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在某段時間週期","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/11e8aca6190029b09993a8df73cdd04f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(比如按照一個月一個週期等)內的點擊行爲在上述類別上的分佈,記爲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d0/d0af070680a56f3bd97ef511923f92b5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"公式1:用戶u在時間週期t內的行爲在新聞主題上的分佈","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/01/0164404b5bc457d4ef358b6d1e96a89b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"代表用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在時間週期","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/11e8aca6190029b09993a8df73cdd04f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內點擊主題類別","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的次數。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d4/d4a7f7490f4f0106d5152202f097afaa.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是該用戶在這段時間週期內點擊新聞的總數量。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/9909d542f9674d761fbf8bef9b4140e1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在時間週期","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/11e8aca6190029b09993a8df73cdd04f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內在各個新聞主題類別上的時間花費分佈,反映了用戶的興趣分佈。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"新聞是有地域差異性的,同樣地,類似單個用戶的興趣偏好分佈,我們可以統計某個國家或者某個地區的所有用戶點擊行爲的整體在時間週期","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/11e8aca6190029b09993a8df73cdd04f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內在上述新聞主題上的分佈。我們將該分佈記爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c9/c9fce73c2cc61b598f10061ae92efd89.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。計算方法和上面單個用戶一樣,將該國家或地區所有用戶當成一個整體來計算。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該文章通過大量的數據分析,最終得到如下4個結論,後面的貝葉斯框架也是基於這幾個結論來展開的。細節的分析讀者可以閱讀參考文獻2。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1) 用戶的興趣確實隨着時間變化;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) 公衆對新聞的點擊分佈反映了興趣的發展趨勢,並受到重大事件(如世界盃等)的影響;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3) 不同國家/地區對新聞的偏好是不一樣的,存在不同的發展趨勢;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(4) 從某種程度上說,單個用戶的興趣變化趨勢是受到該用戶所在國家/地區的新聞趨勢變化影響的;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了上面的基本概念和初步數據分析得到的結論作爲基礎,下面我們來說明怎麼用貝葉斯框架來爲用戶的興趣建模。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.5.2 利用貝葉斯框架來建模用戶的興趣","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可以將用戶的興趣分爲兩種:一種是用戶的真實興趣,一種是會受到國家/地區大的興趣趨勢影響的臨時興趣。用戶的真實興趣是由用戶的性別、年齡、受教育程度、專業等決定的,它是相對穩定的,不太會隨着時間而急劇變化。另一方面,當用戶覺得要讀什麼新聞時,是會受到用戶所在區域新聞趨勢的影響的,這種影響是短期的,是隨着時間變化的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於用戶點擊行爲模式和用戶所在地羣體的行爲模式,可以通過貝葉斯框架很好地預測用戶當前對新聞的興趣,具體可以通過如下三個步驟來獲得用戶的當前興趣。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(1) 預測用戶在某個時間週期內的真正興趣","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於特定時間週期","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/11e8aca6190029b09993a8df73cdd04f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內的某個用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ce/ce518b8a1c9c6904fabfef47edade938.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/9909d542f9674d761fbf8bef9b4140e1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶在所有新聞主題上的點擊分佈,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c9/c9fce73c2cc61b598f10061ae92efd89.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶所在地域整體用戶的興趣分佈,代表的是興趣趨勢。我們期望學習用戶的從","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/9909d542f9674d761fbf8bef9b4140e1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"呈現出的而不會受到","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c9/c9fce73c2cc61b598f10061ae92efd89.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"干擾的真實興趣。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們用","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/0695d31b4e03595312f1ccd971a13227.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來表示用戶在新聞主題","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上的真實興趣,它是一個條件概率,表示在新聞主題爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的條件下,用戶進行點擊的概率。利用貝葉斯公式,我們可以採用下式來計算","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/0695d31b4e03595312f1ccd971a13227.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/41/4121995498205c1ab1b3b12ff91f7316.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"公式2:用戶點擊屬於主題","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的文章的概率","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上式中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/07/0740d669ce82a10b7d284aa48a739a8d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶點擊的所有新聞中該新聞屬於主題","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的概率,它可以從用戶的點擊分佈","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/9909d542f9674d761fbf8bef9b4140e1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中估算出(參見上面的公式1,可以用","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/9909d542f9674d761fbf8bef9b4140e1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的第i個分量來估算)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5d/5d935e80c2453c95b6fb816aeec86dbe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是新聞屬於主題","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的先驗概率,也就是在時間週期","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/11e8aca6190029b09993a8df73cdd04f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內所有發佈的新聞屬於類別","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的比例,它與用戶所在地域的新聞變化趨勢相關,如有有更多的有關主題","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的新聞事件發生,那麼關於主題","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的新聞就會越多。我們可以用整體分佈","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c9/c9fce73c2cc61b598f10061ae92efd89.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中的第i個分量來近似估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5d/5d935e80c2453c95b6fb816aeec86dbe.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5c/5c8140b53c9fc6345448b8880c400615.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶點擊任何一個類別的文章的先驗概率,與具體的文章主題無關。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從上面公式2知道,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/0695d31b4e03595312f1ccd971a13227.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示用戶對主題","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的感興趣程度,不同於該地區其他用戶的興趣。如果用戶讀了很多體育類的新聞,而很多其他用戶也讀了體育類新聞,這可能是有一些體育相關熱點事件引起的。相反,如果該用戶閱讀了大量體育新聞而該地區其他用戶很少讀體育新聞,這就代表的是用戶真的對體育感興趣。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(2) 結合用戶在不同時間週期的興趣,獲得用戶精確的與時間無關的真實興趣","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面(1)中計算了用戶在一個時間週期","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內的興趣偏好,我們可以將用戶在過去統計週期內所有時間週期的興趣歸併起來獲得用戶綜合的對新聞類別的真實興趣偏好,具體參見下面公式的計算邏輯。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4e/4e899406ddcadbe9fd235c04722cc997.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4e/4e899406ddcadbe9fd235c04722cc997.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上式中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9e/9eb5f16df9d10f8e4d32386603a6d40a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是用戶在時間週期","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/11e8aca6190029b09993a8df73cdd04f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內總的新聞播放量。我們可以假設用戶在所有時間週期內點擊一篇新聞的先驗概率是固定不變的,也即假設上式中的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c4/c4d882f2951d4d3a496f6cee5f2e7aa7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與時間週期","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/11e8aca6190029b09993a8df73cdd04f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"無關,我們記爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e1/e13042d36dcba643f2b7aa9088284e70.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。那麼上式可以改寫爲下面的","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/65/65c2312ac858017017b80f30f48e0318.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"公式3:用戶在過去統計週期內所有時間段的綜合興趣","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面的公式3就是用戶的真實興趣,該興趣其實是用戶在多個時間週期內的興趣的某種加權平均。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"(3) 結合用戶真實興趣和當前的新聞趨勢,預測用戶當前的興趣","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如前面所說,用戶的興趣可以分解爲兩部分,一部分是用戶長期的的真實興趣,另外一部分是受到當前趨勢影響的短期興趣。(1)、(2)基於用戶過去的點擊播放行爲計算出了用戶的長期的真實興趣。爲了度量用戶當前對新聞的短期興趣,我們用用戶所在地域的所有用戶在一個較短時間段(比如過去一個小時)的整體點擊分佈來刻畫(用","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/29/299165c09284ff58330d00b0bf4b52c4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來表示),由於所在地域有大量用戶,在這段較短時間內也有足夠多的數據來準確計算出哪些新聞主題是熱門的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的最終目標是預測用戶在將來一段時間的點擊分佈,同樣地,我們可以用貝葉斯公式得到下面的計算公式:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/09/09de797bfbe6b86428104f17d948b95c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們用用戶的真實興趣","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e0/e0426b9e90d0468cedbdc10677ae64c2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(公式3)來估計","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/dc/dc087d2a66753eef9635cacebe1cadb3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",並且假設用戶點擊任何一篇新聞的概率爲常數,不受時間影響(即假設","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/1055dd12983b5c06d0e8bdf6494d4b25.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"=","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e7/e75c70ea57b3aaee155253f91d8819fb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"),那麼上式就可以表示爲(將公式3代入進來)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d4/d418d08d9bc8ae7abde252fc36ff54f4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可以在上式中加上一個虛擬點擊項,它跟當前新聞趨勢的概率分佈","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c134c11f0e434d95c65752cd9fb109f5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同分布,那麼用戶在未來短時間內對新聞的興趣偏好概率最終變爲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7a/7a908144ca516e687940fd02d131a03f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"公式4:用戶當前對新聞主題的興趣偏好概率","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上式中的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d2/d2a688087c1d85aec102e6fee7fb6d0f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"就是虛擬點擊數(在參考文獻2中取值爲10),它可以看成是一個光滑化的因子,當某個用戶只有非常少的點擊歷史時,這時是用戶當前的新聞趨勢(","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ee/ee3477f4ce71893a8d1f766b3d993219.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":")來預估該用戶的點擊概率,這在沒有太多該用戶歷史數據的情況下是一個合理的估計。當","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7c/7c7a7b79351ff3c51248d98ccc9ae79c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"遠遠大於","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d2/d2a688087c1d85aec102e6fee7fb6d0f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時,上式就可以忽略","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d2/d2a688087c1d85aec102e6fee7fb6d0f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",還原爲用戶真實的點擊分佈預估。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上述預估用戶將來短時間內的興趣分佈的方法的一個重要優點是,我們可以增量地計算用戶的點擊分佈。我們可以將過去每個時間週期","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/11e8aca6190029b09993a8df73cdd04f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對應的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b6/b6feb409afa05e093607793382d8241c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6f/6fbb66310067cb02f2beed3e6a4b4cd7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的值事先存起來,當更新用戶的興趣偏好概率時,只需要將最近一個時間週期的值計算出,利用上面的公司4及預先存下來的過去時間段的值就可以得到用戶最新的興趣偏好分佈。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.5.3 爲用戶做個性化推薦","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了對推薦候選集進行排序獲得最終的推薦結果,該推薦算法計算出兩個統計量:一個是","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d1/d1fff0932d64af440746e1f69233870c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",稱之爲信息過濾得分,另外一個是","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/8146c5fa738688e3cc99a6490bd283d0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",即協同過濾得分(利用協同過濾算法預測的用戶對新聞的得分,可以利用參考文獻3中的方法得到)。其中","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的計算過程是這樣的,先獲得該文章的類別","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",再基於上面的公式4得到用戶對類別","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d458838d5589376c044e8710db4b163.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的的偏好概率,該值作爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1a/1acd5c87caa2fb38fbf2a3f0226a3729.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的值。我們將這兩個得分相乘,最終利用如下的公式來計算用戶對某個新聞的興趣得分。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b5/b5afc00e35e230005f2f83d811cc3ef3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終基於上述公式計算出該用戶對所有新聞的得分,取得分最高的topN作爲最終的推薦結果。該方法通過在Google news上驗證,比單獨採用協同過濾有更好的預測效果。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該方法利用用戶過去及用戶所在地區的點擊行爲來預測用戶當前對新聞的偏好概率,再與協同過濾結合進行最終的推薦。預測用戶興趣偏好概率的過程只用到了用戶及用戶所在地域全體用戶對新聞的點擊數據,因此也是一種協同過濾方法。細節的實現方案,讀者可以閱讀參考文獻2進行進一步瞭解。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.6 Google News基於用戶聚類的推薦算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參考文獻3中利用了3種模型來預估一個用戶對一個新聞的評分,最終通過加權平均獲得用戶對新聞的最終評分,其中第三種方法covisitation的思想我們在第6章《協同過濾推薦算法》第6節“近實時協同過濾算法的工程實現“有詳細講解,本質上是一種關聯規則的思路,我們這裏不再介紹。另外兩種算法分別是基於MinHash和PLSI聚類的方法,在這裏我們只介紹MinHash算法,PLSI算法讀者可以自行閱讀參考文獻3來了解。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.6.1 基於MinHash聚類","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",他的點擊歷史記爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b7/b780870e063041266d218b631d7f52f9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。那麼兩個用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/83/83420fe338ed63e05e679d085981a8e0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的相似度可以用Jaccard係數來表示:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f1/f1d9daa149033398ec80d1f65ecc1b87.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了兩個用戶之間的相似度,我們可以非常形式化地將相似用戶點擊過的新聞通過上面的相似度加權推薦給用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。但是用戶數和新聞數都是天文數字,無法在極短時間內爲大量用戶完成計算。這時,我們可以採用LSH(","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"L","attrs":{}},{"type":"text","text":"ocality ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"S","attrs":{}},{"type":"text","text":"ensitive ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"H","attrs":{}},{"type":"text","text":"ashing)的思路來計算,大大減少計算複雜度。LSH的核心思想是,對於一組數據(對於我們這個場景來說,這一組數據就是用戶的歷史點擊記錄","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b7/b780870e063041266d218b631d7f52f9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"),我們可以用一組哈希函數來獲得哈希值,如果兩組數據非常相似,那麼哈希值衝突的概率也越大,而衝突率是等於這兩組數據的相似度的。當我們用Jaccard係數來計算用戶點擊歷史的相似度時,這時的LSH就叫做MinHash。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們先來說明怎麼爲一個用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算hash值,所有新聞集爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/78/78c588fc149ffe8d6365f4b38665ad20.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",我們對集合","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/78/78c588fc149ffe8d6365f4b38665ad20.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"進行一次隨機置換,置換後的有序集合記爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/78/783e3499179eb7972f407ce02bdd7027.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c3/c3ed9252758810d3cbfc729a687e788e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",即一共有m+1條新聞),我們計算用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的哈希值爲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/70/70dba63b692c08ac479eff5746efa59c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上式表示的意思是,按順序從左到右數,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b7/b780870e063041266d218b631d7f52f9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中第一個包含在","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b7/b780870e063041266d218b631d7f52f9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中的元素的下標就是","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的哈希值。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以證明(有興趣的讀者可以自行證明,或者參考相關參考文獻),兩個用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/90/9074ce89113e8f747f12edbab4c27bbb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的哈希值衝突(值一樣)的概率就是","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/90/9074ce89113e8f747f12edbab4c27bbb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用Jaccard係數計算的相似度。所以我們可以將MinHash認爲是一種概率聚類方法,對應的哈希值就是一個類:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5d/5d1efef6236f76e75d58e8797c2a937d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",所有哈希值爲","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e3/e3462e37dca54767653c18038dcb59c5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的用戶聚爲一類","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c0/c037246f0e339da9457d218e2a3ac96e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可以取","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個置換(即","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個哈希函數),將這","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個哈希函數得到的哈希值拼接起來,那麼兩個用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/90/9074ce89113e8f747f12edbab4c27bbb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個拼接的哈希值一樣的概率就等於","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/30/30411ed4e7b6aa17410657c916752ef6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"。很顯然,通過將","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個哈希值拼接起來得到的聚類更精細(","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個哈希值都相等的用戶肯定更少,所以類","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c0/c037246f0e339da9457d218e2a3ac96e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"更小,更精細),並且相似度也會更高。從找用戶最近的領域(最相似的用戶)的角度來看,將","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個哈希值拼接起來的方法的精準度更高,但是召回率更低(聚類更小了,因此召回的新聞更少了,所以一般召回率就降低了),爲了提升召回率,獲得更多的相似用戶,我們可以將這個過程並行進行","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fc/fcde43a58f0f99371169978b01097a80.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"次,最終每個用戶獲得","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fc/fcde43a58f0f99371169978b01097a80.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個聚類,相當於進行了","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fc/fcde43a58f0f99371169978b01097a80.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"次召回,在論文中作者建議","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb86115033b583781e510ad42aa24bec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"取2到4,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fc/fcde43a58f0f99371169978b01097a80.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"取10到20。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在實際的工程實踐上,由於新聞數量太大,進行置換操作耗時很大,可以採用簡化的思路(精度會打折扣),對於上面提到的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/380197618fcc8a3bbfd5149d990d281f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個哈希函數(置換),我們事先取","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/38/380197618fcc8a3bbfd5149d990d281f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"個獨立的隨機種子值與之一一對應,每個隨機種子就是哈希函數的替代。對於每個新聞及每個種子值,我們利用該種子值和該新聞的Id(是整數值),利用一個特定的哈希函數","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c6/c6bb184a9ca6777b008ecb594f385adb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算出一個哈希值,該哈希值作爲利用前面講的通過置換後該新聞的下標,那麼採用這種方法後用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的哈希計算公式如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b4/b47522388ff3f85420baef58842e5d98.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該方法不需要對新聞集進行置換,會大大減少計算量,並且只要這個特定哈希函數","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c6/c6bb184a9ca6777b008ecb594f385adb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的取值範圍在0到","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/24/2421b1f2e7bbd9d01aa35cd3998a08c6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"之間,那麼衝突概率就會比較小,通過這種近似的MinHash值與真正的MinHash值性質近似。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面講解了MinHash聚類的算法實現細節,具體工程實現時是可以用Hadoop或者Spark等分佈式計算平臺來並行計算的,這裏不細說。我們可以將每個用戶的聚類及每個聚類包含的用戶用倒排索引表的格式存儲起來,方便後面做推薦時查閱。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"7.6.2 基於聚類爲用戶做推薦","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了用戶的聚類,我們可以採用如下步驟來爲單個用戶生成個性化推薦,每個用戶的推薦策略是一樣的,所以可以採用分佈式計算平臺Spark等工具來並行化處理。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)  計算出用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對新聞","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85d7ed0729c0a85d691aad61038d8f7d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的評分","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) 計算出用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對所有新聞的評分","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3) 將所有新聞評分降序排列,取topK作爲該用戶的推薦","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面的(2)、(3)是非常簡單的處理,我們重點來說一下(1),怎麼計算用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對新聞","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85d7ed0729c0a85d691aad61038d8f7d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的評分。首先我們可以得到用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所屬的所有類別","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/63/63cd7f3081339785220e1858f3cfd670.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",對於每個類別","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6a/6a37920d3e6468af62c1a9de73f258ea.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":",取出該類別中所有的用戶對新聞","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85d7ed0729c0a85d691aad61038d8f7d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的點擊次數之和(我們可以事先將每個類別中用戶點擊過的新聞及次數存儲起來,方便查找),再除以該類別所有點擊之和,得到該類別對新聞","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85d7ed0729c0a85d691aad61038d8f7d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的評分,那麼用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所屬的類別對新聞","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85d7ed0729c0a85d691aad61038d8f7d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的總評分爲:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9e/9e757ebf2053de8d667ded3ae9b61f30.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/be/be367d506b1ddaed3b24c5f833c7c66d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即是剛剛提到的類","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b6/b6f8d94c92537d44931e7c475aa400fa.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中所有用戶對新聞","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85d7ed0729c0a85d691aad61038d8f7d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"點擊次數之和,","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c3/c347ed887b6aa2c83534e3cde1b3de49.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是類","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b6/b6f8d94c92537d44931e7c475aa400fa.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所有用戶的所有點擊次數之和。我們用上式計算出的","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5f/5f9d6288735b7de534116ba9243f5cd5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來表示用戶","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/55/55f53a249a23d52ca0fb54de99a6a96d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對新聞","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85d7ed0729c0a85d691aad61038d8f7d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"的評分。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"至此,基於作者自己的理解,簡單介紹完了Google News基於用戶聚類的推薦算法。該方法也只用到了用戶及其他用戶的新聞點擊行爲,因此也是一種協同過濾算法,該算法的細節讀者可以閱讀參考文獻3。參考文獻17、18、19、20有更多關於怎麼用聚類來做推薦的算法,感興趣的讀者可以參考學習。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"總結","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本章講解了關聯規則、樸素貝葉斯、聚類等三類基礎機器學習算法怎麼用於個性化推薦的理論知識,同時從算法原理和工程實現的角度簡單總結了YouTube和Google News的三篇分別採用關聯規則、樸素貝葉斯、聚類思路來做推薦的論文。這幾篇論文有很強的工程指導意義,值得讀者學習。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然這些算法原理簡單、容易理解,看起來不那麼高大上,但是這些算法卻在工業界有過非常好的應用,在當時算是非常優秀的算法。這些算法現在可能看起來太簡單了,也可能不會用在現在的推薦系統上,但它們樸素的思想下面蘊含的是深刻的道理,值得我們推薦從業者學習、思考、借鑑,希望讀者可以很好地理解它們,並吸收這些樸素思想背後的精華。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"參考文獻","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. The YouTube Video Recommendation System","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Personalized News Recommendation Based on Click Behavior ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Google news personalization: Scalable online collaborative flitering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Robustness of collaborative recommendation based on association rule mining","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. Analysis of recommendation algorithms for e-commerce","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6. Fast algorithms for mining association rules","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"7. Efficient adaptive-support association rule mining for recommender systems","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"8. Mining navigation history for recommendation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"9. Collaborative filtering by mining association rules from user acess sequences","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"10. Mining frequent patterns without candidate generation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"11. Pfp: parallel fp-growth for query recommendation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"12. Mining sequential patterns by pattern-growth: the PrefixSpan approach","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"13. Empirical analysis of predictive algorithms for collaborative filtering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"14. A Bayesian model for collaborative filtering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"15. Collaborative filtering with the simple Bayesian classifier","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"16. Probabilistic memory-based collaborative filtering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"17. Clustering methods for collaborative filtering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"18. Rectree: A efficient collaborative filtering method","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"19. Scalable collaborative filtering using cluster-based smoothing","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"20. Latent class models for collaborative filtering","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章