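LFM latent factor model: matrix factorization with gradient descent

Background: the course "Personalized recommendation algorithms taught by an engineer from a BAT company"???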

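I searched for this course online and every result redirects to some obscure link, full of paid shills, where you have to register and pay before you can see anything. Rubbish.

It was originally on imooc. A whole year has passed and it is still the same price. Most of the public copies either have incomplete code, have no video, or cannot be opened at all.

Some are outright deceptive: they say "download from Baidu Netdisk", but the link does not go to Baidu Netdisk at all, it goes to some other small site.

Some downloads then ask for an extraction password, which may or may not be handed over. Infuriating.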

Main text:

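After going through the LFM matrix factorization material, here is what the instructor does: ratings greater than or equal to 4 are treated as likes and lower ratings as dislikes, giving labels 1 and 0. This looks like classification, or at least like borrowing from classification, but it is not. The instructor then applies cosine similarity (loosely called cosine distance here) to the user/item vectors so that the prediction lands in the same range as the labels. Strictly speaking, cosine similarity lies in [-1, 1] and stays within [0, 1] only when the vectors have no negative entries, so this approach never felt quite right to me. The dataset is MovieLens, but I do not know which version (100k, 1M, 10M??); in any case its storage format differs from what the official site offers now.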
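As a rough illustration of the two pieces described above, here is a minimal sketch of the labeling rule and the cosine score. The function names `rating_to_label` and `cosine_similarity` are made up for this example and are not taken from the course code; the zero-vector guard is also my own assumption.

```python
import numpy as np

def rating_to_label(rating, threshold=4.0):
    """Return 1 (like) for a rating at or above the threshold, else 0 (dislike)."""
    return 1 if rating >= threshold else 0

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        return 0.0
    return float(np.dot(u, v) / denom)

labels = [rating_to_label(r) for r in (5, 4, 3, 1)]

# Parallel vectors score close to 1, opposite vectors close to -1,
# so the score is not confined to [0, 1] unless entries are non-negative.
same = cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
opposite = cosine_similarity(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
```

The second call is there to show why the [0, 1] range only holds under a non-negativity assumption on the learned vectors.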

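My approach: since ratings are positive numbers between 1 and 5, why not multiply the cosine similarity by 5, or alternatively normalize the ratings by their maximum value?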
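To make the max-normalization variant concrete, here is a minimal gradient descent sketch. It is not the code used for the results in this post: it assumes a plain dot-product prediction instead of the cosine score, L2 regularization, and hypothetical names and hyperparameters (`train_lfm`, `k`, `lr`, `reg`), and it runs on made-up toy ratings.

```python
import numpy as np

def train_lfm(ratings, n_users, n_items, k=4, lr=0.1, reg=0.01, epochs=500, seed=0):
    """Fit user factors P and item factors Q by stochastic gradient descent.

    ratings: list of (user_index, item_index, rating) with ratings in 1..5.
    Targets are the ratings divided by the maximum rating (5).
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    max_rating = 5.0
    for _ in range(epochs):
        for u, i, r in ratings:
            target = r / max_rating          # max-normalized rating in (0, 1]
            err = target - P[u] @ Q[i]       # prediction error for this pair
            pu = P[u].copy()                 # keep the old user vector for Q's update
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

def mse(ratings, P, Q, max_rating=5.0):
    """Mean squared error on the normalized targets."""
    return float(np.mean([(r / max_rating - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))

# Toy data, invented for illustration only.
toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = train_lfm(toy, n_users=3, n_items=3, epochs=0)   # untrained factors
P, Q = train_lfm(toy, n_users=3, n_items=3)
loss_before, loss_after = mse(toy, P0, Q0), mse(toy, P, Q)
```

Multiplying the prediction by 5 instead of dividing the target by 5 changes the scale of the error and therefore of the gradients, which is one reason the two variants need not behave identically under the same learning rate.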

But the results are not great. Here is what I got after 1000 iterations.

Recommendations for user id 24:

['Distinguished Gentleman, The (1992)', 'Comedy']
['Burnt Offerings (1976)', 'Horror']
['Gods Must Be Crazy, The (1980)', 'Comedy']
['Rocketship X-M (1950)', 'Sci-Fi']
["She's All That (1999)", 'Comedy|Romance']
['Sesame Street Presents Follow That Bird (1985)', "Children's|Comedy"]
['Congo (1995)', 'Action|Adventure|Mystery|Sci-Fi']
['Hard Rain (1998)', 'Action|Thriller']
["Nightmare on Elm Street Part 2: Freddy's Revenge, A (1985)", 'Horror']
['Cape Fear (1991)', 'Thriller']
['Piano, The (1993)', 'Drama|Romance']
['Big Hit, The (1998)', 'Action|Comedy']
['Peanuts - Die Bank zahlt alles (1996)', 'Comedy']
['Eat Drink Man Woman (1994)', 'Comedy|Drama']
["Farmer's Wife, The (1928)", 'Comedy']
['Secrets & Lies (1996)', 'Drama']
['Unzipped (1995)', 'Documentary']
['South Pacific (1958)', 'Musical|Romance|War']

Comedy dominates the recommendations. The movies this user actually liked, however, are mostly drama, which is disheartening:

["Who's Afraid of Virginia Woolf? (1966)", 'Drama']
['Deer Hunter, The (1978)', 'Drama|War']
["One Flew Over the Cuckoo's Nest (1975)", 'Drama']
['Raiders of the Lost Ark (1981)', 'Action|Adventure']
['Silence of the Lambs, The (1991)', 'Drama|Thriller']
['Good Morning, Vietnam (1987)', 'Comedy|Drama|War']
['Sling Blade (1996)', 'Drama|Thriller']
['This Is Spinal Tap (1984)', 'Comedy|Drama|Musical']
['Room with a View, A (1986)', 'Drama|Romance']
['Star Wars: Episode IV - A New Hope (1977)', 'Action|Adventure|Fantasy|Sci-Fi']
['Casablanca (1942)', 'Drama|Romance|War']
['Wizard of Oz, The (1939)', "Adventure|Children's|Drama|Musical"]
['Caddyshack (1980)', 'Comedy']
['Terms of Endearment (1983)', 'Comedy|Drama']
['Out of Africa (1985)', 'Drama|Romance']
['Gone with the Wind (1939)', 'Drama|Romance|War']
['Godfather, The (1972)', 'Action|Crime|Drama']
['Breakfast Club, The (1985)', 'Comedy|Drama']
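The genre comparison above is done by eye. A small tally makes it less impressionistic. The sketch below counts genres over the first five rows of each list for user 24; the helper name `genre_counts` is made up for this example.

```python
from collections import Counter

def genre_counts(rows):
    """Count genres over [title, 'Genre|Genre|...'] rows."""
    counts = Counter()
    for _title, genres in rows:
        counts.update(genres.split("|"))
    return counts

# First five recommended rows for user 24, copied from the output above.
recommended = [
    ['Distinguished Gentleman, The (1992)', 'Comedy'],
    ['Burnt Offerings (1976)', 'Horror'],
    ['Gods Must Be Crazy, The (1980)', 'Comedy'],
    ['Rocketship X-M (1950)', 'Sci-Fi'],
    ["She's All That (1999)", 'Comedy|Romance'],
]
# First five rows of the user's actual liked movies.
actual = [
    ["Who's Afraid of Virginia Woolf? (1966)", 'Drama'],
    ['Deer Hunter, The (1978)', 'Drama|War'],
    ["One Flew Over the Cuckoo's Nest (1975)", 'Drama'],
    ['Raiders of the Lost Ark (1981)', 'Action|Adventure'],
    ['Silence of the Lambs, The (1991)', 'Drama|Thriller'],
]

top_recommended = genre_counts(recommended).most_common(1)[0]  # ('Comedy', 3)
top_actual = genre_counts(actual).most_common(1)[0]            # ('Drama', 4)
```

Running the same tally over the full lists would give a fairer picture than these five-row samples.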

User_id=204, however, looks fairly good.

#rec:most are comedy and drama
['Cold Comfort Farm (1995)', 'Comedy']
['Sheltering Sky, The (1990)', 'Drama']
['Little Princess, The (1939)', "Children's|Drama"]
['Fast Times at Ridgemont High (1982)', 'Comedy']
['8 Seconds (1994)', 'Drama']
['Cérémonie, La (1995)', 'Drama']
["Farmer's Wife, The (1928)", 'Comedy']
['Postman, The (1997)', 'Drama']
['Selena (1997)', 'Drama|Musical']
['Mortal Kombat: Annihilation (1997)', 'Action|Adventure']
['Crazy in Alabama (1999)', 'Comedy|Drama']
['Nights of Cabiria (Le Notti di Cabiria) (1957)', 'Drama']
['Whipped (2000)', 'Comedy']
['Character (Karakter) (1997)', 'Drama']
['Waterboy, The (1998)', 'Comedy']
['Interiors (1978)', 'Drama']
["Angela's Ashes (1999)", 'Drama']
['Grease 2 (1982)', 'Comedy|Musical|Romance']
['Holiday Inn (1942)', 'Comedy|Musical']
['Raw Deal (1948)', 'Film-Noir']

#true_result
['Bridge on the River Kwai, The (1957)', 'Drama|War']
['Stand by Me (1986)', 'Adventure|Comedy|Drama']
['Ghost (1990)', 'Comedy|Romance|Thriller']
['Terminator 2: Judgment Day (1991)', 'Action|Sci-Fi|Thriller']
['Meet the Parents (2000)', 'Comedy']
['Groundhog Day (1993)', 'Comedy|Romance']
['Star Wars: Episode V - The Empire Strikes Back (1980)', 'Action|Adventure|Drama|Sci-Fi|War']
['Princess Bride, The (1987)', 'Action|Adventure|Comedy|Romance']
['Raiders of the Lost Ark (1981)', 'Action|Adventure']
['Young Frankenstein (1974)', 'Comedy|Horror']
['Ben-Hur (1959)', 'Action|Adventure|Drama']
['Awakenings (1990)', 'Drama']
['Indiana Jones and the Last Crusade (1989)', 'Action|Adventure']
['Wizard of Oz, The (1939)', "Adventure|Children's|Drama|Musical"]
['Life Is Beautiful (La Vita è bella) (1997)', 'Comedy|Drama']
['Gone with the Wind (1939)', 'Drama|Romance|War']
['Independence Day (ID4) (1996)', 'Action|Sci-Fi|War']
['Godfather, The (1972)', 'Action|Crime|Drama']
['Few Good Men, A (1992)', 'Crime|Drama']

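I did not compare many more users; the results seem to be about half good, half bad. Next I tried max-normalizing the ratings. The two variants look equivalent, but who can guarantee the results will be identical? Only a test can tell.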

I also added a stopping rule: if the change in loss between two consecutive iterations is smaller than 1e-15, stop iterating.
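The stopping rule can be sketched independently of the model. The helper below is hypothetical (`run_until_converged` and the `step` callback are names invented here); it is demonstrated on a one-dimensional quadratic rather than on LFM so that the behavior is easy to check.

```python
def run_until_converged(step, max_iters=1000, tol=1e-15):
    """Call step() until the loss changes by less than tol between iterations.

    step() must perform one update and return the current loss.
    Returns (number of iterations run, last loss).
    """
    prev_loss = float("inf")
    for iteration in range(1, max_iters + 1):
        loss = step()
        if abs(prev_loss - loss) < tol:
            return iteration, loss
        prev_loss = loss
    return max_iters, prev_loss

# Demo: gradient descent on f(x) = (x - 3)^2, starting from x = 0.
state = {"x": 0.0}

def quadratic_step(lr=0.1):
    grad = 2.0 * (state["x"] - 3.0)
    state["x"] -= lr * grad
    return (state["x"] - 3.0) ** 2

iters, final_loss = run_until_converged(quadratic_step)
```

A tolerance of 1e-15 is very strict. With stochastic updates the loss tends to fluctuate from one pass to the next, so a rule like this may simply never fire before the iteration cap.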

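As it turned out, the loss never got anywhere near that small. It stayed fairly large, around 6e-4, and keep in mind that this is after normalization. That is worse than the loss of the earlier run on the unnormalized values, which reached at least the 1e-14 level.

Still, the results look decent. Here they are for users 24 and 204: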

#rec for 24
['Excess Baggage (1997)', 'Adventure|Romance']
['Little Mermaid, The (1989)', "Animation|Children's|Comedy|Musical|Romance"]
['Cutthroat Island (1995)', 'Action|Adventure|Romance']
['Ogre, The (Der Unhold) (1996)', 'Drama']
['American Psycho (2000)', 'Comedy|Horror|Thriller']
['Black Sabbath (Tre Volti Della Paura, I) (1963)', 'Horror']
['Broken Vessels (1998)', 'Drama']
['Fear and Loathing in Las Vegas (1998)', 'Comedy|Drama']
['Pompatus of Love, The (1996)', 'Comedy|Drama']
['With Friends Like These... (1998)', 'Comedy']
['Amityville Horror, The (1979)', 'Horror']
['Curse of the Puppet Master (1998)', 'Horror|Sci-Fi|Thriller']
['Dancing at Lughnasa (1998)', 'Drama']
['Braindead (1992)', 'Comedy|Horror']
['Nil By Mouth (1997)', 'Drama']
['Meet Joe Black (1998)', 'Romance']
['Birdy (1984)', 'Drama|War']
['Karate Kid, Part II, The (1986)', 'Action|Adventure|Drama']
['Teenage Mutant Ninja Turtles III (1993)', "Action|Children's|Fantasy"]
['Price of Glory (2000)', 'Drama']

#rec for 204
['Children of Heaven, The (Bacheha-Ye Aseman) (1997)', 'Drama']
['Carnal Knowledge (1971)', 'Drama']
['Carmen Miranda: Bananas Is My Business (1994)', 'Documentary']
['Emma (1996)', 'Comedy|Drama|Romance']
['Life Less Ordinary, A (1997)', 'Romance|Thriller']
['On Any Sunday (1971)', 'Documentary']
['Oscar and Lucinda (a.k.a. Oscar & Lucinda) (1997)', 'Drama|Romance']
['Face in the Crowd, A (1957)', 'Drama']
['Santa Fe Trail (1940)', 'Drama|Romance|Western']
['Stranger Than Paradise (1984)', 'Comedy']
['Soft Toilet Seats (1999)', 'Comedy']
['Killing Fields, The (1984)', 'Drama|War']
['Alligator (1980)', 'Action|Horror|Sci-Fi']
['Stepford Wives, The (1975)', 'Sci-Fi|Thriller']
['Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)', 'Drama']
['Absent Minded Professor, The (1961)', "Children's|Comedy|Fantasy"]
['Pretty Woman (1990)', 'Comedy|Romance']
['Homegrown (1998)', 'Comedy|Thriller']
['Shop Around the Corner, The (1940)', 'Comedy|Romance']
['Knock Off (1998)', 'Action']

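This stuff really is black magic.

Xiao Ming's comments:

1. Overall, LFM matrix factorization cannot solve the cold-start problem, which every matrix factorization method faces; there is also the problem of balancing positive and negative samples. Cold start is mostly handled with other strategies. For user cold start, one can recommend popular items, or analyze the user profile and apply a collaborative method. For item cold start, one can use the item's own attributes, or do content-based recommendation. There are many other options. Could the sample-balance problem be solved with userCF? I think so.

2. A detail: "latent" features means we do not know how many (hidden) features there are, and the features may not be independent but correlated with each other, which feels a bit like multi-class classification. Choosing the number comes down to experience. Try 20, 50, one by one?? That is not ideal, and both too many and too few hurt.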

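3. At recommendation time, set aside issue 1 and assume every user and item already has a vector. Serving then means computing the cosine similarity between the user and every single item, which does not seem to involve a recall (candidate retrieval) stage at all?? Everything is pulled straight from the full dataset, so the recall rate is not even a consideration: because every item is compared, recall [the share of positive samples that are predicted positive] is bound to be 100%, while precision is far worse. After that comes a sort. This matches my original impression that recommendation is just two functions, one recall and one sort???? I am baffled. This approach can therefore be slow, and with a large matrix it cannot deliver real-time recommendations, though admittedly a user's interests do not change that quickly either.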
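Point 3 can be sketched in a few lines: score every item against the user vector, sort, cut at N, then measure precision and recall against a set of relevant items. The names `top_n_by_cosine` and `precision_recall` and the toy vectors are made up for this example.

```python
import numpy as np

def top_n_by_cosine(user_vec, item_matrix, n):
    """Score all items by cosine similarity to the user and return the n best indices."""
    denom = np.linalg.norm(item_matrix, axis=1) * np.linalg.norm(user_vec)
    # Items (or a user) with a zero vector get a score of 0 instead of a division error.
    scores = np.divide(item_matrix @ user_vec, denom,
                       out=np.zeros(len(item_matrix)), where=denom > 0)
    order = np.argsort(-scores, kind="stable")   # highest score first
    return order[:n].tolist(), scores

def precision_recall(recommended, relevant):
    """Precision and recall of a recommended list against a set of relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy vectors, invented for illustration: four items in a 2-d latent space.
user = np.array([1.0, 0.0])
items = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])

top2, scores = top_n_by_cosine(user, items, n=2)
precision, recall = precision_recall(top2, relevant={0, 1})

# Returning every item trivially reaches 100% recall at the cost of precision.
everything, _ = top_n_by_cosine(user, items, n=len(items))
full_precision, full_recall = precision_recall(everything, relevant={0, 1})
```

The scoring step touches every item for every request, which is the cost the paragraph above worries about; a separate candidate retrieval stage exists precisely to avoid that full scan.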

 

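Recommended reading: another LFM matrix factorization method

If you have related questions, you can join the QQ group to discuss them. There is no WeChat group.

QQ group: 868373192

Speech, image, and video deep learning group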
