LFM (Latent Factor Model): Matrix Factorization with Gradient Descent

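Background: the course billed as "personalized recommendation algorithms, taught in person by an expert from BAT (Baidu, Alibaba, Tencent)"???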

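I looked into this course. Search results mostly lead to obscure third-party links, full of shills, where you have to register and pay before you can see anything. Garbage.

It was originally on imooc (慕课网). A year later it is still the same price, and the publicly shared copies either have incomplete code, lack the videos, or can't be viewed at all.

Worse, some pages are outright deceptive: they advertise a Baidu Netdisk download, but the link doesn't go to Baidu Netdisk at all, it goes to some other small site.

Some downloads then turn out to need an extraction password, which the uploader may or may not hand over. Infuriating.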

Main text:

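I went through the LFM matrix factorization part. The instructor treats a rating of 4 or above as "liked" and anything lower as "not liked", which gives labels 1 and 0. That looks like a setup for classification, or at least something borrowed from classification, but it isn't: the instructor then uses the cosine of the user and item vectors as the prediction, so that the result falls in [0, 1]. (In general, cosine similarity ranges over [-1, 1]; it is confined to [0, 1] only when the vector components are non-negative.) I'm not convinced this is a good approach. The dataset is MovieLens, but I don't know which version (100K, 1M, 10M?); in any case its storage format differs from what the official site offers now.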
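To make the labeling and the cosine prediction concrete, here is a minimal sketch. The function names `binarize` and `cosine` and the `eps` guard are my own illustrative choices, not the course's code; only the threshold of 4 comes from the description above.

```python
import numpy as np

def binarize(rating, threshold=4.0):
    """Label a rating 1 (liked) if it is >= threshold, else 0 (not liked)."""
    return 1 if rating >= threshold else 0

def cosine(u, v, eps=1e-12):
    """Cosine similarity of two 1-D vectors; eps (an assumption) guards zero norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

labels = [binarize(r) for r in (5, 4, 3, 1)]

# Parallel vectors give a cosine near 1, opposite vectors a cosine near -1,
# so the prediction only stays inside [0, 1] if the vectors are non-negative.
same = cosine(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
opposite = cosine(np.array([1.0, 2.0]), np.array([-1.0, -2.0]))
```

The `opposite` case is the reason for caution: with unconstrained latent vectors, nothing stops the cosine from going negative while the labels are 0 or 1.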

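My approach: since ratings are positive numbers between 1 and 5, why not multiply the cosine by 5, or alternatively normalize the ratings by their maximum?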
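A rough sketch of how such a model could be trained with stochastic gradient descent, under my own assumptions: squared-error loss, no regularization, random normal initialization, and the hypothetical names `train_cosine_lfm` and `mse`. It is not the code that produced the results in this post. `scale=5.0` with raw ratings corresponds to "cosine times 5"; `scale=1.0` with targets `r / 5` corresponds to max-normalization.

```python
import numpy as np

def train_cosine_lfm(ratings, n_users, n_items, k=8, scale=5.0,
                     lr=0.01, epochs=500, seed=0):
    """SGD on 0.5 * (scale * cos(p_u, q_i) - target)^2 over (user, item, target) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(n_users, k))   # user latent vectors
    Q = rng.normal(size=(n_items, k))   # item latent vectors
    for _ in range(epochs):
        for u, i, r in ratings:
            p, q = P[u].copy(), Q[i].copy()
            norm_p = np.linalg.norm(p) + 1e-12
            norm_q = np.linalg.norm(q) + 1e-12
            cos = p @ q / (norm_p * norm_q)
            err = scale * cos - r
            # d cos / d p = q / (|p||q|) - cos * p / |p|^2, and symmetrically for q
            grad_p = q / (norm_p * norm_q) - cos * p / norm_p**2
            grad_q = p / (norm_p * norm_q) - cos * q / norm_q**2
            P[u] -= lr * err * scale * grad_p
            Q[i] -= lr * err * scale * grad_q
    return P, Q

def mse(P, Q, ratings, scale=5.0):
    """Mean squared error of scale * cosine against the targets."""
    errs = []
    for u, i, r in ratings:
        cos = P[u] @ Q[i] / (np.linalg.norm(P[u]) * np.linalg.norm(Q[i]) + 1e-12)
        errs.append((scale * cos - r) ** 2)
    return float(np.mean(errs))

# Toy data (made up for illustration): two users with opposite tastes.
data = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 5.0)]
P0, Q0 = train_cosine_lfm(data, 2, 2, epochs=0)   # untrained baseline, same seed
P, Q = train_cosine_lfm(data, 2, 2)
```

In this sketch the two options differ only by a constant factor, since (5·cos − r)² = 25·(cos − r/5)². With the same learning rate, the normalized variant therefore reports a loss 25 times smaller for the same vectors and takes gradient steps 25 times smaller, so the two runs need not end up in the same place after a fixed number of iterations.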

But the results are not great. See below, after 1000 iterations.

The recommendations for user id 24 are:

['Distinguished Gentleman, The (1992)', 'Comedy']
['Burnt Offerings (1976)', 'Horror']
['Gods Must Be Crazy, The (1980)', 'Comedy']
['Rocketship X-M (1950)', 'Sci-Fi']
["She's All That (1999)", 'Comedy|Romance']
['Sesame Street Presents Follow That Bird (1985)', "Children's|Comedy"]
['Congo (1995)', 'Action|Adventure|Mystery|Sci-Fi']
['Hard Rain (1998)', 'Action|Thriller']
["Nightmare on Elm Street Part 2: Freddy's Revenge, A (1985)", 'Horror']
['Cape Fear (1991)', 'Thriller']
['Piano, The (1993)', 'Drama|Romance']
['Big Hit, The (1998)', 'Action|Comedy']
['Peanuts - Die Bank zahlt alles (1996)', 'Comedy']
['Eat Drink Man Woman (1994)', 'Comedy|Drama']
["Farmer's Wife, The (1928)", 'Comedy']
['Secrets & Lies (1996)', 'Drama']
['Unzipped (1995)', 'Documentary']
['South Pacific (1958)', 'Musical|Romance|War']

Comedy dominates here. In the actual ground-truth data, however, drama dominates, which is disappointing:

["Who's Afraid of Virginia Woolf? (1966)", 'Drama']
['Deer Hunter, The (1978)', 'Drama|War']
["One Flew Over the Cuckoo's Nest (1975)", 'Drama']
['Raiders of the Lost Ark (1981)', 'Action|Adventure']
['Silence of the Lambs, The (1991)', 'Drama|Thriller']
['Good Morning, Vietnam (1987)', 'Comedy|Drama|War']
['Sling Blade (1996)', 'Drama|Thriller']
['This Is Spinal Tap (1984)', 'Comedy|Drama|Musical']
['Room with a View, A (1986)', 'Drama|Romance']
['Star Wars: Episode IV - A New Hope (1977)', 'Action|Adventure|Fantasy|Sci-Fi']
['Casablanca (1942)', 'Drama|Romance|War']
['Wizard of Oz, The (1939)', "Adventure|Children's|Drama|Musical"]
['Caddyshack (1980)', 'Comedy']
['Terms of Endearment (1983)', 'Comedy|Drama']
['Out of Africa (1985)', 'Drama|Romance']
['Gone with the Wind (1939)', 'Drama|Romance|War']
['Godfather, The (1972)', 'Action|Crime|Drama']
['Breakfast Club, The (1985)', 'Comedy|Drama']
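The genre comparison above was done by eye. A small helper like the following could tally it instead; `genre_counts` is a hypothetical name, and the three sample rows are copied from the recommendation list for user 24 only to show the input format.

```python
from collections import Counter

def genre_counts(movies):
    """Count genres over [title, 'Genre1|Genre2'] pairs; a movie counts once per genre."""
    counts = Counter()
    for _title, genres in movies:
        counts.update(genres.split("|"))
    return counts

sample_rec = [
    ['Distinguished Gentleman, The (1992)', 'Comedy'],
    ["She's All That (1999)", 'Comedy|Romance'],
    ['Piano, The (1993)', 'Drama|Romance'],
]
counts = genre_counts(sample_rec)
print(counts.most_common())
```

Running it on the full recommended list and on the full ground-truth list gives two tallies that can be compared directly.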

User_id=204, however, looks fairly good:

#rec: most are comedy and drama
['Cold Comfort Farm (1995)', 'Comedy']
['Sheltering Sky, The (1990)', 'Drama']
['Little Princess, The (1939)', "Children's|Drama"]
['Fast Times at Ridgemont High (1982)', 'Comedy']
['8 Seconds (1994)', 'Drama']
['Cérémonie, La (1995)', 'Drama']
["Farmer's Wife, The (1928)", 'Comedy']
['Postman, The (1997)', 'Drama']
['Selena (1997)', 'Drama|Musical']
['Mortal Kombat: Annihilation (1997)', 'Action|Adventure']
['Crazy in Alabama (1999)', 'Comedy|Drama']
['Nights of Cabiria (Le Notti di Cabiria) (1957)', 'Drama']
['Whipped (2000)', 'Comedy']
['Character (Karakter) (1997)', 'Drama']
['Waterboy, The (1998)', 'Comedy']
['Interiors (1978)', 'Drama']
["Angela's Ashes (1999)", 'Drama']
['Grease 2 (1982)', 'Comedy|Musical|Romance']
['Holiday Inn (1942)', 'Comedy|Musical']
['Raw Deal (1948)', 'Film-Noir']

#true_result
['Bridge on the River Kwai, The (1957)', 'Drama|War']
['Stand by Me (1986)', 'Adventure|Comedy|Drama']
['Ghost (1990)', 'Comedy|Romance|Thriller']
['Terminator 2: Judgment Day (1991)', 'Action|Sci-Fi|Thriller']
['Meet the Parents (2000)', 'Comedy']
['Groundhog Day (1993)', 'Comedy|Romance']
['Star Wars: Episode V - The Empire Strikes Back (1980)', 'Action|Adventure|Drama|Sci-Fi|War']
['Princess Bride, The (1987)', 'Action|Adventure|Comedy|Romance']
['Raiders of the Lost Ark (1981)', 'Action|Adventure']
['Young Frankenstein (1974)', 'Comedy|Horror']
['Ben-Hur (1959)', 'Action|Adventure|Drama']
['Awakenings (1990)', 'Drama']
['Indiana Jones and the Last Crusade (1989)', 'Action|Adventure']
['Wizard of Oz, The (1939)', "Adventure|Children's|Drama|Musical"]
['Life Is Beautiful (La Vita è bella) (1997)', 'Comedy|Drama']
['Gone with the Wind (1939)', 'Drama|Romance|War']
['Independence Day (ID4) (1996)', 'Action|Sci-Fi|War']
['Godfather, The (1972)', 'Action|Crime|Drama']
['Few Good Men, A (1992)', 'Crime|Drama']

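I didn't compare many users; it seems to be about half good, half bad. Next, let's try normalizing the ratings by the maximum. The two variants look equivalent, but who can guarantee the results will be the same? Only a test will tell.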

I also set a stopping rule: if the change in loss between two consecutive iterations is below 1e-15, stop iterating.
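A minimal sketch of this kind of stopping rule, assuming the training code exposes a `step()` callable that performs one iteration and returns the current loss. That callable and the name `run_until_converged` are assumptions for illustration. The toy demo uses a looser tolerance so the stop is easy to follow.

```python
def run_until_converged(step, tol=1e-15, max_iters=1000):
    """Call step() until |previous loss - current loss| < tol, or max_iters is reached."""
    prev = float("inf")
    for it in range(1, max_iters + 1):
        cur = step()
        if abs(prev - cur) < tol:
            return cur, it
        prev = cur
    return prev, max_iters

# Toy "training": the loss simply halves on every call.
state = {"loss": 1.0}

def toy_step():
    state["loss"] *= 0.5
    return state["loss"]

final_loss, iters = run_until_converged(toy_step, tol=1e-3)
```

Note that the rule tests the change in loss, not the loss itself: a run can stop on a plateau (or hit `max_iters`) while the loss is still far from zero.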

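However, the loss did not get that small. It stayed fairly large, around 6e-4, and bear in mind this is after normalization. It is not even as small as the loss from the earlier run on the unnormalized, larger values, which got down to at least 1e-14.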

Still, the results look decent. Here they are for users 24 and 204:

#rec for 24
['Excess Baggage (1997)', 'Adventure|Romance']
['Little Mermaid, The (1989)', "Animation|Children's|Comedy|Musical|Romance"]
['Cutthroat Island (1995)', 'Action|Adventure|Romance']
['Ogre, The (Der Unhold) (1996)', 'Drama']
['American Psycho (2000)', 'Comedy|Horror|Thriller']
['Black Sabbath (Tre Volti Della Paura, I) (1963)', 'Horror']
['Broken Vessels (1998)', 'Drama']
['Fear and Loathing in Las Vegas (1998)', 'Comedy|Drama']
['Pompatus of Love, The (1996)', 'Comedy|Drama']
['With Friends Like These... (1998)', 'Comedy']
['Amityville Horror, The (1979)', 'Horror']
['Curse of the Puppet Master (1998)', 'Horror|Sci-Fi|Thriller']
['Dancing at Lughnasa (1998)', 'Drama']
['Braindead (1992)', 'Comedy|Horror']
['Nil By Mouth (1997)', 'Drama']
['Meet Joe Black (1998)', 'Romance']
['Birdy (1984)', 'Drama|War']
['Karate Kid, Part II, The (1986)', 'Action|Adventure|Drama']
['Teenage Mutant Ninja Turtles III (1993)', "Action|Children's|Fantasy"]
['Price of Glory (2000)', 'Drama']

#rec for 204
['Children of Heaven, The (Bacheha-Ye Aseman) (1997)', 'Drama']
['Carnal Knowledge (1971)', 'Drama']
['Carmen Miranda: Bananas Is My Business (1994)', 'Documentary']
['Emma (1996)', 'Comedy|Drama|Romance']
['Life Less Ordinary, A (1997)', 'Romance|Thriller']
['On Any Sunday (1971)', 'Documentary']
['Oscar and Lucinda (a.k.a. Oscar & Lucinda) (1997)', 'Drama|Romance']
['Face in the Crowd, A (1957)', 'Drama']
['Santa Fe Trail (1940)', 'Drama|Romance|Western']
['Stranger Than Paradise (1984)', 'Comedy']
['Soft Toilet Seats (1999)', 'Comedy']
['Killing Fields, The (1984)', 'Drama|War']
['Alligator (1980)', 'Action|Horror|Sci-Fi']
['Stepford Wives, The (1975)', 'Sci-Fi|Thriller']
['Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)', 'Drama']
['Absent Minded Professor, The (1961)', "Children's|Comedy|Fantasy"]
['Pretty Woman (1990)', 'Comedy|Romance']
['Homegrown (1998)', 'Comedy|Thriller']
['Shop Around the Corner, The (1940)', 'Comedy|Romance']
['Knock Off (1998)', 'Action']

This stuff really is black magic.

Comments from 小明哥:

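1. Overall, LFM matrix factorization cannot solve the cold-start problem, which every matrix factorization method faces. There is also the problem of balancing positive and negative samples. Cold start is usually handled with other strategies: for a new user, recommend popular items, or analyze the user profile and apply a collaborative method; for a new item, use its own attributes, i.e. content-based recommendation. There are many other options as well. Could the positive/negative balance problem be handled with userCF? I think so.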

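2. A detail: "latent" features means we don't know how many (latent) features there are, and the features may not be independent of each other but correlated, which feels a bit like multi-class classification. Choosing the number comes down to experience. Try 20, then 50, one by one?? That is unsatisfying, and both too many and too few are bad.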

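3. At recommendation time, setting aside problem 1 (i.e. assuming every user and item already has a vector), serving means computing the cosine between the user and every single item. Done this way, there seems to be no recall (candidate retrieval) stage at all?? Everything is selected from the full catalog, so the recall rate is never a concern: because every item is compared, recall [the fraction of positive examples that are predicted positive] is necessarily 100%, while precision is far worse. Then the scores are sorted. That matches the impression I had from the start: recommendation is just two functions, one recall and one sort???? Baffling. So this method can be slow, and with a large matrix it cannot recommend in real time. Then again, a user's interests don't change that quickly either.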
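Point 3 can be sketched as follows: score every item against one user vector, sort, and keep the top k. The names `top_k_by_cosine` and `precision_recall`, the `exclude` parameter, and the toy vectors are all assumptions for illustration, not the course's implementation.

```python
import numpy as np

def top_k_by_cosine(user_vec, item_vecs, k, exclude=()):
    """Score every item against one user by cosine similarity; return top-k item indices."""
    user_vec = np.asarray(user_vec, dtype=float)
    item_vecs = np.asarray(item_vecs, dtype=float)
    norms = np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(user_vec) + 1e-12
    scores = item_vecs @ user_vec / norms
    if exclude:
        scores[list(exclude)] = -np.inf   # e.g. items the user has already rated
    order = np.argsort(-scores)           # full sort over the whole catalog
    return order[:k].tolist()

def precision_recall(recommended, relevant):
    """Precision and recall of a recommended list against a set of relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
user = [1.0, 0.1]
relevant = {0, 1}                         # made-up ground truth for the toy user

top2 = top_k_by_cosine(user, items, k=2)
everything = top_k_by_cosine(user, items, k=len(items))
```

Returning `everything` reaches full recall at the cost of precision, which is the trade-off described above. The cost per user is one dot product per item plus a sort, so it grows with the catalog size; that is why large systems usually retrieve a smaller candidate set before ranking.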

 

Recommended reading: another LFM matrix factorization method

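If you have related questions, you can join the QQ group to discuss them. There is no WeChat group.

QQ group: 868373192

Speech, image, and video deep learning group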
