movie_recommendation_spark1

Building a recommendation model with mllib

Data preparation

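  • The data is in the ml-100k folder. Its most important files are u.user (user attributes), u.item (movie metadata), and u.data (users' ratings of movies).
  • (1) The columns of u.user are user ID, age, gender, occupation, and ZIP code, separated by "|".
  • (2) The columns of u.item are movie ID, movie title, release date, and several other attributes, also separated by "|".
  • (3) The columns of u.data are user ID, movie ID, rating (1 to 5), and timestamp, separated by tabs (\t).
  • The remaining files are described in the README.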
rawData = sc.textFile('hdfs://master:9000/ml-100k/u.data')
rawData.take(5)
[u'196\t242\t3\t881250949',
 u'186\t302\t3\t891717742',
 u'22\t377\t1\t878887116',
 u'244\t51\t2\t880606923',
 u'166\t346\t1\t886397596']
rawRatings = rawData.map(lambda line: line.split('\t'))
rawRatings.take(5)
[[u'196', u'242', u'3', u'881250949'],
 [u'186', u'302', u'3', u'891717742'],
 [u'22', u'377', u'1', u'878887116'],
 [u'244', u'51', u'2', u'880606923'],
 [u'166', u'346', u'1', u'886397596']]
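
The column layout of u.data described above can be checked on a single line without a Spark cluster. The following is a minimal sketch in plain Python; the helper name parse_rating_line is hypothetical and not part of the original notebook.

```python
def parse_rating_line(line):
    """Split one tab-separated u.data line into typed fields.

    Hypothetical helper for illustration; the field order
    (user ID, movie ID, rating, timestamp) follows the description above.
    """
    user_id, movie_id, rating, timestamp = line.split('\t')
    return int(user_id), int(movie_id), float(rating), int(timestamp)


sample = '196\t242\t3\t881250949'  # first line shown by rawData.take(5)
parsed = parse_rating_line(sample)
print(parsed)
```

A function like this could be passed to rawData.map in place of the lambda. Because the unpacking expects exactly four fields, a malformed line raises an error instead of passing through silently.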

The recommendation module of mllib provides a Rating class that converts the data into the structure the ALS algorithm expects. The conversion looks like this:

from pyspark.mllib.recommendation import Rating
ratings = rawRatings.map(lambda line: Rating(int(line[0]),
                                             int(line[1]),
                                             float(line[2])
                                            )
                        )
ratings.take(5)
[Rating(user=196, product=242, rating=3.0),
 Rating(user=186, product=302, rating=3.0),
 Rating(user=22, product=377, rating=1.0),
 Rating(user=244, product=51, rating=2.0),
 Rating(user=166, product=346, rating=1.0)]

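The converted RDD is made up of Rating objects. As the output shows, each Rating holds three values: user, product, and rating. The fourth column of the raw data, the timestamp, is not needed in this example and is dropped, since Rating accepts exactly three arguments. Using the object is simple: to read a value, write a dot "." followed by the attribute name, or use an index by putting the member's position in square brackets after the object.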

r_1 = ratings.first()
print r_1.user, r_1.product, r_1.rating
print r_1[0], r_1[1], r_1[2]
196 242 3.0
196 242 3.0
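
Both access styles work because pyspark's Rating is built on a namedtuple. The sketch below uses a stand-in namedtuple (called RatingLike here, an illustrative name) so the behaviour can be tried without Spark. It assumes only the three field names shown in the output above.

```python
from collections import namedtuple

# Stand-in with the same three fields as pyspark.mllib.recommendation.Rating.
# RatingLike is a hypothetical name used only for this illustration.
RatingLike = namedtuple('RatingLike', ['user', 'product', 'rating'])

fields = '196\t242\t3\t881250949'.split('\t')
# Only the first three columns are used; the timestamp (fields[3]) is dropped.
r = RatingLike(int(fields[0]), int(fields[1]), float(fields[2]))

by_name = (r.user, r.product, r.rating)   # attribute access
by_index = (r[0], r[1], r[2])             # index access
user, product, rating = r                 # tuple unpacking also works
print(by_name == by_index)
```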

Modeling

from pyspark.mllib.recommendation import ALS
cf_model = ALS.train(ratings, 50, 10, 0.01, nonnegative=False, seed=12345)
cf_model
<pyspark.mllib.recommendation.MatrixFactorizationModel at 0x7fd1a8023390>

The trained model is a MatrixFactorizationModel object. Its methods can be used to extract the factor matrices and to make predictions, for example:

cf_model.userFeatures().first()
(2,
 array('d', [0.8257154226303101, -0.08174031972885132, -0.4485216736793518, 0.2816902697086334, 0.281324565410614, 0.2871280014514923, -0.28037557005882263, -0.5780994892120361, 0.04380865767598152, -0.03685721382498741, 0.33663856983184814, 0.8575121164321899, -0.26763004064559937, -0.22665703296661377, -0.030370648950338364, -0.4087982177734375, 0.28470417857170105, 0.17012149095535278, -0.46445152163505554, -0.39363399147987366, 0.4133472442626953, 0.0196047555655241, -0.6278623342514038, 0.8203023672103882, 0.36110371351242065, -0.3623308539390564, 0.07974052429199219, 0.3489876985549927, 0.009540693834424019, -0.1018930971622467, -0.3096586763858795, -0.08348742127418518, 0.546208918094635, 0.14119906723499298, -0.11057484149932861, 0.003356723114848137, -0.42252105474472046, 0.5306751728057861, 0.18785302340984344, 0.30044302344322205, -0.017208704724907875, 0.4387732148170471, -0.06367648392915726, 0.1654045730829239, 0.28026890754699707, -0.18949449062347412, -0.17139069736003876, -0.24911031126976013, 0.05620288848876953, -0.48843708634376526]))
cf_model.productFeatures().take(1)
[(2,
  array('d', [0.7962743639945984, 0.17431776225566864, -0.1990462988615036, -1.1859782934188843, -0.3959435522556305, -0.5246215462684631, 0.5594768524169922, -1.0115996599197388, 0.16964347660541534, 0.5268467664718628, -0.25127914547920227, -0.6580895185470581, 0.5533314943313599, 0.2781536877155304, -0.8546806573867798, 0.003281824290752411, -0.1445930451154709, -0.4302116930484772, -0.9390072226524353, -0.012757998891174793, -0.2912135422229767, -0.1968940943479538, -1.0604552030563354, 0.8730130195617676, 0.20659275352954865, -1.2206825017929077, -1.2894175052642822, 0.3126126825809479, 0.3025003671646118, 0.3809513449668884, -0.6017365455627441, 0.46676522493362427, -0.17819972336292267, 0.03601028397679329, 0.5260732769966125, -0.3788948357105255, -1.3027641773223877, -0.08637615293264389, -0.22254005074501038, 1.1796964406967163, 0.7695205807685852, 0.420034259557724, -0.31719499826431274, -0.43826058506965637, 0.7229039669036865, 0.1820073425769806, 0.3955117464065552, 0.4843427240848541, 0.5735178589820862, -1.1487818956375122]))]
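
In a matrix factorization model of this kind, the predicted rating for a (user, product) pair is the dot product of the user's factor vector and the product's factor vector. The following is a minimal sketch with made-up three-dimensional vectors. The model above uses 50 factors, and the numbers here are illustrative, not taken from cf_model.

```python
def dot(u, v):
    """Dot product of two equal-length factor vectors."""
    if len(u) != len(v):
        raise ValueError('factor vectors must have the same length')
    return sum(a * b for a, b in zip(u, v))


# Illustrative factors only; not values from cf_model.
user_factors = [0.5, -1.0, 2.0]
product_factors = [2.0, 0.5, 1.0]

predicted = dot(user_factors, product_factors)
print(predicted)  # 0.5*2.0 + (-1.0)*0.5 + 2.0*1.0 = 2.5
```

With the real model, the same computation could be applied to the arrays returned by userFeatures() and productFeatures() for a given user ID and product ID.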

Prediction

cf_model.predict(123, 456)
0.5189201089615622

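In general we care less about one user's rating of one particular movie than about ranking a user's predicted ratings. For that, use the predictAll method, which takes an RDD of (user, product) pairs and returns a predicted rating for each pair. For example, we can first build an RDD containing every movie that user 123 has rated: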

r_123 = ratings.filter(lambda r: r.user == 123)
user_product_pairs = r_123.map(lambda r: (r.user, r.product))
user_product_pairs.take(5)
[(123, 427), (123, 531), (123, 135), (123, 192), (123, 13)]

Predict ratings for every movie user 123 has rated

est_123 = cf_model.predictAll(user_product_pairs)
est_123.take(5)
[Rating(user=123, product=14, rating=4.781807508293159),
 Rating(user=123, product=192, rating=4.8440853953154654),
 Rating(user=123, product=64, rating=3.189367663814919),
 Rating(user=123, product=432, rating=4.826974367452291),
 Rating(user=123, product=480, rating=3.4997134540180883)]

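Because every predicted rating is for a movie that user 123 has actually rated, we can put the two results side by side for easy comparison. In each joined value, the first number is the actual rating and the second is the prediction: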

left = r_123.map(lambda r: (r.product, r.rating))
right = est_123.map(lambda r: (r.product, r.rating))
left.join(right).take(5)
[(704, (3.0, 3.12740470516471)),
 (64, (3.0, 3.189367663814919)),
 (132, (3.0, 3.255692030148917)),
 (192, (5.0, 4.8440853953154654)),
 (288, (3.0, 2.6833371635987526))]
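
One way to summarise such a comparison is the root mean squared error (RMSE) over the (actual, predicted) pairs. The sketch below applies it in plain Python to the five joined rows shown above. The function name rmse is an illustrative choice, and five rows are far too few to judge the model.

```python
import math


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError('need at least one (actual, predicted) pair')
    squared = [(actual - predicted) ** 2 for actual, predicted in pairs]
    return math.sqrt(sum(squared) / len(squared))


# The five (actual, predicted) values printed by left.join(right).take(5).
sample_pairs = [
    (3.0, 3.12740470516471),
    (3.0, 3.189367663814919),
    (3.0, 3.255692030148917),
    (5.0, 4.8440853953154654),
    (3.0, 2.6833371635987526),
]
error = rmse(sample_pairs)
print(round(error, 3))
```

Because these ratings were part of the training data, the figure measures fit to the training set rather than accuracy on unseen ratings.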

Recommending products and users with recommendProducts and recommendUsers

topK_users = cf_model.recommendUsers(456, 5)
topK_users
[Rating(user=534, product=456, rating=4.915272594291057),
 Rating(user=620, product=456, rating=4.554502950952971),
 Rating(user=462, product=456, rating=4.548246109196231),
 Rating(user=283, product=456, rating=4.514564326007004),
 Rating(user=56, product=456, rating=4.2624704017243324)]
topK_movies = cf_model.recommendProducts(123, 5)
topK_movies
[Rating(user=123, product=287, rating=6.238110881748787),
 Rating(user=123, product=269, rating=6.168910818255823),
 Rating(user=123, product=515, rating=5.954825359796975),
 Rating(user=123, product=416, rating=5.921099652606939),
 Rating(user=123, product=503, rating=5.670748725720875)]
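
Conceptually, recommending the top K products for a user amounts to scoring every product against the user's factor vector and keeping the K highest scores. The plain-Python sketch below illustrates that idea with made-up two-dimensional factors and invented product IDs. It is not necessarily how Spark computes recommendProducts internally. The scores are not bounded by the 1 to 5 rating scale, which is consistent with the predictions above 5 in the output.

```python
import heapq


def top_k_products(user_vec, product_vecs, k):
    """Return the k (product_id, score) pairs with the highest dot-product score."""
    scored = (
        (pid, sum(u * p for u, p in zip(user_vec, vec)))
        for pid, vec in product_vecs.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Illustrative factors; IDs and values are invented for this example.
user_vec = [1.0, 2.0]
product_vecs = {
    10: [1.0, 1.0],   # score 3.0
    20: [3.0, 2.0],   # score 7.0
    30: [0.0, 0.5],   # score 1.0
    40: [2.0, 1.5],   # score 5.0
}
print(top_k_products(user_vec, product_vecs, 2))
```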

Joining movie titles

movies = sc.textFile('hdfs://master:9000/ml-100k/u.item')
titles = movies.map(lambda line: line.split('|'))\
               .map(lambda x: (int(x[0]), x[1])).collectAsMap()
titles
{1: u'Toy Story (1995)',
 2: u'GoldenEye (1995)',
 3: u'Four Rooms (1995)',
 4: u'Get Shorty (1995)',
 5: u'Copycat (1995)',
 6: u'Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)',
 7: u'Twelve Monkeys (1995)',
 8: u'Babe (1995)',
 9: u'Dead Man Walking (1995)',
 10: u'Richard III (1995)',
 11: u'Seven (Se7en) (1995)',
 12: u'Usual Suspects, The (1995)',
 13: u'Mighty Aphrodite (1995)',
 14: u'Postino, Il (1994)',
 15: u"Mr. Holland's Opus (1995)",
 16: u'French Twist (Gazon maudit) (1995)',
 17: u'From Dusk Till Dawn (1996)',
 18: u'White Balloon, The (1995)',
 19: u"Antonia's Line (1995)",
 20: u'Angels and Insects (1995)',
 21: u'Muppet Treasure Island (1996)',
 22: u'Braveheart (1995)',
 23: u'Taxi Driver (1976)',
 24: u'Rumble in the Bronx (1995)',
 25: u'Birdcage, The (1996)',
 26: u'Brothers McMullen, The (1995)',
 27: u'Bad Boys (1995)',
 28: u'Apollo 13 (1995)',
 29: u'Batman Forever (1995)',
 30: u'Belle de jour (1967)',
 31: u'Crimson Tide (1995)',
 32: u'Crumb (1994)',
 33: u'Desperado (1995)',
 34: u'Doom Generation, The (1995)',
 35: u'Free Willy 2: The Adventure Home (1995)',
 36: u'Mad Love (1995)',
 37: u'Nadja (1994)',
 38: u'Net, The (1995)',
 39: u'Strange Days (1995)',
 40: u'To Wong Foo, Thanks for Everything! Julie Newmar (1995)',
 41: u'Billy Madison (1995)',
 42: u'Clerks (1994)',
 43: u'Disclosure (1994)',
 44: u'Dolores Claiborne (1994)',
 45: u'Eat Drink Man Woman (1994)',
 46: u'Exotica (1994)',
 47: u'Ed Wood (1994)',
 48: u'Hoop Dreams (1994)',
 49: u'I.Q. (1994)',
 50: u'Star Wars (1977)',
 51: u'Legends of the Fall (1994)',
 52: u'Madness of King George, The (1994)',
 53: u'Natural Born Killers (1994)',
 54: u'Outbreak (1995)',
 55: u'Professional, The (1994)',
 56: u'Pulp Fiction (1994)',
 57: u'Priest (1994)',
 58: u'Quiz Show (1994)',
 59: u'Three Colors: Red (1994)',
 60: u'Three Colors: Blue (1993)',
 61: u'Three Colors: White (1994)',
 62: u'Stargate (1994)',
 63: u'Santa Clause, The (1994)',
 64: u'Shawshank Redemption, The (1994)',
 65: u"What's Eating Gilbert Grape (1993)",
 66: u'While You Were Sleeping (1995)',
 67: u'Ace Ventura: Pet Detective (1994)',
 68: u'Crow, The (1994)',
 69: u'Forrest Gump (1994)',
 70: u'Four Weddings and a Funeral (1994)',
 71: u'Lion King, The (1994)',
 72: u'Mask, The (1994)',
 73: u'Maverick (1994)',
 74: u'Faster Pussycat! Kill! Kill! (1965)',
 75: u'Brother Minister: The Assassination of Malcolm X (1994)',
 76: u"Carlito's Way (1993)",
 77: u'Firm, The (1993)',
 78: u'Free Willy (1993)',
 79: u'Fugitive, The (1993)',
 80: u'Hot Shots! Part Deux (1993)',
 81: u'Hudsucker Proxy, The (1994)',
 82: u'Jurassic Park (1993)',
 83: u'Much Ado About Nothing (1993)',
 84: u"Robert A. Heinlein's The Puppet Masters (1994)",
 85: u'Ref, The (1994)',
 86: u'Remains of the Day, The (1993)',
 87: u'Searching for Bobby Fischer (1993)',
 88: u'Sleepless in Seattle (1993)',
 89: u'Blade Runner (1982)',
 90: u'So I Married an Axe Murderer (1993)',
 91: u'Nightmare Before Christmas, The (1993)',
 92: u'True Romance (1993)',
 93: u'Welcome to the Dollhouse (1995)',
 94: u'Home Alone (1990)',
 95: u'Aladdin (1992)',
 96: u'Terminator 2: Judgment Day (1991)',
 97: u'Dances with Wolves (1990)',
 98: u'Silence of the Lambs, The (1991)',
 99: u'Snow White and the Seven Dwarfs (1937)',
 100: u'Fargo (1996)',
 101: u'Heavy Metal (1981)',
 102: u'Aristocats, The (1970)',
 103: u'All Dogs Go to Heaven 2 (1996)',
 104: u'Theodore Rex (1995)',
 105: u'Sgt. Bilko (1996)',
 106: u'Diabolique (1996)',
 107: u'Moll Flanders (1996)',
 108: u'Kids in the Hall: Brain Candy (1996)',
 109: u'Mystery Science Theater 3000: The Movie (1996)',
 110: u'Operation Dumbo Drop (1995)',
 111: u'Truth About Cats & Dogs, The (1996)',
 112: u'Flipper (1996)',
 113: u'Horseman on the Roof, The (Hussard sur le toit, Le) (1995)',
 114: u'Wallace & Gromit: The Best of Aardman Animation (1996)',
 115: u'Haunted World of Edward D. Wood Jr., The (1995)',
 116: u'Cold Comfort Farm (1995)',
 117: u'Rock, The (1996)',
 118: u'Twister (1996)',
 119: u'Maya Lin: A Strong Clear Vision (1994)',
 120: u'Striptease (1996)',
 121: u'Independence Day (ID4) (1996)',
 122: u'Cable Guy, The (1996)',
 123: u'Frighteners, The (1996)',
 124: u'Lone Star (1996)',
 125: u'Phenomenon (1996)',
 126: u'Spitfire Grill, The (1996)',
 127: u'Godfather, The (1972)',
 128: u'Supercop (1992)',
 129: u'Bound (1996)',
 130: u'Kansas City (1996)',
 131: u"Breakfast at Tiffany's (1961)",
 132: u'Wizard of Oz, The (1939)',
 133: u'Gone with the Wind (1939)',
 134: u'Citizen Kane (1941)',
 135: u'2001: A Space Odyssey (1968)',
 136: u'Mr. Smith Goes to Washington (1939)',
 137: u'Big Night (1996)',
 138: u'D3: The Mighty Ducks (1996)',
 139: u'Love Bug, The (1969)',
 140: u'Homeward Bound: The Incredible Journey (1993)',
 141: u'20,000 Leagues Under the Sea (1954)',
 142: u'Bedknobs and Broomsticks (1971)',
 143: u'Sound of Music, The (1965)',
 144: u'Die Hard (1988)',
 145: u'Lawnmower Man, The (1992)',
 146: u'Unhook the Stars (1996)',
 147: u'Long Kiss Goodnight, The (1996)',
 148: u'Ghost and the Darkness, The (1996)',
 149: u'Jude (1996)',
 150: u'Swingers (1996)',
 151: u'Willy Wonka and the Chocolate Factory (1971)',
 152: u'Sleeper (1973)',
 153: u'Fish Called Wanda, A (1988)',
 154: u"Monty Python's Life of Brian (1979)",
 155: u'Dirty Dancing (1987)',
 156: u'Reservoir Dogs (1992)',
 157: u'Platoon (1986)',
 158: u"Weekend at Bernie's (1989)",
 159: u'Basic Instinct (1992)',
 160: u'Glengarry Glen Ross (1992)',
 161: u'Top Gun (1986)',
 162: u'On Golden Pond (1981)',
 163: u'Return of the Pink Panther, The (1974)',
 164: u'Abyss, The (1989)',
 165: u'Jean de Florette (1986)',
 166: u'Manon of the Spring (Manon des sources) (1986)',
 167: u'Private Benjamin (1980)',
 168: u'Monty Python and the Holy Grail (1974)',
 169: u'Wrong Trousers, The (1993)',
 170: u'Cinema Paradiso (1988)',
 171: u'Delicatessen (1991)',
 172: u'Empire Strikes Back, The (1980)',
 173: u'Princess Bride, The (1987)',
 174: u'Raiders of the Lost Ark (1981)',
 175: u'Brazil (1985)',
 176: u'Aliens (1986)',
 177: u'Good, The Bad and The Ugly, The (1966)',
 178: u'12 Angry Men (1957)',
 179: u'Clockwork Orange, A (1971)',
 180: u'Apocalypse Now (1979)',
 181: u'Return of the Jedi (1983)',
 182: u'GoodFellas (1990)',
 183: u'Alien (1979)',
 184: u'Army of Darkness (1993)',
 185: u'Psycho (1960)',
 186: u'Blues Brothers, The (1980)',
 187: u'Godfather: Part II, The (1974)',
 188: u'Full Metal Jacket (1987)',
 189: u'Grand Day Out, A (1992)',
 190: u'Henry V (1989)',
 191: u'Amadeus (1984)',
 192: u'Raging Bull (1980)',
 193: u'Right Stuff, The (1983)',
 194: u'Sting, The (1973)',
 195: u'Terminator, The (1984)',
 196: u'Dead Poets Society (1989)',
 197: u'Graduate, The (1967)',
 198: u'Nikita (La Femme Nikita) (1990)',
 199: u'Bridge on the River Kwai, The (1957)',
 200: u'Shining, The (1980)',
 201: u'Evil Dead II (1987)',
 202: u'Groundhog Day (1993)',
 203: u'Unforgiven (1992)',
 204: u'Back to the Future (1985)',
 205: u'Patton (1970)',
 206: u'Akira (1988)',
 207: u'Cyrano de Bergerac (1990)',
 208: u'Young Frankenstein (1974)',
 209: u'This Is Spinal Tap (1984)',
 210: u'Indiana Jones and the Last Crusade (1989)',
 211: u'M*A*S*H (1970)',
 212: u'Unbearable Lightness of Being, The (1988)',
 213: u'Room with a View, A (1986)',
 214: u'Pink Floyd - The Wall (1982)',
 215: u'Field of Dreams (1989)',
 216: u'When Harry Met Sally... (1989)',
 217: u"Bram Stoker's Dracula (1992)",
 218: u'Cape Fear (1991)',
 219: u'Nightmare on Elm Street, A (1984)',
 220: u'Mirror Has Two Faces, The (1996)',
 221: u'Breaking the Waves (1996)',
 222: u'Star Trek: First Contact (1996)',
 223: u'Sling Blade (1996)',
 224: u'Ridicule (1996)',
 225: u'101 Dalmatians (1996)',
 226: u'Die Hard 2 (1990)',
 227: u'Star Trek VI: The Undiscovered Country (1991)',
 228: u'Star Trek: The Wrath of Khan (1982)',
 229: u'Star Trek III: The Search for Spock (1984)',
 230: u'Star Trek IV: The Voyage Home (1986)',
 231: u'Batman Returns (1992)',
 232: u'Young Guns (1988)',
 233: u'Under Siege (1992)',
 234: u'Jaws (1975)',
 235: u'Mars Attacks! (1996)',
 236: u'Citizen Ruth (1996)',
 237: u'Jerry Maguire (1996)',
 238: u'Raising Arizona (1987)',
 239: u'Sneakers (1992)',
 240: u'Beavis and Butt-head Do America (1996)',
 241: u'Last of the Mohicans, The (1992)',
 242: u'Kolya (1996)',
 243: u'Jungle2Jungle (1997)',
 244: u"Smilla's Sense of Snow (1997)",
 245: u"Devil's Own, The (1997)",
 246: u'Chasing Amy (1997)',
 247: u'Turbo: A Power Rangers Movie (1997)',
 248: u'Grosse Pointe Blank (1997)',
 249: u'Austin Powers: International Man of Mystery (1997)',
 250: u'Fifth Element, The (1997)',
 251: u'Shall We Dance? (1996)',
 252: u'Lost World: Jurassic Park, The (1997)',
 253: u'Pillow Book, The (1995)',
 254: u'Batman & Robin (1997)',
 255: u"My Best Friend's Wedding (1997)",
 256: u'When the Cats Away (Chacun cherche son chat) (1996)',
 257: u'Men in Black (1997)',
 258: u'Contact (1997)',
 259: u'George of the Jungle (1997)',
 260: u'Event Horizon (1997)',
 261: u'Air Bud (1997)',
 262: u'In the Company of Men (1997)',
 263: u'Steel (1997)',
 264: u'Mimic (1997)',
 265: u'Hunt for Red October, The (1990)',
 266: u'Kull the Conqueror (1997)',
 267: u'unknown',
 268: u'Chasing Amy (1997)',
 269: u'Full Monty, The (1997)',
 270: u'Gattaca (1997)',
 271: u'Starship Troopers (1997)',
 272: u'Good Will Hunting (1997)',
 273: u'Heat (1995)',
 274: u'Sabrina (1995)',
 275: u'Sense and Sensibility (1995)',
 276: u'Leaving Las Vegas (1995)',
 277: u'Restoration (1995)',
 278: u'Bed of Roses (1996)',
 279: u'Once Upon a Time... When We Were Colored (1995)',
 280: u'Up Close and Personal (1996)',
 281: u'River Wild, The (1994)',
 282: u'Time to Kill, A (1996)',
 283: u'Emma (1996)',
 284: u'Tin Cup (1996)',
 285: u'Secrets & Lies (1996)',
 286: u'English Patient, The (1996)',
 287: u"Marvin's Room (1996)",
 288: u'Scream (1996)',
 289: u'Evita (1996)',
 290: u'Fierce Creatures (1997)',
 291: u'Absolute Power (1997)',
 292: u'Rosewood (1997)',
 293: u'Donnie Brasco (1997)',
 294: u'Liar Liar (1997)',
 295: u'Breakdown (1997)',
 296: u'Promesse, La (1996)',
 297: u"Ulee's Gold (1997)",
 298: u'Face/Off (1997)',
 299: u'Hoodlum (1997)',
 300: u'Air Force One (1997)',
 301: u'In & Out (1997)',
 302: u'L.A. Confidential (1997)',
 303: u"Ulee's Gold (1997)",
 304: u'Fly Away Home (1996)',
 305: u'Ice Storm, The (1997)',
 306: u'Mrs. Brown (Her Majesty, Mrs. Brown) (1997)',
 307: u"Devil's Advocate, The (1997)",
 308: u'FairyTale: A True Story (1997)',
 309: u'Deceiver (1997)',
 310: u'Rainmaker, The (1997)',
 311: u'Wings of the Dove, The (1997)',
 312: u'Midnight in the Garden of Good and Evil (1997)',
 313: u'Titanic (1997)',
 314: u'3 Ninjas: High Noon At Mega Mountain (1998)',
 315: u'Apt Pupil (1998)',
 316: u'As Good As It Gets (1997)',
 317: u'In the Name of the Father (1993)',
 318: u"Schindler's List (1993)",
 319: u'Everyone Says I Love You (1996)',
 320: u'Paradise Lost: The Child Murders at Robin Hood Hills (1996)',
 321: u'Mother (1996)',
 322: u'Murder at 1600 (1997)',
 323: u"Dante's Peak (1997)",
 324: u'Lost Highway (1997)',
 325: u'Crash (1996)',
 326: u'G.I. Jane (1997)',
 327: u'Cop Land (1997)',
 328: u'Conspiracy Theory (1997)',
 329: u'Desperate Measures (1998)',
 330: u'187 (1997)',
 331: u'Edge, The (1997)',
 332: u'Kiss the Girls (1997)',
 333: u'Game, The (1997)',
 334: u'U Turn (1997)',
 335: u'How to Be a Player (1997)',
 336: u'Playing God (1997)',
 337: u'House of Yes, The (1997)',
 338: u'Bean (1997)',
 339: u'Mad City (1997)',
 340: u'Boogie Nights (1997)',
 341: u'Critical Care (1997)',
 342: u'Man Who Knew Too Little, The (1997)',
 343: u'Alien: Resurrection (1997)',
 344: u'Apostle, The (1997)',
 345: u'Deconstructing Harry (1997)',
 346: u'Jackie Brown (1997)',
 347: u'Wag the Dog (1997)',
 348: u'Desperate Measures (1998)',
 349: u'Hard Rain (1998)',
 350: u'Fallen (1998)',
 351: u'Prophecy II, The (1998)',
 352: u'Spice World (1997)',
 353: u'Deep Rising (1998)',
 354: u'Wedding Singer, The (1998)',
 355: u'Sphere (1998)',
 356: u'Client, The (1994)',
 357: u"One Flew Over the Cuckoo's Nest (1975)",
 358: u'Spawn (1997)',
 359: u'Assignment, The (1997)',
 360: u'Wonderland (1997)',
 361: u'Incognito (1997)',
 362: u'Blues Brothers 2000 (1998)',
 363: u'Sudden Death (1995)',
 364: u'Ace Ventura: When Nature Calls (1995)',
 365: u'Powder (1995)',
 366: u'Dangerous Minds (1995)',
 367: u'Clueless (1995)',
 368: u'Bio-Dome (1996)',
 369: u'Black Sheep (1996)',
 370: u'Mary Reilly (1996)',
 371: u'Bridges of Madison County, The (1995)',
 372: u'Jeffrey (1995)',
 373: u'Judge Dredd (1995)',
 374: u'Mighty Morphin Power Rangers: The Movie (1995)',
 375: u'Showgirls (1995)',
 376: u'Houseguest (1994)',
 377: u'Heavyweights (1994)',
 378: u'Miracle on 34th Street (1994)',
 379: u'Tales From the Crypt Presents: Demon Knight (1995)',
 380: u'Star Trek: Generations (1994)',
 381: u"Muriel's Wedding (1994)",
 382: u'Adventures of Priscilla, Queen of the Desert, The (1994)',
 383: u'Flintstones, The (1994)',
 384: u'Naked Gun 33 1/3: The Final Insult (1994)',
 385: u'True Lies (1994)',
 386: u'Addams Family Values (1993)',
 387: u'Age of Innocence, The (1993)',
 388: u'Beverly Hills Cop III (1994)',
 389: u'Black Beauty (1994)',
 390: u'Fear of a Black Hat (1993)',
 391: u'Last Action Hero (1993)',
 392: u'Man Without a Face, The (1993)',
 393: u'Mrs. Doubtfire (1993)',
 394: u'Radioland Murders (1994)',
 395: u'Robin Hood: Men in Tights (1993)',
 396: u'Serial Mom (1994)',
 397: u'Striking Distance (1993)',
 398: u'Super Mario Bros. (1993)',
 399: u'Three Musketeers, The (1993)',
 400: u'Little Rascals, The (1994)',
 401: u'Brady Bunch Movie, The (1995)',
 402: u'Ghost (1990)',
 403: u'Batman (1989)',
 404: u'Pinocchio (1940)',
 405: u'Mission: Impossible (1996)',
 406: u'Thinner (1996)',
 407: u'Spy Hard (1996)',
 408: u'Close Shave, A (1995)',
 409: u'Jack (1996)',
 410: u'Kingpin (1996)',
 411: u'Nutty Professor, The (1996)',
 412: u'Very Brady Sequel, A (1996)',
 413: u'Tales from the Crypt Presents: Bordello of Blood (1996)',
 414: u'My Favorite Year (1982)',
 415: u'Apple Dumpling Gang, The (1975)',
 416: u'Old Yeller (1957)',
 417: u'Parent Trap, The (1961)',
 418: u'Cinderella (1950)',
 419: u'Mary Poppins (1964)',
 420: u'Alice in Wonderland (1951)',
 421: u"William Shakespeare's Romeo and Juliet (1996)",
 422: u'Aladdin and the King of Thieves (1996)',
 423: u'E.T. the Extra-Terrestrial (1982)',
 424: u'Children of the Corn: The Gathering (1996)',
 425: u'Bob Roberts (1992)',
 426: u'Transformers: The Movie, The (1986)',
 427: u'To Kill a Mockingbird (1962)',
 428: u'Harold and Maude (1971)',
 429: u'Day the Earth Stood Still, The (1951)',
 430: u'Duck Soup (1933)',
 431: u'Highlander (1986)',
 432: u'Fantasia (1940)',
 433: u'Heathers (1989)',
 434: u'Forbidden Planet (1956)',
 435: u'Butch Cassidy and the Sundance Kid (1969)',
 436: u'American Werewolf in London, An (1981)',
 437: u"Amityville 1992: It's About Time (1992)",
 438: u'Amityville 3-D (1983)',
 439: u'Amityville: A New Generation (1993)',
 440: u'Amityville II: The Possession (1982)',
 441: u'Amityville Horror, The (1979)',
 442: u'Amityville Curse, The (1990)',
 443: u'Birds, The (1963)',
 444: u'Blob, The (1958)',
 445: u'Body Snatcher, The (1945)',
 446: u'Burnt Offerings (1976)',
 447: u'Carrie (1976)',
 448: u'Omen, The (1976)',
 449: u'Star Trek: The Motion Picture (1979)',
 450: u'Star Trek V: The Final Frontier (1989)',
 451: u'Grease (1978)',
 452: u'Jaws 2 (1978)',
 453: u'Jaws 3-D (1983)',
 454: u'Bastard Out of Carolina (1996)',
 455: u"Jackie Chan's First Strike (1996)",
 456: u'Beverly Hills Ninja (1997)',
 457: u'Free Willy 3: The Rescue (1997)',
 458: u'Nixon (1995)',
 459: u'Cry, the Beloved Country (1995)',
 460: u'Crossing Guard, The (1995)',
 461: u'Smoke (1995)',
 462: u'Like Water For Chocolate (Como agua para chocolate) (1992)',
 463: u'Secret of Roan Inish, The (1994)',
 464: u'Vanya on 42nd Street (1994)',
 465: u'Jungle Book, The (1994)',
 466: u'Red Rock West (1992)',
 467: u'Bronx Tale, A (1993)',
 468: u'Rudy (1993)',
 469: u'Short Cuts (1993)',
 470: u'Tombstone (1993)',
 471: u'Courage Under Fire (1996)',
 472: u'Dragonheart (1996)',
 473: u'James and the Giant Peach (1996)',
 474: u'Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)',
 475: u'Trainspotting (1996)',
 476: u'First Wives Club, The (1996)',
 477: u'Matilda (1996)',
 478: u'Philadelphia Story, The (1940)',
 479: u'Vertigo (1958)',
 480: u'North by Northwest (1959)',
 481: u'Apartment, The (1960)',
 482: u'Some Like It Hot (1959)',
 483: u'Casablanca (1942)',
 484: u'Maltese Falcon, The (1941)',
 485: u'My Fair Lady (1964)',
 486: u'Sabrina (1954)',
 487: u'Roman Holiday (1953)',
 488: u'Sunset Blvd. (1950)',
 489: u'Notorious (1946)',
 490: u'To Catch a Thief (1955)',
 491: u'Adventures of Robin Hood, The (1938)',
 492: u'East of Eden (1955)',
 493: u'Thin Man, The (1934)',
 494: u'His Girl Friday (1940)',
 495: u'Around the World in 80 Days (1956)',
 496: u"It's a Wonderful Life (1946)",
 497: u'Bringing Up Baby (1938)',
 498: u'African Queen, The (1951)',
 499: u'Cat on a Hot Tin Roof (1958)',
 500: u'Fly Away Home (1996)',
 501: u'Dumbo (1941)',
 502: u'Bananas (1971)',
 503: u'Candidate, The (1972)',
 504: u'Bonnie and Clyde (1967)',
 505: u'Dial M for Murder (1954)',
 506: u'Rebel Without a Cause (1955)',
 507: u'Streetcar Named Desire, A (1951)',
 508: u'People vs. Larry Flynt, The (1996)',
 509: u'My Left Foot (1989)',
 510: u'Magnificent Seven, The (1954)',
 511: u'Lawrence of Arabia (1962)',
 512: u'Wings of Desire (1987)',
 513: u'Third Man, The (1949)',
 514: u'Annie Hall (1977)',
 515: u'Boot, Das (1981)',
 516: u'Local Hero (1983)',
 517: u'Manhattan (1979)',
 518: u"Miller's Crossing (1990)",
 519: u'Treasure of the Sierra Madre, The (1948)',
 520: u'Great Escape, The (1963)',
 521: u'Deer Hunter, The (1978)',
 522: u'Down by Law (1986)',
 523: u'Cool Hand Luke (1967)',
 524: u'Great Dictator, The (1940)',
 525: u'Big Sleep, The (1946)',
 526: u'Ben-Hur (1959)',
 527: u'Gandhi (1982)',
 528: u'Killing Fields, The (1984)',
 529: u'My Life as a Dog (Mitt liv som hund) (1985)',
 530: u'Man Who Would Be King, The (1975)',
 531: u'Shine (1996)',
 532: u'Kama Sutra: A Tale of Love (1996)',
 533: u'Daytrippers, The (1996)',
 534: u'Traveller (1997)',
 535: u'Addicted to Love (1997)',
 536: u'Ponette (1996)',
 537: u'My Own Private Idaho (1991)',
 538: u'Anastasia (1997)',
 539: u'Mouse Hunt (1997)',
 540: u'Money Train (1995)',
 541: u'Mortal Kombat (1995)',
 542: u'Pocahontas (1995)',
 543: u'Mis\ufffdrables, Les (1995)',
 544: u"Things to Do in Denver when You're Dead (1995)",
 545: u'Vampire in Brooklyn (1995)',
 546: u'Broken Arrow (1996)',
 547: u"Young Poisoner's Handbook, The (1995)",
 548: u'NeverEnding Story III, The (1994)',
 549: u'Rob Roy (1995)',
 550: u'Die Hard: With a Vengeance (1995)',
 551: u'Lord of Illusions (1995)',
 552: u'Species (1995)',
 553: u'Walk in the Clouds, A (1995)',
 554: u'Waterworld (1995)',
 555: u"White Man's Burden (1995)",
 556: u'Wild Bill (1995)',
 557: u'Farinelli: il castrato (1994)',
 558: u'Heavenly Creatures (1994)',
 559: u'Interview with the Vampire (1994)',
 560: u"Kid in King Arthur's Court, A (1995)",
 561: u"Mary Shelley's Frankenstein (1994)",
 562: u'Quick and the Dead, The (1995)',
 563: u"Stephen King's The Langoliers (1995)",
 564: u'Tales from the Hood (1995)',
 565: u'Village of the Damned (1995)',
 566: u'Clear and Present Danger (1994)',
 567: u"Wes Craven's New Nightmare (1994)",
 568: u'Speed (1994)',
 569: u'Wolf (1994)',
 570: u'Wyatt Earp (1994)',
 571: u'Another Stakeout (1993)',
 572: u'Blown Away (1994)',
 573: u'Body Snatchers (1993)',
 574: u'Boxing Helena (1993)',
 575: u"City Slickers II: The Legend of Curly's Gold (1994)",
 576: u'Cliffhanger (1993)',
 577: u'Coneheads (1993)',
 578: u'Demolition Man (1993)',
 579: u'Fatal Instinct (1993)',
 580: u'Englishman Who Went Up a Hill, But Came Down a Mountain, The (1995)',
 581: u'Kalifornia (1993)',
 582: u'Piano, The (1993)',
 583: u'Romeo Is Bleeding (1993)',
 584: u'Secret Garden, The (1993)',
 585: u'Son in Law (1993)',
 586: u'Terminal Velocity (1994)',
 587: u'Hour of the Pig, The (1993)',
 588: u'Beauty and the Beast (1991)',
 589: u'Wild Bunch, The (1969)',
 590: u'Hellraiser: Bloodline (1996)',
 591: u'Primal Fear (1996)',
 592: u'True Crime (1995)',
 593: u'Stalingrad (1993)',
 594: u'Heavy (1995)',
 595: u'Fan, The (1996)',
 596: u'Hunchback of Notre Dame, The (1996)',
 597: u'Eraser (1996)',
 598: u'Big Squeeze, The (1996)',
 599: u'Police Story 4: Project S (Chao ji ji hua) (1993)',
 600: u"Daniel Defoe's Robinson Crusoe (1996)",
 601: u'For Whom the Bell Tolls (1943)',
 602: u'American in Paris, An (1951)',
 603: u'Rear Window (1954)',
 604: u'It Happened One Night (1934)',
 605: u'Meet Me in St. Louis (1944)',
 606: u'All About Eve (1950)',
 607: u'Rebecca (1940)',
 608: u'Spellbound (1945)',
 609: u'Father of the Bride (1950)',
 610: u'Gigi (1958)',
 611: u'Laura (1944)',
 612: u'Lost Horizon (1937)',
 613: u'My Man Godfrey (1936)',
 614: u'Giant (1956)',
 615: u'39 Steps, The (1935)',
 616: u'Night of the Living Dead (1968)',
 617: u'Blue Angel, The (Blaue Engel, Der) (1930)',
 618: u'Picnic (1955)',
 619: u'Extreme Measures (1996)',
 620: u'Chamber, The (1996)',
 621: u'Davy Crockett, King of the Wild Frontier (1955)',
 622: u'Swiss Family Robinson (1960)',
 623: u'Angels in the Outfield (1994)',
 624: u'Three Caballeros, The (1945)',
 625: u'Sword in the Stone, The (1963)',
 626: u'So Dear to My Heart (1949)',
 627: u'Robin Hood: Prince of Thieves (1991)',
 628: u'Sleepers (1996)',
 629: u'Victor/Victoria (1982)',
 630: u'Great Race, The (1965)',
 631: u'Crying Game, The (1992)',
 632: u"Sophie's Choice (1982)",
 633: u'Christmas Carol, A (1938)',
 634: u"Microcosmos: Le peuple de l'herbe (1996)",
 635: u'Fog, The (1980)',
 636: u'Escape from New York (1981)',
 637: u'Howling, The (1981)',
 638: u'Return of Martin Guerre, The (Retour de Martin Guerre, Le) (1982)',
 639: u'Tin Drum, The (Blechtrommel, Die) (1979)',
 640: u'Cook the Thief His Wife & Her Lover, The (1989)',
 641: u'Paths of Glory (1957)',
 642: u'Grifters, The (1990)',
 643: u'The Innocent (1994)',
 644: u'Thin Blue Line, The (1988)',
 645: u'Paris Is Burning (1990)',
 646: u'Once Upon a Time in the West (1969)',
 647: u'Ran (1985)',
 648: u'Quiet Man, The (1952)',
 649: u'Once Upon a Time in America (1984)',
 650: u'Seventh Seal, The (Sjunde inseglet, Det) (1957)',
 651: u'Glory (1989)',
 652: u'Rosencrantz and Guildenstern Are Dead (1990)',
 653: u'Touch of Evil (1958)',
 654: u'Chinatown (1974)',
 655: u'Stand by Me (1986)',
 656: u'M (1931)',
 657: u'Manchurian Candidate, The (1962)',
 658: u'Pump Up the Volume (1990)',
 659: u'Arsenic and Old Lace (1944)',
 660: u'Fried Green Tomatoes (1991)',
 661: u'High Noon (1952)',
 662: u'Somewhere in Time (1980)',
 663: u'Being There (1979)',
 664: u'Paris, Texas (1984)',
 665: u'Alien 3 (1992)',
 666: u"Blood For Dracula (Andy Warhol's Dracula) (1974)",
 667: u'Audrey Rose (1977)',
 668: u'Blood Beach (1981)',
 669: u'Body Parts (1991)',
 670: u'Body Snatchers (1993)',
 671: u'Bride of Frankenstein (1935)',
 672: u'Candyman (1992)',
 673: u'Cape Fear (1962)',
 674: u'Cat People (1982)',
 675: u'Nosferatu (Nosferatu, eine Symphonie des Grauens) (1922)',
 676: u'Crucible, The (1996)',
 677: u'Fire on the Mountain (1996)',
 678: u'Volcano (1997)',
 679: u'Conan the Barbarian (1981)',
 680: u'Kull the Conqueror (1997)',
 681: u'Wishmaster (1997)',
 682: u'I Know What You Did Last Summer (1997)',
 683: u'Rocket Man (1997)',
 684: u'In the Line of Fire (1993)',
 685: u'Executive Decision (1996)',
 686: u'Perfect World, A (1993)',
 687: u"McHale's Navy (1997)",
 688: u'Leave It to Beaver (1997)',
 689: u'Jackal, The (1997)',
 690: u'Seven Years in Tibet (1997)',
 691: u'Dark City (1998)',
 692: u'American President, The (1995)',
 693: u'Casino (1995)',
 694: u'Persuasion (1995)',
 695: u'Kicking and Screaming (1995)',
 696: u'City Hall (1996)',
 697: u'Basketball Diaries, The (1995)',
 698: u'Browning Version, The (1994)',
 699: u'Little Women (1994)',
 700: u'Miami Rhapsody (1995)',
 701: u'Wonderful, Horrible Life of Leni Riefenstahl, The (1993)',
 702: u'Barcelona (1994)',
 703: u"Widows' Peak (1994)",
 704: u'House of the Spirits, The (1993)',
 705: u"Singin' in the Rain (1952)",
 706: u'Bad Moon (1996)',
 707: u'Enchanted April (1991)',
 708: u'Sex, Lies, and Videotape (1989)',
 709: u'Strictly Ballroom (1992)',
 710: u'Better Off Dead... (1985)',
 711: u'Substance of Fire, The (1996)',
 712: u'Tin Men (1987)',
 713: u'Othello (1995)',
 714: u'Carrington (1995)',
 715: u'To Die For (1995)',
 716: u'Home for the Holidays (1995)',
 717: u'Juror, The (1996)',
 718: u'In the Bleak Midwinter (1995)',
 719: u'Canadian Bacon (1994)',
 720: u'First Knight (1995)',
 721: u'Mallrats (1995)',
 722: u'Nine Months (1995)',
 723: u'Boys on the Side (1995)',
 724: u'Circle of Friends (1995)',
 725: u'Exit to Eden (1994)',
 726: u'Fluke (1995)',
 727: u'Immortal Beloved (1994)',
 728: u'Junior (1994)',
 729: u'Nell (1994)',
 730: u'Queen Margot (Reine Margot, La) (1994)',
 731: u'Corrina, Corrina (1994)',
 732: u'Dave (1993)',
 733: u'Go Fish (1994)',
 734: u'Made in America (1993)',
 735: u'Philadelphia (1993)',
 736: u'Shadowlands (1993)',
 737: u'Sirens (1994)',
 738: u'Threesome (1994)',
 739: u'Pretty Woman (1990)',
 740: u'Jane Eyre (1996)',
 741: u'Last Supper, The (1995)',
 742: u'Ransom (1996)',
 743: u'Crow: City of Angels, The (1996)',
 744: u'Michael Collins (1996)',
 745: u'Ruling Class, The (1972)',
 746: u'Real Genius (1985)',
 747: u'Benny & Joon (1993)',
 748: u'Saint, The (1997)',
 749: u'MatchMaker, The (1997)',
 750: u'Amistad (1997)',
 751: u'Tomorrow Never Dies (1997)',
 752: u'Replacement Killers, The (1998)',
 753: u'Burnt By the Sun (1994)',
 754: u'Red Corner (1997)',
 755: u'Jumanji (1995)',
 756: u'Father of the Bride Part II (1995)',
 757: u'Across the Sea of Time (1995)',
 758: u'Lawnmower Man 2: Beyond Cyberspace (1996)',
 759: u'Fair Game (1995)',
 760: u'Screamers (1995)',
 761: u'Nick of Time (1995)',
 762: u'Beautiful Girls (1996)',
 763: u'Happy Gilmore (1996)',
 764: u'If Lucy Fell (1996)',
 765: u'Boomerang (1992)',
 766: u'Man of the Year (1995)',
 767: u'Addiction, The (1995)',
 768: u'Casper (1995)',
 769: u'Congo (1995)',
 770: u'Devil in a Blue Dress (1995)',
 771: u'Johnny Mnemonic (1995)',
 772: u'Kids (1995)',
 773: u'Mute Witness (1994)',
 774: u'Prophecy, The (1995)',
 775: u'Something to Talk About (1995)',
 776: u'Three Wishes (1995)',
 777: u'Castle Freak (1995)',
 778: u'Don Juan DeMarco (1995)',
 779: u'Drop Zone (1994)',
 780: u'Dumb & Dumber (1994)',
 781: u'French Kiss (1995)',
 782: u'Little Odessa (1994)',
 783: u'Milk Money (1994)',
 784: u'Beyond Bedlam (1993)',
 785: u'Only You (1994)',
 786: u'Perez Family, The (1995)',
 787: u'Roommates (1995)',
 788: u'Relative Fear (1994)',
 789: u'Swimming with Sharks (1995)',
 790: u'Tommy Boy (1995)',
 791: u'Baby-Sitters Club, The (1995)',
 792: u'Bullets Over Broadway (1994)',
 793: u'Crooklyn (1994)',
 794: u'It Could Happen to You (1994)',
 795: u'Richie Rich (1994)',
 796: u'Speechless (1994)',
 797: u'Timecop (1994)',
 798: u'Bad Company (1995)',
 799: u'Boys Life (1995)',
 800: u'In the Mouth of Madness (1995)',
 801: u'Air Up There, The (1994)',
 802: u'Hard Target (1993)',
 803: u'Heaven & Earth (1993)',
 804: u'Jimmy Hollywood (1994)',
 805: u'Manhattan Murder Mystery (1993)',
 806: u'Menace II Society (1993)',
 807: u'Poetic Justice (1993)',
 808: u'Program, The (1993)',
 809: u'Rising Sun (1993)',
 810: u'Shadow, The (1994)',
 811: u'Thirty-Two Short Films About Glenn Gould (1993)',
 812: u'Andre (1994)',
 813: u'Celluloid Closet, The (1995)',
 814: u'Great Day in Harlem, A (1994)',
 815: u'One Fine Day (1996)',
 816: u'Candyman: Farewell to the Flesh (1995)',
 817: u'Frisk (1995)',
 818: u'Girl 6 (1996)',
 819: u'Eddie (1996)',
 820: u'Space Jam (1996)',
 821: u'Mrs. Winterbourne (1996)',
 822: u'Faces (1968)',
 823: u'Mulholland Falls (1996)',
 824: u'Great White Hype, The (1996)',
 825: u'Arrival, The (1996)',
 826: u'Phantom, The (1996)',
 827: u'Daylight (1996)',
 828: u'Alaska (1996)',
 829: u'Fled (1996)',
 830: u'Power 98 (1995)',
 831: u'Escape from L.A. (1996)',
 832: u'Bogus (1996)',
 833: u'Bulletproof (1996)',
 834: u'Halloween: The Curse of Michael Myers (1995)',
 835: u'Gay Divorcee, The (1934)',
 836: u'Ninotchka (1939)',
 837: u'Meet John Doe (1941)',
 838: u'In the Line of Duty 2 (1987)',
 839: u'Loch Ness (1995)',
 840: u'Last Man Standing (1996)',
 841: u'Glimmer Man, The (1996)',
 842: u'Pollyanna (1960)',
 843: u'Shaggy Dog, The (1959)',
 844: u'Freeway (1996)',
 845: u'That Thing You Do! (1996)',
 846: u'To Gillian on Her 37th Birthday (1996)',
 847: u'Looking for Richard (1996)',
 848: u'Murder, My Sweet (1944)',
 849: u'Days of Thunder (1990)',
 850: u'Perfect Candidate, A (1996)',
 851: u'Two or Three Things I Know About Her (1966)',
 852: u'Bloody Child, The (1996)',
 853: u'Braindead (1992)',
 854: u'Bad Taste (1987)',
 855: u'Diva (1981)',
 856: u'Night on Earth (1991)',
 857: u'Paris Was a Woman (1995)',
 858: u'Amityville: Dollhouse (1996)',
 859: u"April Fool's Day (1986)",
 860: u'Believers, The (1987)',
 861: u'Nosferatu a Venezia (1986)',
 862: u'Jingle All the Way (1996)',
 863: u'Garden of Finzi-Contini, The (Giardino dei Finzi-Contini, Il) (1970)',
 864: u'My Fellow Americans (1996)',
 865: u'Ice Storm, The (1997)',
 866: u'Michael (1996)',
 867: u'Whole Wide World, The (1996)',
 868: u'Hearts and Minds (1996)',
 869: u'Fools Rush In (1997)',
 870: u'Touch (1997)',
 871: u'Vegas Vacation (1997)',
 872: u'Love Jones (1997)',
 873: u'Picture Perfect (1997)',
 874: u'Career Girls (1997)',
 875: u"She's So Lovely (1997)",
 876: u'Money Talks (1997)',
 877: u'Excess Baggage (1997)',
 878: u'That Darn Cat! (1997)',
 879: u'Peacemaker, The (1997)',
 880: u'Soul Food (1997)',
 881: u'Money Talks (1997)',
 882: u'Washington Square (1997)',
 883: u'Telling Lies in America (1997)',
 884: u'Year of the Horse (1997)',
 885: u'Phantoms (1998)',
 886: u'Life Less Ordinary, A (1997)',
 887: u"Eve's Bayou (1997)",
 888: u'One Night Stand (1997)',
 889: u'Tango Lesson, The (1997)',
 890: u'Mortal Kombat: Annihilation (1997)',
 891: u'Bent (1997)',
 892: u'Flubber (1997)',
 893: u'For Richer or Poorer (1997)',
 894: u'Home Alone 3 (1997)',
 895: u'Scream 2 (1997)',
 896: u'Sweet Hereafter, The (1997)',
 897: u'Time Tracers (1995)',
 898: u'Postman, The (1997)',
 899: u'Winter Guest, The (1997)',
 900: u'Kundun (1997)',
 901: u'Mr. Magoo (1997)',
 902: u'Big Lebowski, The (1998)',
 903: u'Afterglow (1997)',
 904: u'Ma vie en rose (My Life in Pink) (1997)',
 905: u'Great Expectations (1998)',
 906: u'Oscar & Lucinda (1997)',
 907: u'Vermin (1998)',
 908: u'Half Baked (1998)',
 909: u'Dangerous Beauty (1998)',
 910: u'Nil By Mouth (1997)',
 911: u'Twilight (1998)',
 912: u'U.S. Marshalls (1998)',
 913: u'Love and Death on Long Island (1997)',
 914: u'Wild Things (1998)',
 915: u'Primary Colors (1998)',
 916: u'Lost in Space (1998)',
 917: u'Mercury Rising (1998)',
 918: u'City of Angels (1998)',
 919: u'City of Lost Children, The (1995)',
 920: u'Two Bits (1995)',
 921: u'Farewell My Concubine (1993)',
 922: u'Dead Man (1995)',
 923: u'Raise the Red Lantern (1991)',
 924: u'White Squall (1996)',
 925: u'Unforgettable (1996)',
 926: u'Down Periscope (1996)',
 927: u'Flower of My Secret, The (Flor de mi secreto, La) (1995)',
 928: u'Craft, The (1996)',
 929: u'Harriet the Spy (1996)',
 930: u'Chain Reaction (1996)',
 931: u'Island of Dr. Moreau, The (1996)',
 932: u'First Kid (1996)',
 933: u'Funeral, The (1996)',
 934: u"Preacher's Wife, The (1996)",
 935: u'Paradise Road (1997)',
 936: u'Brassed Off (1996)',
 937: u'Thousand Acres, A (1997)',
 938: u'Smile Like Yours, A (1997)',
 939: u'Murder in the First (1995)',
 940: u'Airheads (1994)',
 941: u'With Honors (1994)',
 942: u"What's Love Got to Do with It (1993)",
 943: u'Killing Zoe (1994)',
 944: u'Renaissance Man (1994)',
 945: u'Charade (1963)',
 946: u'Fox and the Hound, The (1981)',
 947: u'Big Blue, The (Grand bleu, Le) (1988)',
 948: u'Booty Call (1997)',
 949: u'How to Make an American Quilt (1995)',
 950: u'Georgia (1995)',
 951: u'Indian in the Cupboard, The (1995)',
 952: u'Blue in the Face (1995)',
 953: u'Unstrung Heroes (1995)',
 954: u'Unzipped (1995)',
 955: u'Before Sunrise (1995)',
 956: u"Nobody's Fool (1994)",
 957: u'Pushing Hands (1992)',
 958: u'To Live (Huozhe) (1994)',
 959: u'Dazed and Confused (1993)',
 960: u'Naked (1993)',
 961: u'Orlando (1993)',
 962: u'Ruby in Paradise (1993)',
 963: u'Some Folks Call It a Sling Blade (1993)',
 964: u'Month by the Lake, A (1995)',
 965: u'Funny Face (1957)',
 966: u'Affair to Remember, An (1957)',
 967: u'Little Lord Fauntleroy (1936)',
 968: u'Inspector General, The (1949)',
 969: u'Winnie the Pooh and the Blustery Day (1968)',
 970: u'Hear My Song (1991)',
 971: u'Mediterraneo (1991)',
 972: u'Passion Fish (1992)',
 973: u'Grateful Dead (1995)',
 974: u'Eye for an Eye (1996)',
 975: u'Fear (1996)',
 976: u'Solo (1996)',
 977: u'Substitute, The (1996)',
 978: u"Heaven's Prisoners (1996)",
 979: u'Trigger Effect, The (1996)',
 980: u'Mother Night (1996)',
 981: u'Dangerous Ground (1997)',
 982: u'Maximum Risk (1996)',
 983: u"Rich Man's Wife, The (1996)",
 984: u'Shadow Conspiracy (1997)',
 985: u'Blood & Wine (1997)',
 986: u'Turbulence (1997)',
 987: u'Underworld (1997)',
 988: u'Beautician and the Beast, The (1997)',
 989: u"Cats Don't Dance (1997)",
 990: u'Anna Karenina (1997)',
 991: u'Keys to Tulsa (1997)',
 992: u'Head Above Water (1996)',
 993: u'Hercules (1997)',
 994: u'Last Time I Committed Suicide, The (1997)',
 995: u'Kiss Me, Guido (1997)',
 996: u'Big Green, The (1995)',
 997: u'Stuart Saves His Family (1995)',
 998: u'Cabin Boy (1994)',
 999: u'Clean Slate (1994)',
 1000: u'Lightning Jack (1994)',
 ...}
r_with_titles = r_123.map(lambda x: (titles[x.product], x.rating))
r_with_titles.sortBy(lambda x:-x[1]).take(10)
[(u'Fantasia (1940)', 5.0),
 (u'Postino, Il (1994)', 5.0),
 (u'2001: A Space Odyssey (1968)', 5.0),
 (u'Raging Bull (1980)', 5.0),
 (u'Jean de Florette (1986)', 5.0),
 (u'Secrets & Lies (1996)', 5.0),
 (u'My Fair Lady (1964)', 5.0),
 (u'Godfather, The (1972)', 5.0),
 (u'Lawrence of Arabia (1962)', 5.0),
 (u'Enchanted April (1991)', 5.0)]
topK_with_titles = map(lambda x:(titles[x.product], x.rating), topK_movies)
topK_with_titles
[(u"Marvin's Room (1996)", 6.238110881748787),
 (u'Full Monty, The (1997)', 6.168910818255823),
 (u'Boot, Das (1981)', 5.954825359796975),
 (u'Old Yeller (1957)', 5.921099652606939),
 (u'Candidate, The (1972)', 5.670748725720875)]

Similarity computation

sampleRDD = sc.parallelize([['A', 111], ['A', 222], ['B', 333]])
sampleRDD.lookup('A')
[111, 222]

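We use cosine similarity to measure how similar two movies are. For example, to compute the cosine similarity between movie 456 and every other movie, we first extract the factor vector of movie 456: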

arr = cf_model.productFeatures().lookup(456)[0]
arr
array('d', [0.029076049104332924, 0.21009714901447296, 0.0290555227547884, -0.5571964383125305, -0.3824714124202728, -0.4592842161655426, 0.6329585313796997, -0.362333744764328, -0.1305536925792694, 0.8419598340988159, -0.13552409410476685, -0.6138198971748352, 0.02604905515909195, 0.08060657978057861, -0.16706441342830658, -0.3220045566558838, 0.43676093220710754, 0.07212082296609879, 0.16547970473766327, 0.049271613359451294, -0.018478330224752426, 0.4917396306991577, -1.259914517402649, 0.30777591466903687, 0.3512609004974365, -1.1641650199890137, -0.08893561363220215, 0.5041327476501465, -0.5516676902770996, -0.13129214942455292, -0.7094163298606873, -0.095136858522892, 0.0024825106374919415, -0.5574610233306885, 0.6876130104064941, -0.14038291573524475, -0.3861311674118042, 0.08736740052700043, 0.7943630218505859, 1.0195096731185913, 0.49429407715797424, -0.07107719779014587, 0.21480131149291992, -0.572085976600647, 0.030756879597902298, 1.120257019996643, 0.012996670790016651, 0.5901889801025391, 0.9270225167274475, -0.8173779845237732])
# Same result as lookup
cf_model.productFeatures().filter(lambda x:x[0] == 456)\
.map(lambda x:x[1]).first()
array('d', [0.029076049104332924, 0.21009714901447296, 0.0290555227547884, -0.5571964383125305, -0.3824714124202728, -0.4592842161655426, 0.6329585313796997, -0.362333744764328, -0.1305536925792694, 0.8419598340988159, -0.13552409410476685, -0.6138198971748352, 0.02604905515909195, 0.08060657978057861, -0.16706441342830658, -0.3220045566558838, 0.43676093220710754, 0.07212082296609879, 0.16547970473766327, 0.049271613359451294, -0.018478330224752426, 0.4917396306991577, -1.259914517402649, 0.30777591466903687, 0.3512609004974365, -1.1641650199890137, -0.08893561363220215, 0.5041327476501465, -0.5516676902770996, -0.13129214942455292, -0.7094163298606873, -0.095136858522892, 0.0024825106374919415, -0.5574610233306885, 0.6876130104064941, -0.14038291573524475, -0.3861311674118042, 0.08736740052700043, 0.7943630218505859, 1.0195096731185913, 0.49429407715797424, -0.07107719779014587, 0.21480131149291992, -0.572085976600647, 0.030756879597902298, 1.120257019996643, 0.012996670790016651, 0.5901889801025391, 0.9270225167274475, -0.8173779845237732])

The factors of movie 456 are returned as an array. To compute cosine similarity, the array has to be converted into a vector:

from pyspark.mllib.linalg import DenseVector

selectedVector = DenseVector(arr)
selectedVector
DenseVector([0.0291, 0.2101, 0.0291, -0.5572, -0.3825, -0.4593, 0.633, -0.3623, -0.1306, 0.842, -0.1355, -0.6138, 0.026, 0.0806, -0.1671, -0.322, 0.4368, 0.0721, 0.1655, 0.0493, -0.0185, 0.4917, -1.2599, 0.3078, 0.3513, -1.1642, -0.0889, 0.5041, -0.5517, -0.1313, -0.7094, -0.0951, 0.0025, -0.5575, 0.6876, -0.1404, -0.3861, 0.0874, 0.7944, 1.0195, 0.4943, -0.0711, 0.2148, -0.5721, 0.0308, 1.1203, 0.013, 0.5902, 0.927, -0.8174])
# Define the cosine similarity function
def cosSim(vectorA, vectorB):
    return vectorA.dot(vectorB) / (vectorA.norm(2)*vectorB.norm(2))

cosSim(selectedVector, selectedVector)
1.0
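
For reference, the same quantity can be written without Spark. The following is a minimal plain-Python sketch of the formula cos(a, b) = a·b / (‖a‖‖b‖). The name `cos_sim_plain` and the zero-vector guard are illustrative assumptions and are not part of the original notebook.

```python
import math

def cos_sim_plain(a, b):
    """Cosine similarity of two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: report 0.0 for a zero vector instead of dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)

v = [0.5, -1.0, 2.0]                                 # made-up factor vector
same = cos_sim_plain(v, v)                           # vector against itself
opposite = cos_sim_plain(v, [-x for x in v])         # vector against its negation
orthogonal = cos_sim_plain([1.0, 0.0], [0.0, 1.0])   # perpendicular vectors
```

A vector compared with itself gives a similarity of 1 (up to floating-point rounding), which is what the `cosSim(selectedVector, selectedVector)` check above confirms for the Spark version.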

Use the map method to apply cosSim to the factors of every movie, returning tuples of (movie ID, cosine similarity):

sims = cf_model.productFeatures()\
.map(lambda x:(x[0], cosSim(selectedVector, DenseVector(x[1]))))
sims.take(5)
[(2, 0.63205163077832793),
 (4, 0.57081651505456033),
 (6, 0.57056619721078805),
 (8, 0.61730808637739021),
 (10, 0.55560185135898443)]

Take the 10 movies with the highest similarity (movie 456 itself ranks first, with similarity 1.0):

simsTopK = sims.top(10, lambda x:x[1])
simsTopK
[(456, 1.0),
 (1446, 0.77381634899106144),
 (249, 0.75186850153352536),
 (1206, 0.75042081098056868),
 (1028, 0.74412474419118724),
 (1435, 0.7440397393142627),
 (42, 0.73813968356865434),
 (1249, 0.73347222767192244),
 (411, 0.73245706263195443),
 (240, 0.7307674356843924)]

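The top and takeOrdered methods are comparatively efficient, because they only have to return the requested number of records and do not need to sort all of them. sortBy is less efficient, because it sorts every record before the first ones are selected.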

sims.takeOrdered(10, lambda x:-x[1])
sims.sortBy(lambda x:x[1], False).take(10)
[(456, 1.0),
 (1446, 0.77381634899106144),
 (249, 0.75186850153352536),
 (1206, 0.75042081098056868),
 (1028, 0.74412474419118724),
 (1435, 0.7440397393142627),
 (42, 0.73813968356865434),
 (1249, 0.73347222767192244),
 (411, 0.73245706263195443),
 (240, 0.7307674356843924)]
map( lambda x:(titles[x[0]], x[1]), simsTopK)
[(u'Beverly Hills Ninja (1997)', 1.0),
 (u'Bye Bye, Love (1995)', 0.77381634899106144),
 (u'Austin Powers: International Man of Mystery (1997)', 0.75186850153352536),
 (u'Amos & Andrew (1993)', 0.75042081098056868),
 (u'Grumpier Old Men (1995)', 0.74412474419118724),
 (u'Steal Big, Steal Little (1995)', 0.7440397393142627),
 (u'Clerks (1994)', 0.73813968356865434),
 (u'For Love or Money (1993)', 0.73347222767192244),
 (u'Nutty Professor, The (1996)', 0.73245706263195443),
 (u'Beavis and Butt-head Do America (1996)', 0.7307674356843924)]
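
The difference between top/takeOrdered and sortBy described above has a local analogue in plain Python: `heapq.nlargest` keeps at most k candidates while scanning the data, whereas `sorted` orders every record before slicing. The sketch below only illustrates that idea on a single machine. The (movie ID, similarity) pairs are made up, and it does not model how Spark distributes the work.

```python
import heapq

# Hypothetical (movie_id, similarity) pairs, for illustration only.
sims_local = [(1, 0.12), (2, 0.63), (3, 0.57), (4, 0.91), (5, 0.08), (6, 0.74)]
k = 3

# Partial selection: a single pass that never holds more than k candidates.
top_k_heap = heapq.nlargest(k, sims_local, key=lambda x: x[1])

# Full sort: orders all records, then keeps the first k.
top_k_sorted = sorted(sims_local, key=lambda x: x[1], reverse=True)[:k]
```

Both approaches return the same records; they differ only in how much ordering work is done along the way.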

Model evaluation

MSE, RMSE and MAE

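MLlib's evaluation module provides the RegressionMetrics class, which can compute MSE, RMSE and MAE. It only needs an RDD of (actual rating, predicted rating) pairs to build the metrics object. The class documents its input as (prediction, observation), but the three metrics used here are symmetric in the two values, so the order used below gives the same numbers. Because the predictions are made for the same ratings the model was trained on, the figures below are training errors.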

actual = ratings.map(lambda r: ((r.user, r.product), r.rating))
prediction = cf_model.predictAll(actual.map(lambda x: x[0]))\
                     .map(lambda r: ((r.user, r.product), r.rating))
actual_prediction = actual.join(prediction)
actual_prediction.take(5)
[((506, 568), (5.0, 4.495298253573846)),
 ((109, 365), (4.0, 4.068188145295891)),
 ((621, 577), (3.0, 3.2018734286795425)),
 ((720, 286), (5.0, 4.963056006543837)),
 ((812, 326), (4.0, 3.999871511192283))]
from pyspark.mllib.evaluation import RegressionMetrics

metrics = RegressionMetrics(actual_prediction.map(lambda x: x[1]))
print 'MSE =', metrics.meanSquaredError
print 'RMSE =', metrics.rootMeanSquaredError
print 'MAE =', metrics.meanAbsoluteError
MSE = 0.0845211871908
RMSE = 0.290725277867
MAE = 0.204405585188
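
To make the three metrics concrete, here is a minimal plain-Python sketch of their definitions, applied to the five (actual, predicted) pairs printed by `actual_prediction.take(5)` above. The function name `regression_errors` is an illustrative assumption. With only five pairs, the values will not match the full-dataset figures reported by RegressionMetrics.

```python
import math

def regression_errors(pairs):
    """Return (MSE, RMSE, MAE) for an iterable of (actual, predicted) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    n = float(len(pairs))
    mse = sum((a - p) ** 2 for a, p in pairs) / n   # mean squared error
    mae = sum(abs(a - p) for a, p in pairs) / n     # mean absolute error
    return mse, math.sqrt(mse), mae                 # RMSE is the square root of MSE

# The five pairs shown in the take(5) output above.
sample = [
    (5.0, 4.495298253573846),
    (4.0, 4.068188145295891),
    (3.0, 3.2018734286795425),
    (5.0, 4.963056006543837),
    (4.0, 3.999871511192283),
]
mse, rmse, mae = regression_errors(sample)

# Swapping actual and predicted leaves all three metrics unchanged.
swapped = regression_errors([(p, a) for a, p in sample])
```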

MAP

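MLlib's evaluation module also has a RankingMetrics class, which makes it easy to compute precision at K (P@K) and mean average precision (MAP). The class takes an RDD of (prediction, labels) pairs, where prediction is a user's product list ordered by the model's predicted score, and labels is the list of products that user actually purchased (in this example, the movies the user rated).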

productIDs = cf_model.productFeatures().map(lambda p: p[0]).collect()
productMatrix = cf_model.productFeatures().map(lambda p: p[1]).collect()

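Take the dot product of the product factor matrix with each user's factor vector to get the predicted ratings, attach the corresponding movie IDs, and sort. Once sorted, the predicted ratings are no longer needed, so only the ordered movie IDs are kept: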

import numpy as np

estRatings = cf_model.userFeatures()\
.map(lambda x: (x[0], list(np.dot(productMatrix, x[1]))))\
.map(lambda x: (x[0], zip(x[1], productIDs)))\
.map(lambda x: (x[0], sorted(x[1] ,key=(lambda m: m[0]), reverse=True)))\
.map(lambda x: (x[0], [i[1] for i in x[1]]))
estRatings.first()
(2,
 [778,
  530,
  519,
  211,
  521,
  654,
  528,
  671,
  491,
  604,
  936,
  511,
  770,
  48,
  408,
  506,
  482,
  641,
  87,
  498,
  923,
  649,
  191,
  474,
  1194,
  584,
  520,
  507,
  610,
  963,
  241,
  487,
  178,
  45,
  524,
  427,
  144,
  97,
  495,
  187,
  615,
  601,
  493,
  162,
  513,
  132,
  1012,
  509,
  568,
  489,
  504,
  381,
  133,
  855,
  648,
  492,
  212,
  194,
  699,
  199,
  182,
  216,
  127,
  516,
  195,
  124,
  1210,
  526,
  837,
  514,
  510,
  1126,
  57,
  387,
  454,
  275,
  655,
  272,
  251,
  620,
  55,
  735,
  100,
  508,
  589,
  129,
  523,
  59,
  223,
  208,
  44,
  747,
  529,
  661,
  283,
  302,
  612,
  500,
  488,
  183,
  423,
  613,
  82,
  285,
  152,
  180,
  656,
  663,
  525,
  614,
  107,
  116,
  96,
  50,
  603,
  229,
  471,
  169,
  316,
  479,
  79,
  134,
  242,
  742,
  311,
  1020,
  657,
  313,
  736,
  98,
  955,
  1142,
  203,
  606,
  646,
  694,
  684,
  1147,
  527,
  168,
  450,
  402,
  435,
  484,
  174,
  595,
  404,
  462,
  543,
  633,
  65,
  724,
  753,
  480,
  1203,
  712,
  318,
  517,
  1197,
  136,
  126,
  215,
  921,
  872,
  558,
  665,
  151,
  157,
  651,
  317,
  486,
  638,
  664,
  466,
  430,
  847,
  645,
  468,
  233,
  224,
  1046,
  477,
  550,
  591,
  330,
  685,
  193,
  378,
  15,
  618,
  197,
  502,
  515,
  421,
  746,
  531,
  709,
  205,
  915,
  566,
  675,
  1222,
  490,
  9,
  867,
  750,
  189,
  605,
  481,
  188,
  172,
  165,
  58,
  30,
  1074,
  32,
  198,
  693,
  185,
  270,
  428,
  880,
  414,
  110,
  632,
  286,
  705,
  1004,
  1094,
  356,
  739,
  972,
  150,
  33,
  636,
  22,
  367,
  570,
  295,
  226,
  76,
  269,
  6,
  64,
  745,
  900,
  52,
  909,
  159,
  201,
  210,
  445,
  23,
  496,
  28,
  357,
  204,
  690,
  1204,
  14,
  166,
  485,
  4,
  61,
  135,
  303,
  805,
  12,
  483,
  1,
  1152,
  503,
  273,
  348,
  588,
  256,
  153,
  292,
  806,
  227,
  60,
  114,
  499,
  432,
  721,
  265,
  354,
  300,
  277,
  443,
  965,
  279,
  740,
  306,
  209,
  293,
  299,
  25,
  111,
  856,
  304,
  631,
  297,
  228,
  255,
  276,
  161,
  301,
  237,
  660,
  727,
  478,
  1591,
  154,
  1121,
  1124,
  602,
  1119,
  310,
  177,
  494,
  137,
  125,
  335,
  239,
  826,
  1021,
  639,
  617,
  644,
  775,
  282,
  13,
  8,
  958,
  176,
  429,
  1286,
  729,
  616,
  213,
  380,
  340,
  732,
  449,
  969,
  676,
  470,
  99,
  1109,
  951,
  20,
  69,
  149,
  257,
  1269,
  512,
  1134,
  234,
  261,
  1167,
  903,
  119,
  879,
  1039,
  754,
  628,
  245,
  611,
  716,
  956,
  522,
  121,
  284,
  89,
  246,
  11,
  959,
  71,
  53,
  56,
  962,
  1172,
  434,
  647,
  650,
  441,
  1221,
  686,
  436,
  88,
  929,
  653,
  1284,
  703,
  338,
  1421,
  1063,
  95,
  416,
  382,
  16,
  883,
  702,
  622,
  835,
  1016,
  1454,
  252,
  898,
  672,
  1007,
  349,
  730,
  77,
  364,
  790,
  141,
  322,
  844,
  794,
  202,
  21,
  181,
  171,
  344,
  1592,
  914,
  1285,
  196,
  697,
  329,
  924,
  31,
  848,
  334,
  995,
  1056,
  26,
  576,
  714,
  463,
  1238,
  578,
  164,
  92,
  1098,
  692,
  1176,
  536,
  326,
  708,
  1169,
  980,
  781,
  460,
  941,
  546,
  1105,
  336,
  874,
  744,
  768,
  887,
  54,
  186,
  148,
  179,
  765,
  1448,
  207,
  882,
  86,
  537,
  1381,
  222,
  117,
  219,
  371,
  258,
  674,
  459,
  813,
  1268,
  458,
  501,
  372,
  845,
  411,
  553,
  1298,
  1400,
  39,
  1101,
  2,
  1050,
  418,
  925,
  1078,
  67,
  715,
  1019,
  1451,
  549,
  108,
  268,
  662,
  966,
  807,
  113,
  1558,
  433,
  232,
  175,
  49,
  701,
  722,
  562,
  1070,
  444,
  942,
  811,
  1136,
  1161,
  993,
  796,
  145,
  403,
  83,
  305,
  944,
  881,
  792,
  73,
  979,
  1125,
  627,
  173,
  554,
  280,
  1042,
  1149,
  773,
  131,
  238,
  573,
  1097,
  1278,
  1439,
  1099,
  1456,
  973,
  621,
  190,
  707,
  977,
  710,
  1065,
  331,
  1062,
  619,
  362,
  876,
  200,
  94,
  1086,
  696,
  836,
  192,
  939,
  42,
  274,
  1251,
  323,
  1218,
  975,
  281,
  291,
  308,
  290,
  659,
  1281,
  945,
  155,
  365,
  1264,
  393,
  263,
  405,
  288,
  1263,
  399,
  593,
  1245,
  717,
  425,
  278,
  1516,
  298,
  540,
  1224,
  109,
  78,
  1258,
  467,
  1242,
  287,
  289,
  3,
  406,
  748,
  1449,
  1280,
  70,
  72,
  312,
  420,
  1184,
  461,
  679,
  75,
  533,
  262,
  575,
  850,
  624,
  307,
  865,
  954,
  396,
  1005,
  846,
  1470,
  1073,
  296,
  1123,
  1185,
  1135,
  623,
  1131,
  102,
  417,
  1225,
  791,
  557,
  821,
  896,
  19,
  1277,
  347,
  236,
  863,
  81,
  497,
  658,
  789,
  904,
  779,
  473,
  762,
  475,
  535,
  786,
  472,
  843,
  230,
  118,
  106,
  968,
  1192,
  829,
  248,
  206,
  332,
  832,
  642,
  928,
  419,
  946,
  385,
  585,
  447,
  1153,
  350,
  1084,
  170,
  580,
  749,
  761,
  961,
  266,
  267,
  698,
  560,
  1189,
  1406,
  351,
  41,
  917,
  1615,
  577,
  1303,
  949,
  1171,
  783,
  327,
  346,
  18,
  455,
  1368,
  563,
  1527,
  142,
  1450,
  572,
  1148,
  559,
  337,
  1295,
  1555,
  785,
  864,
  552,
  854,
  609,
  355,
  156,
  1009,
  808,
  713,
  366,
  934,
  1311,
  608,
  731,
  797,
  1025,
  1367,
  66,
  1091,
  1102,
  51,
  1643,
  1141,
  476,
  947,
  937,
  728,
  46,
  803,
  596,
  1605,
  138,
  902,
  1229,
  1656,
  505,
  1137,
  889,
  1331,
  873,
  986,
  1462,
  1411,
  24,
  1143,
  918,
  802,
  853,
  833,
  1397,
  1211,
  755,
  834,
  221,
  1045,
  680,
  1220,
  542,
  587,
  931,
  793,
  386,
  1428,
  339,
  1518,
  320,
  1122,
  40,
  990,
  464,
  737,
  1388,
  1092,
  448,
  652,
  1407,
  63,
  469,
  824,
  1288,
  812,
  1265,
  43,
  943,
  878,
  888,
  1193,
  1228,
  1011,
  782,
  392,
  1118,
  123,
  1186,
  400,
  1312,
  579,
  637,
  842,
  271,
  1475,
  1617,
  1205,
  1248,
  105,
  899,
  1044,
  328,
  1300,
  983,
  1048,
  115,
  967,
  1107,
  1200,
  800,
  953,
  333,
  706,
  1103,
  905,
  410,
  825,
  321,
  927,
  583,
  912,
  231,
  352,
  919,
  1138,
  764,
  1379,
  752,
  815,
  146,
  1116,
  91,
  167,
  629,
  534,
  877,
  970,
  1079,
  689,
  1441,
  1014,
  218,
  1064,
  532,
  1090,
  1322,
  1434,
  85,
  431,
  1333,
  1058,
  1150,
  249,
  1060,
  160,
  143,
  5,
  1512,
  827,
  518,
  128,
  630,
  1531,
  886,
  960,
  799,
  1139,
  1188,
  561,
  1195,
  787,
  1187,
  670,
  1035,
  1483,
  1444,
  801,
  10,
  147,
  1159,
  626,
  1261,
  922,
  1620,
  62,
  908,
  933,
  1262,
  673,
  564,
  1082,
  1473,
  734,
  1267,
  607,
  345,
  1337,
  1040,
  769,
  47,
  592,
  1508,
  1059,
  971,
  667,
  453,
  695,
  1282,
  1378,
  1168,
  831,
  741,
  1223,
  220,
  122,
  1468,
  1296,
  948,
  996,
  1534,
  751,
  1293,
  1226,
  823,
  84,
  938,
  809,
  1594,
  766,
  733,
  1299,
  891,
  1445,
  452,
  1369,
  415,
  1061,
  1232,
  395,
  935,
  1026,
  1436,
  1217,
  1214,
  394,
  994,
  718,
  820,
  1010,
  1128,
  341,
  1071,
  691,
  407,
  1305,
  1372,
  998,
  885,
  952,
  1053,
  1355,
  597,
  859,
  1266,
  1067,
  625,
  1006,
  1443,
  700,
  569,
  163,
  723,
  1208,
  ...])
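
The ranking steps described before the code above can be traced for a single user in plain Python. This is a minimal sketch with a made-up two-factor model of three movies. The names `rank_products`, `product_ids`, `product_matrix` and `user_factors` are hypothetical and stand in for the collected product factors and one row of `userFeatures()`.

```python
# Hypothetical 2-factor model with three movies, for illustration only.
product_ids = [10, 20, 30]
product_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
user_factors = [2.0, 0.5]

def rank_products(user_vec, ids, matrix):
    """Return product IDs ordered by descending predicted score."""
    # Predicted score of each product = dot product of its factors with the user's.
    scores = [sum(f * u for f, u in zip(row, user_vec)) for row in matrix]
    # Pair each score with its product ID and sort by score, highest first.
    scored = sorted(zip(scores, ids), key=lambda m: m[0], reverse=True)
    # Drop the scores and keep only the ordered IDs.
    return [pid for _, pid in scored]

ranked = rank_products(user_factors, product_ids, product_matrix)
```

Here the scores are 2.0, 0.5 and 2.5 for movies 10, 20 and 30, so the ranked list is `[30, 10, 20]`.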

Next come the movies each user actually rated. The result has to be a key-value RDD:

userMovies = ratings.map(lambda r:(r.user, r.product))\
                    .groupByKey()\
                    .mapValues(list)
userMovies.first()
(2,
 [237,
  300,
  100,
  127,
  285,
  289,
  304,
  272,
  278,
  288,
  286,
  275,
  302,
  296,
  292,
  251,
  50,
  314,
  297,
  290,
  312,
  281,
  13,
  280,
  303,
  308,
  307,
  257,
  316,
  315,
  301,
  313,
  279,
  299,
  298,
  19,
  277,
  282,
  111,
  258,
  295,
  242,
  283,
  276,
  1,
  305,
  14,
  287,
  291,
  293,
  294,
  310,
  309,
  306,
  25,
  273,
  10,
  311,
  269,
  255,
  284,
  274])

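userMovies and estRatings have a similar internal structure. Joining the two RDDs and then extracting the predicted list and the actual list gives the RDD that has to be passed to RankingMetrics: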

predictionAndLabels = estRatings.join(userMovies).map(lambda x: x[1])
predictionAndLabels.first()
([42,
  1073,
  474,
  171,
  188,
  177,
  60,
  150,
  180,
  513,
  530,
  89,
  462,
  98,
  199,
  652,
  168,
  427,
  523,
  512,
  127,
  174,
  55,
  357,
  663,
  654,
  56,
  203,
  318,
  183,
  198,
  527,
  286,
  11,
  346,
  50,
  273,
  186,
  97,
  176,
  179,
  211,
  639,
  657,
  170,
  522,
  603,
  276,
  511,
  942,
  169,
  14,
  45,
  33,
  64,
  285,
  484,
  59,
  856,
  224,
  641,
  1194,
  520,
  165,
  475,
  17,
  492,
  172,
  702,
  190,
  185,
  156,
  12,
  480,
  435,
  166,
  724,
  615,
  372,
  493,
  518,
  9,
  134,
  135,
  48,
  955,
  154,
  1142,
  187,
  195,
  483,
  311,
  505,
  661,
  205,
  269,
  515,
  61,
  721,
  410,
  189,
  659,
  69,
  116,
  96,
  509,
  238,
  963,
  223,
  648,
  216,
  246,
  202,
  197,
  490,
  919,
  6,
  847,
  1010,
  253,
  482,
  111,
  558,
  182,
  429,
  404,
  162,
  516,
  315,
  528,
  92,
  478,
  175,
  479,
  32,
  81,
  443,
  489,
  789,
  497,
  960,
  421,
  173,
  209,
  459,
  178,
  316,
  268,
  481,
  628,
  979,
  498,
  770,
  52,
  408,
  108,
  153,
  608,
  466,
  93,
  1007,
  114,
  129,
  693,
  234,
  1070,
  444,
  317,
  251,
  713,
  649,
  137,
  638,
  124,
  193,
  647,
  660,
  813,
  432,
  557,
  923,
  507,
  7,
  100,
  469,
  762,
  709,
  194,
  524,
  272,
  651,
  529,
  735,
  921,
  144,
  1021,
  3,
  632,
  593,
  526,
  250,
  903,
  53,
  504,
  501,
  279,
  604,
  136,
  200,
  922,
  503,
  181,
  293,
  301,
  26,
  544,
  428,
  746,
  344,
  463,
  184,
  67,
  506,
  510,
  499,
  525,
  191,
  1009,
  196,
  508,
  265,
  71,
  306,
  430,
  43,
  82,
  236,
  467,
  160,
  128,
  566,
  302,
  637,
  1,
  141,
  248,
  152,
  640,
  23,
  287,
  971,
  221,
  242,
  1459,
  1245,
  13,
  425,
  496,
  570,
  750,
  614,
  650,
  1401,
  1067,
  83,
  1238,
  1048,
  207,
  611,
  210,
  653,
  633,
  1114,
  546,
  792,
  1109,
  159,
  433,
  631,
  1005,
  993,
  206,
  864,
  99,
  607,
  642,
  1019,
  161,
  471,
  946,
  1240,
  612,
  79,
  531,
  562,
  24,
  22,
  636,
  132,
  671,
  218,
  117,
  51,
  305,
  925,
  262,
  613,
  115,
  208,
  616,
  192,
  708,
  461,
  1197,
  514,
  1286,
  487,
  1134,
  580,
  65,
  491,
  634,
  458,
  780,
  965,
  975,
  837,
  310,
  163,
  249,
  256,
  582,
  347,
  151,
  936,
  58,
  4,
  763,
  943,
  382,
  327,
  295,
  239,
  664,
  416,
  106,
  10,
  673,
  584,
  464,
  778,
  645,
  549,
  727,
  39,
  697,
  447,
  331,
  257,
  794,
  1126,
  753,
  95,
  854,
  764,
  896,
  646,
  952,
  625,
  396,
  455,
  255,
  20,
  630,
  705,
  622,
  1018,
  710,
  601,
  364,
  1011,
  320,
  537,
  1012,
  488,
  741,
  204,
  655,
  882,
  86,
  237,
  28,
  282,
  596,
  519,
  241,
  1039,
  80,
  1120,
  902,
  956,
  76,
  486,
  214,
  1226,
  521,
  924,
  201,
  576,
  8,
  844,
  543,
  436,
  226,
  1103,
  712,
  806,
  554,
  1099,
  1149,
  1098,
  718,
  939,
  244,
  441,
  1143,
  460,
  232,
  915,
  751,
  420,
  1017,
  431,
  126,
  980,
  1203,
  744,
  19,
  494,
  694,
  46,
  87,
  959,
  1065,
  228,
  367,
  1059,
  736,
  324,
  386,
  16,
  418,
  947,
  385,
  684,
  609,
  1020,
  303,
  57,
  448,
  707,
  1118,
  77,
  1093,
  215,
  452,
  15,
  587,
  793,
  904,
  1251,
  875,
  610,
  550,
  737,
  381,
  365,
  588,
  101,
  109,
  945,
  849,
  284,
  569,
  761,
  1113,
  495,
  536,
  133,
  167,
  30,
  1195,
  665,
  148,
  972,
  863,
  434,
  334,
  644,
  277,
  534,
  591,
  380,
  113,
  624,
  47,
  121,
  485,
  517,
  41,
  1221,
  1008,
  1478,
  125,
  619,
  855,
  31,
  1101,
  213,
  470,
  139,
  384,
  291,
  1267,
  333,
  1050,
  94,
  679,
  147,
  267,
  887,
  275,
  733,
  307,
  229,
  732,
  1176,
  820,
  928,
  1047,
  900,
  1062,
  1107,
  739,
  1172,
  629,
  345,
  445,
  502,
  730,
  1449,
  774,
  164,
  414,
  953,
  969,
  1192,
  818,
  865,
  824,
  330,
  1046,
  772,
  340,
  1524,
  70,
  606,
  90,
  589,
  456,
  297,
  1111,
  411,
  1124,
  1097,
  695,
  620,
  283,
  88,
  740,
  233,
  635,
  532,
  131,
  1131,
  313,
  157,
  258,
  618,
  810,
  1119,
  1112,
  423,
  595,
  288,
  656,
  1115,
  1171,
  696,
  290,
  941,
  747,
  1248,
  1085,
  1081,
  1068,
  621,
  473,
  107,
  91,
  298,
  54,
  1117,
  378,
  836,
  1218,
  1028,
  1116,
  745,
  686,
  966,
  602,
  1147,
  825,
  227,
  805,
  44,
  675,
  339,
  841,
  73,
  321,
  961,
  755,
  949,
  866,
  872,
  1110,
  831,
  358,
  328,
  476,
  1188,
  393,
  843,
  886,
  252,
  212,
  940,
  535,
  356,
  1063,
  343,
  217,
  977,
  49,
  930,
  355,
  389,
  1206,
  235,
  123,
  933,
  1161,
  1077,
  281,
  909,
  883,
  219,
  1404,
  729,
  230,
  105,
  823,
  1512,
  680,
  1335,
  592,
  568,
  1042,
  27,
  1208,
  967,
  1058,
  222,
  748,
  605,
  1129,
  533,
  1169,
  859,
  715,
  809,
  685,
  995,
  786,
  1312,
  62,
  240,
  692,
  155,
  719,
  1123,
  1198,
  402,
  1006,
  1296,
  1024,
  145,
  292,
  1187,
  274,
  1139,
  1375,
  1451,
  2,
  583,
  359,
  1136,
  245,
  1153,
  968,
  578,
  322,
  1298,
  335,
  371,
  1315,
  130,
  581,
  63,
  1056,
  781,
  962,
  329,
  874,
  728,
  472,
  808,
  898,
  840,
  158,
  754,
  465,
  765,
  964,
  577,
  1284,
  833,
  742,
  714,
  899,
  552,
  309,
  1269,
  366,
  1150,
  950,
  68,
  563,
  835,
  704,
  477,
  401,
  658,
  122,
  895,
  450,
  1199,
  1170,
  1086,
  905,
  1200,
  1045,
  149,
  1244,
  722,
  1074,
  1421,
  351,
  1281,
  929,
  540,
  689,
  5,
  412,
  564,
  319,
  701,
  280,
  260,
  674,
  1051,
  326,
  749,
  1035,
  25,
  559,
  1311,
  337,
  1324,
  888,
  573,
  690,
  547,
  829,
  1231,
  1222,
  312,
  352,
  1220,
  561,
  1078,
  1431,
  300,
  118,
  662,
  585,
  278,
  387,
  1592,
  500,
  403,
  790,
  845,
  985,
  102,
  1501,
  802,
  468,
  1263,
  782,
  811,
  867,
  1132,
  1434,
  698,
  399,
  720,
  373,
  1473,
  1015,
  1152,
  1121,
  270,
  1211,
  1137,
  1168,
  1184,
  906,
  1016,
  627,
  958,
  804,
  879,
  38,
  723,
  299,
  84,
  342,
  332,
  66,
  405,
  1204,
  453,
  1411,
  294,
  997,
  85,
  1022,
  556,
  1367,
  1125,
  120,
  379,
  1072,
  848,
  876,
  1243,
  395,
  853,
  266,
  388,
  362,
  912,
  1210,
  572,
  1069,
  1060,
  991,
  670,
  1127,
  1041,
  1301,
  934,
  415,
  1388,
  914,
  597,
  880,
  1157,
  752,
  1558,
  231,
  336,
  406,
  970,
  1479,
  862,
  725,
  304,
  289,
  1066,
  296,
  1052,
  1264,
  926,
  738,
  1278,
  551,
  1368,
  699,
  1230,
  869,
  350,
  1091,
  1288,
  419,
  743,
  1023,
  72,
  1379,
  892,
  787,
  119,
  672,
  1597,
  797,
  271,
  1166,
  1495,
  785,
  717,
  1255,
  574,
  1141,
  451,
  951,
  354,
  1004,
  944,
  1080,
  889,
  1084,
  143,
  1090,
  369,
  916,
  617,
  1178,
  881,
  1462,
  773,
  353,
  308,
  1228,
  1189,
  1025,
  338,
  1303,
  449,
  18,
  771,
  873,
  363,
  1537,
  1071,
  937,
  1560,
  1305,
  555,
  984,
  1225,
  801,
  1014,
  815,
  779,
  264,
  541,
  1232,
  538,
  1277,
  1095,
  716,
  1223,
  1268,
  1273,
  800,
  548,
  1160,
  ...],
 [273,
  258,
  286,
  183,
  50,
  325,
  1238,
  186,
  265,
  23,
  1,
  198,
  318,
  11,
  1459,
  313,
  97,
  191,
  527,
  302,
  56])
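
The grouping and join used to build `predictionAndLabels` can be mirrored with ordinary dictionaries. The sketch below is a plain-Python analogue under made-up data. The variables `rated`, `user_movies`, `est_ratings` and `prediction_and_labels` are hypothetical stand-ins for the RDDs of the same purpose.

```python
from collections import defaultdict

# Hypothetical (user, product) rating records, for illustration only.
rated = [(1, 10), (2, 20), (1, 30), (2, 10)]

# Analogue of groupByKey().mapValues(list): user -> list of rated products.
user_movies = defaultdict(list)
for user, product in rated:
    user_movies[user].append(product)

# Hypothetical ranked predictions per user (the role estRatings plays above).
est_ratings = {1: [30, 20, 10], 2: [10, 30, 20], 3: [20, 10, 30]}

# Analogue of join(...).map(lambda x: x[1]): inner join on the user key,
# keeping only the (predicted list, actual list) value pair.
prediction_and_labels = [
    (est_ratings[u], user_movies[u])
    for u in sorted(est_ratings)
    if u in user_movies
]
```

User 3 has predictions but no ratings in this toy data, so the inner join drops it, just as an RDD join keeps only keys present on both sides.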

Compute P@K and MAP

from pyspark.mllib.evaluation import RankingMetrics

rankingMetrics = RankingMetrics(predictionAndLabels)
print 'MAP =', rankingMetrics.meanAveragePrecision
print 'PrecisionAtK =', rankingMetrics.precisionAt(20) 
MAP = 0.192062343904
PrecisionAtK = 0.182025450689
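
To show what these two numbers measure, here is a minimal plain-Python sketch of precision at k and average precision for one user, plus the mean over users. It follows the textbook definitions. The function names are illustrative assumptions, and Spark's RankingMetrics may treat edge cases (empty label lists, predictions shorter than k) differently.

```python
def precision_at_k(predicted, relevant, k):
    """Fraction of the first k predicted items that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant)
    hits = sum(1 for item in predicted[:k] if item in relevant)
    return hits / float(k)

def average_precision(predicted, relevant):
    """Average of the precision values at each rank where a relevant item appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumption: a user without labels contributes 0
    hits = 0
    total = 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in relevant:
            hits += 1
            total += hits / float(rank)
    return total / len(relevant)

def mean_average_precision(pairs):
    """Mean of average_precision over (predicted, relevant) pairs, one per user."""
    pairs = list(pairs)
    return sum(average_precision(p, r) for p, r in pairs) / float(len(pairs))

# Made-up ranking for one user: the relevant items sit at ranks 2 and 4.
predicted = [30, 10, 20, 40]
relevant = [10, 40]
p_at_2 = precision_at_k(predicted, relevant, 2)
ap = average_precision(predicted, relevant)
```

In this toy case one of the first two predictions is relevant, so precision at 2 is 0.5. The precision values at the two hits are 1/2 and 2/4, so the average precision is also 0.5.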

Model persistence

cf_model.save(sc, './cf_model')
from pyspark.mllib.recommendation import MatrixFactorizationModel

load_cf_model = MatrixFactorizationModel.load(sc, './cf_model')
load_cf_model.predict(123, 456)
0.5189201089615622