How to Build a Recommendation System with ML.NET?

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"推薦系統無處不在,從 Netflix、谷歌、亞馬遜到小型網店,都能看到它的身影。實際上,推薦系統可能是機器學習最成功的商業應用之一。它具備預測用戶喜歡閱讀、觀看和購買什麼的能力。推薦系統不僅對企業有益,對用戶也是有益的。對用戶而言,推薦系統提供了一種探索產品空間的方式;對企業而言,推薦系統提供了一種增加用戶參與的方式,同時也能讓企業對客戶有更多的瞭解。本文中,我們將瞭解如何使用 ML.NET 來創建推薦系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文主要包括以下五個部分:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據集和前提"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"推薦系統的類型"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"協同過濾直覺"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"矩陣分解直覺"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用 ML.NET 實現"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. 數據集和前提"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大家都喜歡 Netflix,其中一個原因就是他們的推薦做得很好,這家公司已經在推薦系統方面投入了大量資金。Netflix 因 “Netflix Prize”競賽而聞名,工程師們在沒有關於用戶或電影的其他信息的情況下,根據先前的評級,試圖預測用戶對電影的評級,他們甚至提供了一個數據集,這個數據集包括 480189 個用戶給 17770 部電影的 100480507 個評級。每個樣本在數據集中被格式化爲每組四個特徵:用戶 ID、電影 ID、評級、評級日期。用戶 ID 和電影 ID 的特徵是整數 ID,而評級則是從 1 到 5。這些數據看起來像這樣:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/03\/95\/0327721667b8dff17b38e8ea7995a695.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文的實現用 C# 語言完成,我們使用最新的 "},{"type":"text","marks":[{"type":"strong"}],"text":".NET 5"},{"type":"text","text":",因此要確保你已安裝此 SDK。若你正在使用 Visual Studio,則隨附 16.8.3 版本。此外,確保你已安裝下列軟件包:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"$ dotnet add package Microsoft.ML\n$ dotnet add package Microsoft.ML.Recommender\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"你可以在 Package Manager Console 中執行相同操作:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"Install-Package 
```text
Install-Package Microsoft.ML
Install-Package Microsoft.ML.Recommender
```

Note that this also installs the default Microsoft.ML package. You can do something similar using Visual Studio's Manage NuGet Packages option:

![](https://static001.infoq.cn/resource/image/c0/84/c074f3a72f21056b98ce649911684084.jpeg)

If you want to learn the basics of machine learning with ML.NET, have a look at the article "[Machine Learning with ML.NET – Introduction](https://rubikscode.net/2021/01/04/machine-learning-with-ml-net-introduction/)".

## 2. Types of Recommendation Systems

As mentioned, the Netflix dataset contains information about how users rated movies. Based on that, how do we create recommendations for a user? Before recommending similar items, we need to consider some characteristics of the movies the user has watched and rank them. We could also look for similar users based on those rankings and recommend the items that those users bought. But what does it mean for two items to be similar? What does it mean for users to be similar? And how do we calculate and express this similarity in mathematical terms?

Different types of recommendation systems answer these questions differently. In general, there are four types:

- **Content-based recommendation systems**: these focus on the content. They use only the features and information of the items themselves and build recommendations on that basis, ignoring information from other users.
- **Collaborative filtering recommendation systems**: these are the most powerful. They recommend items to a user based on that user's behavior on the platform, or based on the behavior of other users on the same platform. For example, Netflix recommends your next great TV show based on the shows you have already watched, and on the shows watched by users who watched and liked the same content you did.
- **Knowledge-based recommendation systems**: these use explicit knowledge about user preferences, items and recommendation criteria. In this case, the system asks the user about their preferences and builds recommendations from that feedback.
- **Hybrid recommendation systems**: in practice we often use a combination of all of the above in custom solutions.
If you want to learn more about how these systems work, see the article "[Introduction to Recommendation Systems](https://rubikscode.net/2020/04/13/introduction-to-recommendation-systems/)". Of these types, the first two are used most frequently and are the most popular; in practice we often end up building hybrid solutions to get better results.

ML.NET supports only collaborative filtering, or more precisely matrix factorization, so that is the type of recommendation system we focus on in this article. Let's take a closer look at how these systems work under the hood.

## 3. Collaborative Filtering Intuition

One of the most popular techniques for building recommendation systems is collaborative filtering. Unlike content-based filtering, this approach places users and items in a common embedding space, along the dimensions (features) they have in common. As an example, let's consider two Netflix users and their ratings of a few shows.

![](https://static001.infoq.cn/resource/image/b7/13/b7826a4a207576543a6dc73405c6bc13.jpeg)

In TensorFlow we could represent this like so (don't worry, we are not going into TensorFlow details here, this is just for illustration):

```python
users_tv_shows = tf.constant([
    [10, 2, 0, 0, 0, 6],
    [0, 1, 0, 2, 10, 0]], dtype=tf.float32)
```

We can also describe each show by its features, i.e. a k-hot encoding of its genres:

![](https://static001.infoq.cn/resource/image/be/79/be3ce372fa17c68dc517a55f14f53479.jpeg)

Or in TensorFlow:
```python
tv_shows_features = tf.constant([
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0]], dtype=tf.float32)
```

With that in place, a simple dot product of these matrices gives us the feature profile of each user:

```python
users_features = tf.matmul(users_tv_shows, tv_shows_features)
users_features = users_features / tf.reduce_sum(users_features, axis=1, keepdims=True)

top_users_features = tf.nn.top_k(users_features, num_feats)[1]
for i in range(num_users):
    feature_names = [features[int(index)] for index in top_users_features[i]]
    print('{}: {}'.format(users[i], feature_names))
```

```text
User1: ['Comedy', 'Drama', 'Sci-Fi', 'Action', 'Cartoon']
User2: ['Comedy', 'Sci-Fi', 'Cartoon', 'Action', 'Drama']
```

We can see that the top feature for both users is comedy, meaning they like similar things. What did we do here? We didn't just describe the items using the mentioned genres; we described each user in the same terms. For example, User 1 likes comedy with weight 0.5 and action with weight 0.1. Note that if we multiply the user embedding matrix by the transposed item embedding matrix, we recreate the user-item interaction matrix. This works fine for a simple example with a handful of users and items, but it does not scale as more items and users are added to the system. Also, how do we know that the features we picked are the relevant ones? What if there are latent features we cannot identify? How do we find the right features? To address these problems, we need matrix factorization.
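The same intuition carries over to C#, the language we use for the rest of this article. Below is a minimal, self-contained sketch of the computation above using plain arrays; the matrices, genre names and scores are made-up illustration values and are not part of the ML.NET solution we build later.

```csharp
using System;
using System.Linq;

class CollaborativeFilteringIntuition
{
    static void Main()
    {
        string[] users  = { "User1", "User2" };
        string[] genres = { "Comedy", "Action", "Sci-Fi" };   // illustrative genre set

        // Rows: users, columns: shows. Values are ratings (illustrative numbers).
        double[,] ratings =
        {
            { 10, 2, 0 },
            {  0, 1, 8 }
        };

        // Rows: shows, columns: genres (k-hot encoding, illustrative).
        double[,] showGenres =
        {
            { 1, 0, 0 },   // show 1 is a comedy
            { 0, 1, 0 },   // show 2 is an action show
            { 0, 0, 1 }    // show 3 is sci-fi
        };

        for (var u = 0; u < users.Length; u++)
        {
            // userGenreScores = ratings x showGenres (a plain matrix multiplication).
            var genreScores = new double[genres.Length];
            for (var g = 0; g < genres.Length; g++)
                for (var s = 0; s < ratings.GetLength(1); s++)
                    genreScores[g] += ratings[u, s] * showGenres[s, g];

            // Normalize so one user's scores sum to 1, then rank genres by score.
            var total = genreScores.Sum();
            var ranked = genres
                .Select((name, i) => (name, score: genreScores[i] / total))
                .OrderByDescending(x => x.score);

            Console.WriteLine($"{users[u]}: " +
                string.Join(", ", ranked.Select(x => $"{x.name} ({x.score:0.00})")));
        }
    }
}
```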
## 4. Matrix Factorization Intuition

We mentioned that hand-crafted item and user features are rarely the best choice overall. Fortunately, these embeddings can be learned from the data. Instead of assigning features to items and users by hand, we use the user-item interaction matrix to learn the latent factors that best factorize it. As in the thought exercise above, this process produces a user-factor embedding matrix and an item-factor embedding matrix. Technically, we compress the sparse user-item interaction matrix and extract latent factors from it (similar to principal component analysis). That is the whole point of matrix factorization: it factorizes one matrix into two smaller matrices from which the original matrix can be reconstructed:

![](https://static001.infoq.cn/resource/image/16/80/160a7593ff48a73a20cf43d3a7062080.jpeg)

$$ A \approx U \times V^{T} $$

As with other dimensionality-reduction techniques, the number of latent features is a hyperparameter that we can tune to balance information compression against reconstruction error. There are two ways to make predictions: take the dot product of a user with the item factors, or of an item with the user factors. Matrix factorization also helps with another problem: if your system has thousands of users and you want the similarity matrix between them, that is a very large matrix, and matrix factorization compresses that information for us.
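To make the prediction rule concrete: if $u_i$ is the i-th row of the user-factor matrix $U$ and $v_j$ is the j-th row of the item-factor matrix $V$, the reconstructed (predicted) rating is just their dot product. The formula below is implied by the approximation above rather than quoted from the original article:

$$ \hat{a}_{ij} = u_i \cdot v_j = \sum_{k=1}^{K} u_{ik} \, v_{jk} $$

Here $K$ is the number of latent factors, the quantity that will later appear in ML.NET as the approximationRank hyperparameter.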
### 4.1 Matrix Factorization Algorithms

A few years back, Netflix ran a recommendation-system competition worth one million dollars. The goal was to improve the accuracy of the system based on user ratings, and the winners used the singular value decomposition algorithm to get the best results. That algorithm is still popular today. Formally, it can be defined like this:

Let A be an m×n matrix. The singular value decomposition (SVD) of A is:

$$ A = U \Sigma V^{T} $$

where U is m×m and orthogonal, V is n×n and orthogonal, and Σ is an m×n diagonal matrix with non-negative diagonal entries $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_p$, $p = \min\{m, n\}$, called the singular values of A.

Another very popular algorithm is Alternating Least Squares, or ALS, and its variants. As the name suggests, the algorithm alternates: it solves for U while holding V fixed, then solves for V while holding U fixed, and it works only for least-squares problems. But precisely because it is this specialized, ALS can be parallelized, which makes the algorithm very fast.

![](https://static001.infoq.cn/resource/image/8b/36/8b9837e05b89895f903c55fc6b38a836.jpeg)

One of its variants is Weighted Alternating Least Squares, or WALS. The difference lies in how missing data is handled. As we have mentioned many times in previous articles, sparse data is one of the biggest enemies of recommendation systems. WALS assigns weights to specific entries and uses these weight vectors to normalize row and/or column frequencies, with either linear or exponential scaling.

Another commonly used matrix factorization algorithm is NMF, which stands for Non-negative Matrix Factorization. This algorithm produces low-rank factor matrices that contain only non-negative (or positive) elements. NMF uses an iterative process to modify the initial values of U and V so that their product approaches the original matrix A.

![](https://static001.infoq.cn/resource/image/70/b1/70375f536054e646e673f5304f1ef1b1.jpeg)
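All of these algorithms minimize some form of reconstruction error over the ratings that are actually observed. A common regularized squared-error formulation (a standard textbook form, added here for context rather than taken from the article) is:

$$ \min_{U, V} \sum_{(i,j) \in \Omega} \left( a_{ij} - u_i \cdot v_j \right)^2 + \lambda \left( \lVert u_i \rVert^2 + \lVert v_j \rVert^2 \right) $$

where $\Omega$ is the set of observed user-item pairs and $\lambda$ controls the regularization strength. ALS, WALS and the SGD-based trainer we use with ML.NET below differ mainly in how they search for U and V, not in what they try to minimize.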
## 5. Implementation with ML.NET

ML.NET currently supports only standard matrix factorization with stochastic gradient descent. As we will see in a moment, this is exposed through the MatrixFactorizationTrainer.

### 5.1 High-Level Architecture

Before digging into the implementation, let's consider its high-level architecture. As in previous ML.NET guides, we want to build an easily extensible solution that we can grow with new matrix factorization algorithms that ML.NET may add in the future. In a way, the solution presented in this article is a simple form of AutoML. The folder structure of the solution looks like this:

![](https://static001.infoq.cn/resource/image/45/87/45fa57360b7a85c5ae8dea105d706c87.jpeg)

The Data folder contains the input .csv data, while the MachineLearning folder contains everything we need to work with the algorithms. The architecture overview looks like this:

![](https://static001.infoq.cn/resource/image/4c/cb/4c6cf8cd955c3f259aa1yy8b0d6b5bcb.jpeg)

At the core of the solution is the abstract TrainerBase class. This class sits in the Common folder, and its main goal is to standardize the way the whole process is done. It is the class that handles the data and performs feature engineering, and it is also responsible for training the machine learning algorithm. The classes that implement this abstract class live in the Trainers folder; there we can find classes that utilize ML.NET algorithms, and they define which algorithm should be used. In this particular case we have only one. Finally, the Predictor class is located in the Predictor folder.

### 5.2 Data Models

To load data from the dataset and use it with ML.NET algorithms, we need classes that model that data. Two such classes can be found in the DataModels folder: MovieRating and MovieRatingPrediction. The MovieRating class models the input data and looks like this:

```csharp
using Microsoft.ML.Data;

namespace RecommendationSystem.MachineLearning.DataModels
{
    public class MovieRating
    {
        [LoadColumn(0)]
        public int UserId;

        [LoadColumn(1)]
        public int MovieId;

        [LoadColumn(2)]
        public float Label;
    }
}
```

Notice that we don't use the rating-date column from the dataset.

The MovieRatingPrediction class models the output data:

```csharp
namespace RecommendationSystem.MachineLearning.DataModels
{
    public class MovieRatingPrediction
    {
        public float Label;
        public float Score;
    }
}
```
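The LoadColumn indices above refer to the columns of the ratings .csv file that TrainerBase loads later. The file itself is not reproduced in the article, but for a MovieLens-style recommendation-ratings.csv the loader expects roughly the following shape (header plus comma-separated rows; the values here are only illustrative):

```text
userId,movieId,rating,timestamp
1,1,4,964982703
1,3,4,964981247
2,6,5,964982224
```

Only columns 0-2 are mapped by MovieRating, which is why the timestamp column is ignored.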
### 5.3 TrainerBase and ITrainerBase

This class is the core of the implementation. In essence it has two parts: an interface that describes the class, and an abstract class that needs to be overridden with a concrete implementation, although the abstract class already implements the interface methods. Here is the ITrainerBase interface:

```csharp
using Microsoft.ML.Data;

namespace RecommendationSystem.MachineLearning.Common
{
    public interface ITrainerBase
    {
        string Name { get; }
        void Fit(string trainingFileName);
        RegressionMetrics Evaluate();
        void Save();
    }
}
```

The interface is implemented by the TrainerBase class. However, that class is abstract, because we want to inject specific algorithms into it:

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
using Microsoft.ML.Trainers.Recommender;
using Microsoft.ML.Transforms;
using RecommendationSystem.MachineLearning.DataModels;
using System;
using System.IO;

namespace RecommendationSystem.MachineLearning.Common
{
    /// <summary>
    /// Base class for Trainers.
    /// This class exposes methods for training, evaluating and saving ML Models.
    /// </summary>
    public abstract class TrainerBase : ITrainerBase
    {
        public string Name { get; protected set; }

        protected static string ModelPath => Path.Combine(AppContext.BaseDirectory,
                                                          "recommender.mdl");

        protected readonly MLContext MlContext;

        protected DataOperationsCatalog.TrainTestData _dataSplit;
        protected MatrixFactorizationTrainer _model;
        protected ITransformer _trainedModel;

        protected TrainerBase()
        {
            MlContext = new MLContext(111);
        }

        /// <summary>
        /// Train model on defined data.
        /// </summary>
        public void Fit(string trainingFileName)
        {
            if (!File.Exists(trainingFileName))
            {
                throw new FileNotFoundException($"File {trainingFileName} doesn't exist.");
            }

            _dataSplit = LoadAndPrepareData(trainingFileName);
            var dataProcessPipeline = BuildDataProcessingPipeline();
            var trainingPipeline = dataProcessPipeline.Append(_model);

            _trainedModel = trainingPipeline.Fit(_dataSplit.TrainSet);
        }

        /// <summary>
        /// Evaluate trained model.
        /// </summary>
        /// <returns>RegressionMetrics object.</returns>
        public RegressionMetrics Evaluate()
        {
            var testSetTransform = _trainedModel.Transform(_dataSplit.TestSet);

            return MlContext.Regression.Evaluate(testSetTransform);
        }

        /// <summary>
        /// Save Model in the file.
        /// </summary>
        public void Save()
        {
            MlContext.Model.Save(_trainedModel, _dataSplit.TrainSet.Schema, ModelPath);
        }

        /// <summary>
        /// Feature engineering and data pre-processing.
        /// </summary>
        /// <returns>Data Processing Pipeline.</returns>
        private EstimatorChain<ValueToKeyMappingTransformer> BuildDataProcessingPipeline()
        {
            var dataProcessPipeline = MlContext.Transforms.Conversion.MapValueToKey(
                                            inputColumnName: "UserId",
                                            outputColumnName: "UserIdEncoded")
                .Append(MlContext.Transforms.Conversion.MapValueToKey(
                                            inputColumnName: "MovieId",
                                            outputColumnName: "MovieIdEncoded"))
                .AppendCacheCheckpoint(MlContext);

            return dataProcessPipeline;
        }

        private DataOperationsCatalog.TrainTestData LoadAndPrepareData(string trainingFileName)
        {
            IDataView trainingDataView = MlContext.Data.LoadFromTextFile<MovieRating>
                (trainingFileName, hasHeader: true, separatorChar: ',');
            return MlContext.Data.TrainTestSplit(trainingDataView, testFraction: 0.1);
        }
    }
}
```

This is a big class that controls the whole process, so let's break it down and see what is going on. First, let's look at the fields and properties of the class:

```csharp
public string Name { get; protected set; }

protected static string ModelPath => Path.Combine(AppContext.BaseDirectory,
                                                  "recommender.mdl");

protected readonly MLContext MlContext;

protected DataOperationsCatalog.TrainTestData _dataSplit;
protected MatrixFactorizationTrainer _model;
protected ITransformer _trainedModel;
```

Classes that inherit this one use the Name property to assign a name to the algorithm. The ModelPath field defines where the model is stored once it is trained; note that the file extension is .mdl. Next comes MlContext, through which we use ML.NET functionality. Don't forget that this class is a singleton, so there is only one in our solution. The _dataSplit field holds the loaded data; this structure splits the data into a training dataset and a test dataset.

The _model field is used by the child classes: they decide which machine learning algorithm goes into this field. The _trainedModel field is the resulting model, which should be evaluated and saved. In essence, the only job of a class that inherits and implements TrainerBase is to define the algorithm that should be used, by instantiating the desired algorithm object as _model.

Now let's explore the Fit() method:

```csharp
public void Fit(string trainingFileName)
{
    if (!File.Exists(trainingFileName))
    {
        throw new FileNotFoundException($"File {trainingFileName} doesn't exist.");
    }

    _dataSplit = LoadAndPrepareData(trainingFileName);
    var dataProcessPipeline = BuildDataProcessingPipeline();
    var trainingPipeline = dataProcessPipeline.Append(_model);

    _trainedModel = trainingPipeline.Fit(_dataSplit.TrainSet);
}
```

This method is the blueprint for training the algorithm. It receives the path of a .csv file as its input parameter. After making sure that the file exists, we use the private method LoadAndPrepareData, which loads the data into memory and splits it into two datasets, a training dataset and a test dataset. We keep the return value in _dataSplit because we need the test dataset for the evaluation phase. Then we call BuildDataProcessingPipeline().

This is the method that performs data pre-processing and feature engineering. There is not much to do for this data; we only encode the ID columns, like so:

```csharp
private EstimatorChain<ValueToKeyMappingTransformer> BuildDataProcessingPipeline()
{
    var dataProcessPipeline = MlContext.Transforms.Conversion.MapValueToKey(
                                    inputColumnName: "UserId",
                                    outputColumnName: "UserIdEncoded")
        .Append(MlContext.Transforms.Conversion.MapValueToKey(
                                    inputColumnName: "MovieId",
                                    outputColumnName: "MovieIdEncoded"))
        .AppendCacheCheckpoint(MlContext);

    return dataProcessPipeline;
}
```
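Why this encoding step at all? The matrix factorization trainer expects its matrix row and column index columns to be of ML.NET's key type, and MapValueToKey builds a dictionary over the values it sees and replaces each raw ID with a contiguous key index. Schematically, it behaves roughly like this (the IDs below are made up for illustration):

```text
UserId -> UserIdEncoded        MovieId -> MovieIdEncoded
  6    ->  1                     11    ->  1
 42    ->  2                      3    ->  2
  6    ->  1                     11    ->  1
```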
Next up is the Evaluate() method:

```csharp
public RegressionMetrics Evaluate()
{
    var testSetTransform = _trainedModel.Transform(_dataSplit.TestSet);

    return MlContext.Regression.Evaluate(testSetTransform);
}
```

This method is very simple: it transforms the test dataset with _trainedModel and then uses MlContext to retrieve the regression metrics. Finally, let's look at the Save() method:

```csharp
public void Save()
{
    MlContext.Model.Save(_trainedModel, _dataSplit.TrainSet.Schema, ModelPath);
}
```

Another simple method, which just uses MLContext to save the model to the defined path.
### 5.4 Trainers

Because all the heavy lifting is done in the TrainerBase class, the single Trainer class is very simple and focuses only on instantiating the ML.NET algorithm. Let's take a look at the MatrixFactorizationTrainer class:

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers.Recommender;
using RecommendationSystem.MachineLearning.Common;

namespace RecommendationSystem.MachineLearning.Trainers
{
    /// <summary>
    /// Class that uses the Matrix Factorization algorithm.
    /// </summary>
    public sealed class MatrixFactorizationTrainer : TrainerBase
    {
        public MatrixFactorizationTrainer(int numberOfIterations,
            int approximationRank,
            double learningRate) : base()
        {
            Name = $"Matrix Factorization {numberOfIterations}-{approximationRank}";

            _model = MlContext.Recommendation().Trainers.MatrixFactorization(
                labelColumnName: "Label",
                matrixColumnIndexColumnName: "UserIdEncoded",
                matrixRowIndexColumnName: "MovieIdEncoded",
                approximationRank: approximationRank,
                learningRate: learningRate,
                numberOfIterations: numberOfIterations);
        }
    }
}
```

As you can see, this class is really simple. We override Name and _model, and we use the MatrixFactorization trainer from the Recommendation extension. Notice how we expose some of the hyperparameters this algorithm provides; that gives us room to run more experiments.
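The constructor above uses the overload with individual parameters. ML.NET also lets you configure the same trainer through an options object, which exposes a few more knobs. The sketch below is not part of the article's solution; the extra properties (such as Lambda and Quiet) come from Microsoft.ML's MatrixFactorizationTrainer.Options, and the type is fully qualified to avoid clashing with our own class of the same name:

```csharp
// Hypothetical alternative body for the constructor above.
var options = new Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options
{
    LabelColumnName = "Label",
    MatrixColumnIndexColumnName = "UserIdEncoded",
    MatrixRowIndexColumnName = "MovieIdEncoded",
    ApproximationRank = approximationRank,
    LearningRate = learningRate,
    NumberOfIterations = numberOfIterations,
    Lambda = 0.05,   // regularization weight
    Quiet = true     // suppress the per-iteration training output shown later
};

_model = MlContext.Recommendation().Trainers.MatrixFactorization(options);
```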
### 5.5 Predictor

The Predictor class loads the saved model and runs predictions on it. Usually this class is not part of the same microservice as the trainers. We typically have one microservice that trains the model; the model is saved to a file, and another service loads it from that file and runs predictions on user input. The class looks like this:

```csharp
using RecommendationSystem.MachineLearning.DataModels;
using Microsoft.ML;
using System;
using System.IO;

namespace RecommendationSystem.MachineLearning.Predictors
{
    /// <summary>
    /// Loads Model from the file and makes predictions.
    /// </summary>
    public class Predictor
    {
        protected static string ModelPath => Path.Combine(AppContext.BaseDirectory,
                                                          "recommender.mdl");
        private readonly MLContext _mlContext;

        private ITransformer _model;

        public Predictor()
        {
            _mlContext = new MLContext(111);
        }

        /// <summary>
        /// Runs prediction on new data.
        /// </summary>
        /// <param name="newSample">New data sample.</param>
        /// <returns>Prediction object.</returns>
        public MovieRatingPrediction Predict(MovieRating newSample)
        {
            LoadModel();

            var predictionEngine = _mlContext.Model
                .CreatePredictionEngine<MovieRating, MovieRatingPrediction>(_model);

            return predictionEngine.Predict(newSample);
        }

        private void LoadModel()
        {
            if (!File.Exists(ModelPath))
            {
                throw new FileNotFoundException($"File {ModelPath} doesn't exist.");
            }

            using (var stream = new FileStream(ModelPath, FileMode.Open, FileAccess.Read,
                                               FileShare.Read))
            {
                _model = _mlContext.Model.Load(stream, out _);
            }

            if (_model == null)
            {
                throw new Exception($"Failed to load Model");
            }
        }
    }
}
```

In short, the model is loaded from the defined file and used to predict the new sample. To do that, we create a PredictionEngine from the loaded model.

### 5.6 Usage and Results

Let's put all of this together.

```csharp
using RecommendationSystem.MachineLearning.Common;
using RecommendationSystem.MachineLearning.DataModels;
using RecommendationSystem.MachineLearning.Predictors;
using RecommendationSystem.MachineLearning.Trainers;
using System;
using System.Collections.Generic;

namespace RecommendationSystem
{
    class Program
    {
        static void Main(string[] args)
        {
            var newSample = new MovieRating
            {
                UserId = 6,
                MovieId = 11
            };

            var trainers = new List<ITrainerBase>
            {
                new MatrixFactorizationTrainer(10, 50, 0.1),
                new MatrixFactorizationTrainer(10, 50, 0.01),
                new MatrixFactorizationTrainer(20, 100, 0.1),
                new MatrixFactorizationTrainer(20, 100, 0.01),
                new MatrixFactorizationTrainer(30, 100, 0.1),
                new MatrixFactorizationTrainer(30, 100, 0.01)
            };

            trainers.ForEach(t => TrainEvaluatePredict(t, newSample));
        }

        static void TrainEvaluatePredict(ITrainerBase trainer, MovieRating newSample)
        {
            Console.WriteLine("*******************************");
            Console.WriteLine($"{ trainer.Name }");
            Console.WriteLine("*******************************");

            trainer.Fit(".\\Data\\recommendation-ratings.csv");

            var modelMetrics = trainer.Evaluate();

            Console.WriteLine($"Loss Function: {modelMetrics.LossFunction:0.##}{Environment.NewLine}" +
                              $"Mean Absolute Error: {modelMetrics.MeanAbsoluteError:#.##}{Environment.NewLine}" +
                              $"Mean Squared Error: {modelMetrics.MeanSquaredError:#.##}{Environment.NewLine}" +
                              $"RSquared: {modelMetrics.RSquared:0.##}{Environment.NewLine}" +
                              $"Root Mean Squared Error: {modelMetrics.RootMeanSquaredError:#.##}");

            trainer.Save();

            var predictor = new Predictor();
            var prediction = predictor.Predict(newSample);
            Console.WriteLine("------------------------------");
            Console.WriteLine($"Prediction: {prediction.Score:#.##}");
            Console.WriteLine("------------------------------");
        }
    }
}
```

The TrainEvaluatePredict() method does the heavy lifting here. Into this method we inject an instance of a class that inherits TrainerBase, together with a new sample for which we want a prediction. Then we call the Fit() method to train the algorithm, call the Evaluate() method and print out the metrics, and finally save the model. Once that is done, we create a Predictor instance, call its Predict() method with the new sample, and print out the prediction. In Main we build a list of trainer objects and then call TrainEvaluatePredict on each of them.

Using these hyperparameters, we created several variants of the matrix factorization algorithm in the trainers list. Here are the results:
```text
*******************************
Matrix Factorization 10-50
*******************************
iter tr_rmse obj
 0 1.4757 2.4739e+05
 1 0.9161 1.2617e+05
 2 0.8666 1.1798e+05
 3 0.8409 1.1348e+05
 4 0.8240 1.1079e+05
 5 0.8100 1.0897e+05
 6 0.7980 1.0736e+05
 7 0.7847 1.0575e+05
 8 0.7691 1.0405e+05
 9 0.7549 1.0284e+05
Loss Function: 0.77
Mean Absolute Error: .68
Mean Squared Error: .77
RSquared: 0.29
Root Mean Squared Error: .88
------------------------------
Prediction: 3.94
------------------------------
*******************************
Matrix Factorization 10-50
*******************************
iter tr_rmse obj
 0 3.1309 9.0205e+05
 1 2.3707 5.4640e+05
 2 1.7857 3.3435e+05
 3 1.5459 2.6501e+05
 4 1.4055 2.2888e+05
 5 1.3103 2.0634e+05
 6 1.2430 1.9129e+05
 7 1.1902 1.8002e+05
 8 1.1493 1.7159e+05
 9 1.1185 1.6546e+05
Loss Function: 1.27
Mean Absolute Error: .89
Mean Squared Error: 1.27
RSquared: -0.17
Root Mean Squared Error: 1.13
------------------------------
Prediction: 4.01
------------------------------
*******************************
Matrix Factorization 20-100
*******************************
iter tr_rmse obj
 0 1.5068 2.5551e+05
 1 0.9232 1.2707e+05
 2 0.8675 1.1773e+05
 3 0.8426 1.1358e+05
 4 0.8260 1.1082e+05
 5 0.8116 1.0874e+05
 6 0.7984 1.0705e+05
 7 0.7849 1.0547e+05
 8 0.7699 1.0374e+05
 9 0.7556 1.0222e+05
 10 0.7407 1.0084e+05
 11 0.7252 9.9587e+04
 12 0.7108 9.8130e+04
 13 0.6962 9.6890e+04
 14 0.6845 9.6048e+04
 15 0.6718 9.4877e+04
 16 0.6615 9.4167e+04
 17 0.6510 9.3413e+04
 18 0.6419 9.2767e+04
 19 0.6322 9.1971e+04
Loss Function: 0.75
Mean Absolute Error: .67
Mean Squared Error: .75
RSquared: 0.31
Root Mean Squared Error: .86
------------------------------
Prediction: 4.06
------------------------------
*******************************
Matrix Factorization 20-100
*******************************
iter tr_rmse obj
 0 3.1188 8.9340e+05
 1 2.4196 5.6643e+05
 2 1.8203 3.4467e+05
 3 1.5710 2.7129e+05
 4 1.4210 2.3212e+05
 5 1.3245 2.0894e+05
 6 1.2559 1.9343e+05
 7 1.2024 1.8189e+05
 8 1.1592 1.7289e+05
 9 1.1247 1.6594e+05
 10 1.0956 1.6027e+05
 11 1.0717 1.5566e+05
 12 1.0506 1.5171e+05
 13 1.0326 1.4838e+05
 14 1.0169 1.4550e+05
 15 1.0032 1.4306e+05
 16 0.9907 1.4085e+05
 17 0.9798 1.3893e+05
 18 0.9698 1.3718e+05
 19 0.9610 1.3563e+05
Loss Function: 0.99
Mean Absolute Error: .78
Mean Squared Error: .99
RSquared: 0.09
Root Mean Squared Error: .99
------------------------------
Prediction: 3.92
------------------------------
*******************************
Matrix Factorization 30-100
*******************************
iter tr_rmse obj
 0 1.4902 2.5094e+05
 1 0.9364 1.2934e+05
 2 0.8672 1.1737e+05
 3 0.8428 1.1349e+05
 4 0.8264 1.1104e+05
 5 0.8114 1.0883e+05
 6 0.7966 1.0681e+05
 7 0.7836 1.0532e+05
 8 0.7698 1.0378e+05
 9 0.7540 1.0209e+05
 10 0.7402 1.0089e+05
 11 0.7248 9.9437e+04
 12 0.7098 9.7999e+04
 13 0.6966 9.6791e+04
 14 0.6826 9.5745e+04
 15 0.6687 9.4572e+04
 16 0.6593 9.3841e+04
 17 0.6480 9.3017e+04
 18 0.6404 9.2448e+04
 19 0.6321 9.1986e+04
 20 0.6238 9.1298e+04
 21 0.6160 9.0879e+04
 22 0.6090 9.0430e+04
 23 0.6025 9.0006e+04
 24 0.5962 8.9550e+04
 25 0.5909 8.9269e+04
 26 0.5859 8.9011e+04
 27 0.5809 8.8598e+04
 28 0.5764 8.8393e+04
 29 0.5714 8.8086e+04
Loss Function: 0.74
Mean Absolute Error: .67
Mean Squared Error: .74
RSquared: 0.32
Root Mean Squared Error: .86
------------------------------
Prediction: 3.98
------------------------------
*******************************
Matrix Factorization 30-100
*******************************
iter tr_rmse obj
 0 3.1699 9.2239e+05
 1 2.4110 5.6279e+05
 2 1.8361 3.4988e+05
 3 1.5652 2.6961e+05
 4 1.4201 2.3188e+05
 5 1.3248 2.0902e+05
 6 1.2537 1.9291e+05
 7 1.2017 1.8175e+05
 8 1.1583 1.7271e+05
 9 1.1237 1.6575e+05
 10 1.0953 1.6017e+05
 11 1.0711 1.5555e+05
 12 1.0502 1.5162e+05
 13 1.0324 1.4834e+05
 14 1.0168 1.4549e+05
 15 1.0036 1.4316e+05
 16 0.9905 1.4080e+05
 17 0.9795 1.3886e+05
 18 0.9697 1.3715e+05
 19 0.9607 1.3558e+05
 20 0.9526 1.3418e+05
 21 0.9452 1.3293e+05
 22 0.9384 1.3175e+05
 23 0.9322 1.3070e+05
 24 0.9265 1.2976e+05
 25 0.9211 1.2883e+05
 26 0.9163 1.2802e+05
 27 0.9118 1.2727e+05
 28 0.9075 1.2653e+05
 29 0.9036 1.2589e+05
Loss Function: 0.9
Mean Absolute Error: .74
Mean Squared Error: .9
RSquared: 0.17
Root Mean Squared Error: .95
------------------------------
Prediction: 3.86
------------------------------
```

We used user ID 6 and movie ID 11 for the test. If you look at the dataset, you will see that the actual rating for this pair is 4. As you can see, most of the matrix factorization variants do a reasonable job. The variant with 10 iterations, an approximation rank of 50 and a learning rate of 0.01 produced the prediction closest to the real rating (4.01), although its overall evaluation metrics are among the weakest of the group (note the negative R-squared), so further testing is needed before deciding which variant performs best.
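Since the best variant is not obvious from a single prediction, one way to extend the Program above is to keep the trainer with the lowest RMSE instead of only printing metrics. A rough sketch, reusing the classes defined earlier and assuming the same trainers list and data file as in Program.cs:

```csharp
// Sketch: pick the variant with the lowest root mean squared error on the test split.
ITrainerBase bestTrainer = null;
var bestRmse = double.MaxValue;

foreach (var trainer in trainers)
{
    trainer.Fit(".\\Data\\recommendation-ratings.csv");
    var metrics = trainer.Evaluate();

    if (metrics.RootMeanSquaredError < bestRmse)
    {
        bestRmse = metrics.RootMeanSquaredError;
        bestTrainer = trainer;
    }
}

Console.WriteLine($"Best variant: {bestTrainer.Name} (RMSE {bestRmse:0.##})");
bestTrainer.Save();   // persist only the best model for the Predictor to load
```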
## Conclusion

We covered a lot of ground in this article. We learned about the different types of recommendation systems, and then looked at collaborative filtering and matrix factorization in more detail. We also saw how these ideas can be applied to movie recommendations and, finally, we implemented all of it with ML.NET.

**About the author:**

Nikola M. Zivkovic is the Chief AI Officer at Rubik's Code and the author of the book "Deep Learning for Programmers". He loves sharing knowledge, is an experienced speaker, and is a guest lecturer at the University of Novi Sad.

**Original link:**

https://rubikscode.net/2021/03/15/machine-learning-with-ml-net-recommendation-systems/