Notes and Commentary on Google's Recommender System Paper "Wide & Deep Learning for Recommender Systems"

Wide & Deep Learning for Recommender Systems

Abstract

Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning (jointly trained wide linear models and deep neural networks) to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.

Commentary: two terms are important here and recur throughout the paper: memorization and generalization. Memorization means learning how known feature transformations and feature combinations affect the outcome; generalization means learning how unseen feature transformations and combinations affect it. Taking the paper's Google Play prediction task as an example: using a linear model to learn how a user's age and occupation, or an app's download count and category, affect whether the user will install the app is memorization; using a deep model to learn how unseen feature combinations (a deliberately rough example: age * occupation / download count + category) affect the install decision is generalization.

1. Introduction

A recommender system can be viewed as a search ranking system, where the input query is a set of user and contextual information, and the output is a ranked list of items. Given a query, the recommendation task is to find the relevant items in a database and then rank the items based on certain objectives, such as clicks or purchases.
One challenge in recommender systems, similar to the general search ranking problem, is to achieve both memorization and generalization. Memorization can be loosely defined as learning the frequent co-occurrence of items or features and exploiting the correlation available in the historical data. Generalization, on the other hand, is based on transitivity of correlation and explores new feature combinations that have never or rarely occurred in the past. Recommendations based on memorization are usually more topical and directly relevant to the items on which users have already performed actions. Compared with memorization, generalization tends to improve the diversity of the recommended items. In this paper, we focus on the apps recommendation problem for the Google Play store, but the approach should apply to generic recommender systems.
For massive-scale online recommendation and ranking systems in an industrial setting, generalized linear models such as logistic regression are widely used because they are simple, scalable and interpretable. The models are often trained on binarized sparse features with one-hot encoding. E.g., the binary feature “user_installed_app=netflix” has value 1 if the user installed Netflix. Memorization can be achieved effectively using cross-product transformations over sparse features, such as AND(user_installed_app=netflix, impression_app=pandora), whose value is 1 if the user installed Netflix and then is later shown Pandora. This explains how the co-occurrence of a feature pair correlates with the target label. Generalization can be added by using features that are less granular, such as AND(user_installed_category=video, impression_category=music), but manual feature engineering is often required. One limitation of cross-product transformations is that they do not generalize to query-item feature pairs that have not appeared in the training data.
Embedding-based models, such as factorization machines [5] or deep neural networks, can generalize to previously unseen query-item feature pairs by learning a low-dimensional dense embedding vector for each query and item feature, with less burden of feature engineering. However, it is difficult to learn effective low-dimensional representations for queries and items when the underlying query-item matrix is sparse and high-rank, such as users with specific preferences or niche items with a narrow appeal. In such cases, there should be no interactions between most query-item pairs, but dense embeddings will lead to nonzero predictions for all query-item pairs, and thus can over-generalize and make less relevant recommendations. On the other hand, linear models with cross-product feature transformations can memorize these “exception rules” with much fewer parameters.
In this paper, we present the Wide & Deep learning framework to achieve both memorization and generalization in one model, by jointly training a linear model component and a neural network component as shown in Figure 1.
The main contributions of the paper include:
• The Wide & Deep learning framework for jointly training feed-forward neural networks with embeddings and linear model with feature transformations for generic recommender systems with sparse inputs.
• The implementation and evaluation of the Wide & Deep recommender system productionized on Google Play, a mobile app store with over one billion active users and over one million apps.
• We have open-sourced our implementation along with a high-level API in TensorFlow.
While the idea is simple, we show that the Wide & Deep framework significantly improves the app acquisition rate on the mobile app store, while satisfying the training and serving speed requirements.

[Figure 1: the wide model (left), the deep model (right), and the combined Wide & Deep model (center).]

2. Recommender System Overview

An overview of the app recommender system is shown in Figure 2. A query, which can include various user and contextual features, is generated when a user visits the app store. The recommender system returns a list of apps (also referred to as impressions) on which users can perform certain actions such as clicks or purchases. These user actions, along with the queries and impressions, are recorded in the logs as the training data for the learner.
Since there are over a million apps in the database, it is intractable to exhaustively score every app for every query within the serving latency requirements (often O(10) milliseconds). Therefore, the first step upon receiving a query is retrieval. The retrieval system returns a short list of items that best match the query using various signals, usually a combination of machine-learned models and human-defined rules. After reducing the candidate pool, the ranking system ranks all items by their scores. The scores are usually P(y|x), the probability of a user action label y given the features x, including user features (e.g., country, language, demographics), contextual features (e.g., device, hour of the day, day of the week), and impression features (e.g., app age, historical statistics of an app). In this paper, we focus on the ranking model using the Wide & Deep learning framework.

[Figure 2: overview of the app recommender system.]
Commentary: "impressions" can be confusing if read literally; it simply refers to the list of apps shown to the user. When the paper talks about impression features, it means features of the displayed app itself.
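To make the two-stage structure concrete, here is a minimal sketch in plain Python of the retrieval-then-ranking flow; the `retrieve` and `score` functions are hypothetical stand-ins for the real retrieval signals and the Wide & Deep scorer:

```python
import heapq

def retrieve(query, all_apps, max_candidates=100):
    """Retrieval step: cheap signals cut ~1M apps down to O(100) candidates."""
    # Hypothetical rule: keep apps whose category matches the query's interests.
    matches = [app for app in all_apps if app["category"] in query["interests"]]
    return matches[:max_candidates]

def score(query, app):
    """Ranking step: P(y|x); the real system runs the Wide & Deep forward pass."""
    return 1.0 if app["category"] in query["interests"] else 0.0  # toy stand-in

def recommend(query, all_apps, k=10):
    candidates = retrieve(query, all_apps)
    return heapq.nlargest(k, candidates, key=lambda app: score(query, app))

apps = [{"name": "netflix", "category": "video"},
        {"name": "pandora", "category": "music"}]
print(recommend({"interests": {"music"}}, apps, k=1))  # [{'name': 'pandora', ...}]
```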

3. The Wide & Deep Model

3.1 The Wide Component

The wide component is a generalized linear model of the form $y = w^T x + b$, as illustrated in Figure 1 (left). y is the prediction, $x = [x_1, x_2, ..., x_d]$ is a vector of d features, $w = [w_1, w_2, ..., w_d]$ are the model parameters and b is the bias. The feature set includes raw input features and transformed features. One of the most important transformations is the cross-product transformation, which is defined as:
$\phi_k(x) = \prod_{i=1}^{d} x_i^{c_{ki}}, \quad c_{ki} \in \{0, 1\}$, where $c_{ki}$ is a boolean variable that is 1 if the i-th feature is part of the k-th transformation $\phi_k$, and 0 otherwise. For binary features, a cross-product transformation (e.g., “AND(gender=female, language=en)”) is 1 if and only if the constituent features (“gender=female” and “language=en”) are all 1, and 0 otherwise. This captures the interactions between the binary features, and adds nonlinearity to the generalized linear model.

Commentary: the formula $\phi_k(x)=\prod_{i=1}^{d}x_i^{c_{ki}}, c_{ki}\in\{0,1\}$ looks very abstract, but it is just a feature cross. Take the cross of gender and language: gender is {male, female} and language is {Chinese, English}, so the crossed features are {male AND Chinese, male AND English, female AND Chinese, female AND English}. For a sample with gender=female and language=English, the crossed feature "female AND English" is 1 and the other three are 0.
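A few lines of Python make the definition concrete; this is only a sketch, with feature names chosen to mirror the paper's examples:

```python
def cross_product_transform(x, cross):
    """phi_k(x): 1 iff every binary feature named in `cross` is 1 in sample x.

    x: dict mapping binary feature names to 0/1.
    cross: the set of feature names with c_ki = 1 for transformation k.
    """
    return int(all(x.get(name, 0) == 1 for name in cross))

x = {"gender=female": 1, "language=en": 1, "user_installed_app=netflix": 0}
print(cross_product_transform(x, {"gender=female", "language=en"}))               # 1
print(cross_product_transform(x, {"user_installed_app=netflix", "language=en"}))  # 0
```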

3.2 The Deep Component

The deep component is a feed-forward neural network, as shown in Figure 1 (right). For categorical features, the original inputs are feature strings (e.g., “language=en”). Each of these sparse, high-dimensional categorical features is first converted into a low-dimensional and dense real-valued vector, often referred to as an embedding vector. The dimensionality of the embeddings is usually on the order of O(10) to O(100). The embedding vectors are initialized randomly and then the values are trained to minimize the final loss function during model training. These low-dimensional dense embedding vectors are then fed into the hidden layers of a neural network in the forward pass. Specifically, each hidden layer performs the following computation:
$a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})$
where l is the layer number and f is the activation function, often rectified linear units (ReLUs). $a^{(l)}$, $b^{(l)}$, and $W^{(l)}$ are the activations, bias, and model weights at the l-th layer.

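As a sketch of this forward pass in NumPy (the vocabulary sizes and layer widths below are made up; in the real model they come from the feature vocabularies and the network configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical setup: three categorical features, each embedded into 32 dims.
vocab_sizes, embed_dim = [1000, 500, 200], 32
embeddings = [rng.normal(0.0, 0.01, (v, embed_dim)) for v in vocab_sizes]

# Three hidden ReLU layers on top of the concatenated embedding vector.
dims = [len(vocab_sizes) * embed_dim, 256, 128, 64]
weights = [rng.normal(0.0, 0.1, (dims[l + 1], dims[l])) for l in range(len(dims) - 1)]
biases = [np.zeros(d) for d in dims[1:]]

def forward(categorical_ids):
    """Compute a^(l+1) = f(W^(l) a^(l) + b^(l)) layer by layer."""
    # Embedding lookup: one row per categorical feature, concatenated.
    a = np.concatenate([emb[i] for emb, i in zip(embeddings, categorical_ids)])
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

print(forward([7, 42, 3]).shape)  # (64,) -- the final activations a^(l_f)
```

In training, both the embedding tables and the layer weights are updated by backpropagation against the final loss.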

3.3 Joint Training of Wide & Deep

The wide component and deep component are combined using a weighted sum of their output log odds as the prediction, which is then fed to one common logistic loss function for joint training. Note that there is a distinction between joint training and ensemble. In an ensemble, individual models are trained separately without knowing each other, and their predictions are combined only at inference time but not at training time. In contrast, joint training optimizes all parameters simultaneously by taking both the wide and deep part as well as the weights of their sum into account at training time. There are implications on model size too: For an ensemble, since the training is disjoint, each individual model size usually needs to be larger (e.g., with more features and transformations) to achieve reasonable accuracy for an ensemble to work. In comparison, for joint training the wide part only needs to complement the weaknesses of the deep part with a small number of cross-product feature transformations, rather than a full-size wide model.
Joint training of a Wide & Deep Model is done by backpropagating the gradients from the output to both the wide and deep part of the model simultaneously using mini-batch stochastic optimization. In the experiments, we used the Follow-the-regularized-leader (FTRL) algorithm [3] with L1 regularization as the optimizer for the wide part of the model, and AdaGrad [1] for the deep part. The combined model is illustrated in Figure 1 (center). For a logistic regression problem, the model's prediction is: $P(Y=1|x) = \sigma(w_{wide}^T [x, \phi(x)] + w_{deep}^T a^{(l_f)} + b)$
where Y is the binary class label, σ(·) is the sigmoid function, $\phi(x)$ are the cross-product transformations of the original features x, and b is the bias term. $w_{wide}$ is the vector of all wide model weights, and $w_{deep}$ are the weights applied on the final activations $a^{(l_f)}$.

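The high-level TensorFlow API the authors open-sourced packages exactly this pairing as a single estimator. The following is a hedged sketch along those lines using the TF 1.x-era tf.feature_column / tf.estimator API; the column names, bucket sizes, and hidden-unit sizes are illustrative, not the production configuration:

```python
import tensorflow as tf

user_app = tf.feature_column.categorical_column_with_hash_bucket(
    "user_installed_app", hash_bucket_size=10000)
impression_app = tf.feature_column.categorical_column_with_hash_bucket(
    "impression_app", hash_bucket_size=10000)

# Wide part: sparse columns plus their cross-product transformation phi(x).
wide_columns = [
    user_app,
    impression_app,
    tf.feature_column.crossed_column(["user_installed_app", "impression_app"],
                                     hash_bucket_size=100000),
]

# Deep part: 32-dimensional embeddings of the same sparse columns.
deep_columns = [
    tf.feature_column.embedding_column(user_app, dimension=32),
    tf.feature_column.embedding_column(impression_app, dimension=32),
]

model = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    linear_optimizer="Ftrl",       # FTRL for the wide part, as in the paper
    dnn_feature_columns=deep_columns,
    dnn_optimizer="Adagrad",       # AdaGrad for the deep part
    dnn_hidden_units=[256, 128, 64],
)
```

The string optimizer names use library defaults; matching the paper's L1 regularization on the wide part would require passing an explicit FTRL optimizer with its `l1_regularization_strength` set.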

4. System Implementation

The implementation of the apps recommendation pipeline consists of three stages: data generation, model training, and model serving as shown in Figure 3.

[Figure 3: the apps recommendation pipeline: data generation, model training, and model serving.]

4.1 Data Generation

In this stage, user and app impression data within a period of time are used to generate training data. Each example corresponds to one impression. The label is app acquisition: 1 if the impressed app was installed, and 0 otherwise.
Vocabularies, which are tables mapping categorical feature strings to integer IDs, are also generated in this stage. The system computes the ID space for all the string features that occurred more than a minimum number of times. Continuous real-valued features are normalized to [0, 1] by mapping a feature value x to its cumulative distribution function $P(X \le x)$, divided into $n_q$ quantiles. The normalized value is $\frac{i-1}{n_q-1}$ for values in the i-th quantile. Quantile boundaries are computed during data generation.

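A sketch of this quantile normalization in NumPy ($n_q$ and the sample data are made up for illustration):

```python
import numpy as np

def quantile_normalize(values, n_quantiles=10):
    """Map a continuous feature to [0, 1] via its empirical CDF.

    Values falling in the i-th quantile (1-indexed) become (i - 1) / (n_q - 1);
    the quantile boundaries are computed once, during data generation.
    """
    boundaries = np.quantile(values, np.linspace(0, 1, n_quantiles + 1)[1:-1])
    def transform(x):
        i = np.searchsorted(boundaries, x) + 1  # which quantile x falls into
        return (i - 1) / (n_quantiles - 1)
    return boundaries, transform

values = np.random.default_rng(0).exponential(scale=30.0, size=10000)  # e.g. app age
_, transform = quantile_normalize(values)
print(transform(values[0]))  # a normalized value in [0, 1]
```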

4.2 Model Training

The model structure we used in the experiment is shown in Figure 4. During training, our input layer takes in training data and vocabularies and generates sparse and dense features together with a label. The wide component consists of the cross-product transformation of user installed apps and impression apps. For the deep part of the model, a 32-dimensional embedding vector is learned for each categorical feature. We concatenate all the embeddings together with the dense features, resulting in a dense vector of approximately 1200 dimensions. The concatenated vector is then fed into 3 ReLU layers, and finally the logistic output unit.
The Wide & Deep models are trained on over 500 billion examples. Every time a new set of training data arrives, the model needs to be re-trained. However, retraining from scratch every time is computationally expensive and delays the time from data arrival to serving an updated model. To tackle this challenge, we implemented a warm-starting system which initializes a new model with the embeddings and the linear model weights from the previous model.
Before loading the models into the model servers, a dry run of the model is done to make sure that it does not cause problems in serving live traffic. We empirically validate the model quality against the previous model as a sanity check.

[Figure 4: the Wide & Deep model structure for apps recommendation.]
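The paper does not spell out the warm-start mechanics, but the idea can be sketched as follows (all names here are hypothetical): reuse embedding rows for features that survive into the new vocabulary, and initialize the rest fresh.

```python
import numpy as np

def warm_start(new_vocab, old_vocab, old_embeddings, embed_dim=32, rng=None):
    """Initialize a new embedding table, reusing rows from the previous model.

    Features kept across retrains inherit their trained embedding; features
    new to this vocabulary get a fresh random initialization.
    """
    rng = rng or np.random.default_rng(0)
    table = rng.normal(0.0, 0.01, (len(new_vocab), embed_dim))
    for feature, new_id in new_vocab.items():
        if feature in old_vocab:
            table[new_id] = old_embeddings[old_vocab[feature]]
    return table

old_vocab = {"netflix": 0, "pandora": 1}
new_vocab = {"netflix": 0, "pandora": 1, "spotify": 2}
old_emb = np.ones((2, 32))
print(warm_start(new_vocab, old_vocab, old_emb)[2][:3])  # fresh row for "spotify"
```

The same copy-over applies to the linear model weights of the wide part, so that each retrain starts near the previous solution instead of from scratch.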

4.3 Model Serving

Once the model is trained and verified, we load it into the model servers. For each request, the servers receive a set of app candidates from the app retrieval system and user features to score each app. Then, the apps are ranked from the highest scores to the lowest, and we show the apps to the users in this order. The scores are calculated by running a forward inference pass over the Wide & Deep model.
In order to serve each request on the order of 10 ms, we optimized the performance using multithreading parallelism by running smaller batches in parallel, instead of scoring all candidate apps in a single batch inference step.

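A minimal sketch of that idea with a thread pool (the model here is a stand-in; real serving infrastructure is considerably more involved):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def score_batch(model, features):
    """One forward pass of the scoring model over a small batch (stand-in)."""
    return model(features)

def score_candidates(model, features, batch_size=50, workers=4):
    """Split candidates into small batches and score them in parallel threads."""
    batches = [features[i:i + batch_size] for i in range(0, len(features), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: score_batch(model, b), batches)
    return np.concatenate(list(results))

fake_model = lambda x: x.sum(axis=1)  # hypothetical scorer
scores = score_candidates(fake_model, np.random.rand(500, 16))
print(scores.shape)  # (500,)
```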

5. Experiment Results

6. Related Work

The idea of combining wide linear models with cross-product feature transformations and deep neural networks with dense embeddings is inspired by previous work, such as factorization machines [5] which add generalization to linear models by factorizing the interactions between two variables as a dot product between two low-dimensional embedding vectors. In this paper, we expanded the model capacity by learning highly nonlinear interactions between embeddings via neural networks instead of dot products.
In language models, joint training of recurrent neural networks (RNNs) and maximum entropy models with n-gram features has been proposed to significantly reduce the RNN complexity (e.g., hidden layer sizes) by learning direct weights between inputs and outputs [4]. In computer vision, deep residual learning [2] has been used to reduce the difficulty of training deeper models and improve accuracy with shortcut connections which skip one or more layers. Joint training of neural networks with graphical models has also been applied to human pose estimation from images [6]. In this work we explored the joint training of feed-forward neural networks and linear models, with direct connections between sparse features and the output unit, for generic recommendation and ranking problems with sparse input data.
In the recommender systems literature, collaborative deep learning has been explored by coupling deep learning for content information and collaborative filtering (CF) for the ratings matrix [7]. There has also been previous work on mobile app recommender systems, such as AppJoy which used CF on users’ app usage records [8]. Different from the CF-based or content-based approaches in the previous work, we jointly train Wide & Deep models on user and impression data for app recommender systems.

7. Conclusion

Memorization and generalization are both important for recommender systems. Wide linear models can effectively memorize sparse feature interactions using cross-product feature transformations, while deep neural networks can generalize to previously unseen feature interactions through low-dimensional embeddings. We presented the Wide & Deep learning framework to combine the strengths of both types of model. We productionized and evaluated the framework on the recommender system of Google Play, a massive-scale commercial app store. Online experiment results showed that the Wide & Deep model led to significant improvement on app acquisitions over wide-only and deep-only models.

P.S. This is only a first pass through the paper; if you spot mistakes, corrections are welcome. Next I will dig into the open-source implementation in practice, and I will keep updating this post as my understanding of the paper deepens. Comments and discussion are welcome!
