In this paper, the authors study how to handle contextual data effectively in neural recommender systems. They first analyze the traditional approach of including context as ordinary input features, show that it is inefficient at capturing feature crosses, and design an RNN recommender system accordingly. The notes first describe the RNN-based recommender system in use at YouTube, then cover “Latent Cross,” an easy-to-use technique to incorporate contextual data in the RNN by embedding the context feature first and then performing an element-wise product of the context embedding with the model’s hidden states.
The goal is to learn as well as possible from user actions, e.g., clicks, purchases, watches, and ratings.
Some important contextual data: request and watch time, the type of device, and the page on the website or mobile app.
2 Problem Description
In the Netflix Prize setting, e≡(i,j,R): user i gave movie j a rating of R. At YouTube, e≡(i,j,t,d): user i watched video j at time t on device type d.
Recommender systems can be viewed as trying to predict one value of the event given the others: for a tuple e=(i,j,R), use (i,j) to predict R.
Notation:
e: tuple of k values describing an observed event
eℓ: element ℓ in the tuple
E: set of all observed events
ui, vj: trainable embeddings of user i and item j
Xi: all events for user i
Xi,t: all events for user i before time t
e(τ): event at step τ in a particular sequence
⟨⋅⟩: k-way inner product
∗: element-wise product
f(⋅): an arbitrary neural network
From a machine learning perspective, we can split the tuple e into features x and label y, such that x=(i,j) and y=R.
To test whether a first-order DNN can model low-rank relations well, the authors generate synthetic low-rank data.
Generate random vectors ui of length r: ui ∼ N(0, (1/r^(1/2m)) I), where r is the rank of the data and m is the number of features (modes).
When m=3, each example can be written as (i,j,t,⟨ui,uj,ut⟩). The three embeddings are concatenated as the input, passed through a hidden layer with ReLU activation and then a final linear layer. The loss is MSE (mean squared error), optimization uses Adagrad, and the fit is evaluated with the Pearson correlation.
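The data-generation step above can be sketched as follows. The function and parameter names are illustrative assumptions, and 1/r^(1/2m) is read here as the per-coordinate variance of the Gaussian, which is itself an assumption about the notation:

```python
import numpy as np

def make_low_rank_data(n_ids=100, r=2, m=3, n_samples=1000, seed=0):
    """Generate rank-r synthetic data (i, j, t, <u_i, u_j, u_t>).

    Names and sizes are illustrative; 1/r^(1/2m) is interpreted as the
    variance of each embedding coordinate."""
    rng = np.random.default_rng(seed)
    std = (1.0 / r ** (1.0 / (2 * m))) ** 0.5
    # One length-r embedding per value of each of the m features.
    U = rng.normal(0.0, std, size=(m, n_ids, r))
    idx = rng.integers(0, n_ids, size=(n_samples, m))  # sampled (i, j, t) triples
    vecs = np.stack([U[f, idx[:, f]] for f in range(m)])  # (m, n_samples, r)
    # Label: the m-way inner product, i.e. sum over r of the product across modes.
    y = vecs.prod(axis=0).sum(axis=1)
    # Input: the m embeddings concatenated, as fed to the first-order DNN.
    x = np.concatenate(list(vecs), axis=1)  # (n_samples, m * r)
    return x, y
```

A first-order DNN trained on (x, y) pairs produced this way can then be scored by the Pearson correlation between its predictions and y.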
1. As the hidden layer grows, the model fits the training data better.
2. When the rank goes from 1 to 2, the number of hidden nodes must roughly double to reach the same accuracy.
3. Considering that collaborative filtering models will often discover rank-200 relations, this intuitively suggests that real-world models would require very wide layers for even a single two-way relation to be learned.
Result: adding more ReLU layers improves the fit but is inefficient, which motivates turning to RNN models.
4 YOUTUBE’S RECURRENT RECOMMENDER
RNNs are a notable baseline because they are already second-order neural networks, significantly more complex than the first-order models explored above, and are at the cutting edge of dynamic recommender systems.
4.1 Formal Description
The input to the model is the set of events for user i: Xi = {e=(i,j,ψ(j),t) ∈ E ∣ e0=i}. Let Xi,t denote all watches by user i before time t: Xi,t = {e=(i,j,ψ(j),t) ∈ E ∣ e0=i ∧ e3<t} ⊂ Xi. The model predicts Pr(j ∣ i,t,Xi,t), the probability of the video j that user i will watch at a given time t, based on all watches before t.
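As a small illustration of Xi,t, here is a plain-Python filter over an event log; the Event structure and names are hypothetical simplifications, not YouTube's actual data model:

```python
from typing import NamedTuple

class Event(NamedTuple):
    """One observed watch event e=(i, j, t); a hypothetical, simplified tuple."""
    user: int    # e0 = i
    video: int   # e1 = j
    time: float  # watch time t

def history_before(events, user_i, t):
    """X_{i,t}: all of user i's watches strictly before time t."""
    return [e for e in events if e.user == user_i and e.time < t]

events = [Event(1, 10, 1.0), Event(1, 11, 2.0), Event(2, 12, 1.5), Event(1, 13, 3.0)]
hist = history_before(events, user_i=1, t=2.5)  # user 1's first two watches
```

This sequence of prior watches is what the RNN consumes when predicting the next video for user i at time t.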
Taking time as an example, perform an element-wise product in the middle of the network: h0(τ) = (1 + wt) ∗ h0(τ), where the context embedding wt is initialized from a 0-mean Gaussian so that the multiplier (1 + wt) starts close to 1. This has two benefits:
1. It can be interpreted as the context providing a mask or attention mechanism over the hidden state.
2. It enables learning low-rank relations between the input (the previous watch) and the time.
The same operation can be applied to later hidden states as well, e.g., h1(τ) = (1 + wt) ∗ h1(τ).
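A minimal NumPy sketch of this latent-cross step; the function and variable names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def latent_cross(hidden, context_emb):
    """Latent Cross: gate a hidden state element-wise with a context embedding.
    h <- (1 + w) * h, so a zero context embedding leaves h unchanged."""
    return (1.0 + context_emb) * hidden

rng = np.random.default_rng(0)
d = 8                                # hidden-state width (illustrative)
h0 = rng.normal(size=d)              # hidden state h0(tau) from the RNN input layer
w_t = rng.normal(0.0, 0.1, size=d)   # time-context embedding, 0-mean Gaussian init
h0_crossed = latent_cross(h0, w_t)   # (1 + w_t) * h0(tau)
```

Because w_t starts near zero, the cross initially behaves like the identity, and training only moves it away from that when the context carries useful signal.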