Deep Cross Network (DCN): Introduction and Code Analysis

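I've found that if I don't take notes after reading a paper and its source code, after a while I can't even remember having read it. Getting old 😂😂😂. So it's time to build a good habit. Given my limited expertise, these notes just record the paper's key points. Back to the topic: this is work from Stanford & Google, published in 2017 at ADKDD'17 (KDD 2017).

Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. 2017. Deep & Cross Network for Ad Click Predictions. In Proceedings of the ADKDD'17. ACM, 12.

The code analysis below uses the PyTorch implementation from DeepCTR-Torch.

Paper Info

Paper: Deep & Cross Network for Ad Click Predictions
Published: KDD (ADKDD'17), 2017
Code: https://github.com/shenweichen/DeepCTR-Torch/blob/master/deepctr_torch/models/dcn.py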

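Main Content

To strengthen their nonlinear expressiveness, traditional CTR models rely on hand-crafted feature combinations. These features have clear meanings and are highly interpretable, but they usually require heavy feature engineering, which costs time and effort. DNN models, on the other hand, have strong learning capacity and can pick up high-order nonlinear feature combinations automatically, but those features are usually implicit and hard to interpret. The highlight of this paper is the Cross Network, which learns cross features explicitly and automatically. The Cross Network is also much more lightweight than a DNN, and as a consequence somewhat less expressive, so the paper trains the Cross Network and a DNN jointly to get the strengths of both.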

Deep Cross Network

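The figure below shows the complete DCN model proposed in the paper:

It consists of three main parts:

The Embedding and Stacking Layer at the bottom
The Cross Network and the Deep Network, running in parallel in the middle
The Combination Output Layer at the top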

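Feature Input

The raw input features may include discrete categorical features and continuous dense features. Categorical features are usually one-hot encoded, but this makes the feature dimension high and sparse, so an Embedding layer is added to map each categorical feature to a low-dimensional dense vector:

$x_{emb, i} = W_{emb, i} x_i$

These are then concatenated with the normalized continuous features and fed into the next layer:

$x_0 = [x^T_{emb, 1}, x^T_{emb, 2}, \ldots, x^T_{emb, k}, x^T_{dense}]$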
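
The two formulas above can be sketched in a few lines of plain Python. This is only an illustration under made-up assumptions: a single categorical field with a vocabulary of 4, a hypothetical 2 x 4 embedding matrix W_emb, and two dense features that are assumed to be normalized already. It shows that multiplying W_emb by a one-hot vector is the same as picking one column, and that stacking is plain concatenation.

```python
# Hypothetical toy setup: one categorical field (vocabulary size 4) embedded
# into 2 dimensions, plus two dense features assumed to be normalized already.
W_emb = [            # shape [embed_dim=2, vocab_size=4]
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def embed_lookup(W, index):
    """Pick column `index` of W, which equals W times the one-hot vector."""
    return [row[index] for row in W]

category = 2
one_hot = [1.0 if i == category else 0.0 for i in range(4)]

x_emb = matvec(W_emb, one_hot)            # x_emb = W_emb x
assert x_emb == embed_lookup(W_emb, category)

x_dense = [0.25, -1.0]
x_0 = x_emb + x_dense                     # stacking is list concatenation
print(x_0)  # [0.3, 0.7, 0.25, -1.0]
```

In practice the lookup form is what embedding layers use, since it avoids materializing the sparse one-hot vector.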

Cross Network

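Look at the figure first, then the formula:

The Cross Network is the core idea of the paper. It is used to learn cross features explicitly and efficiently. It is made up of multiple crossing layers, each of which is given by:

$x_{l + 1} = x_0 x^T_l w_l + b_l + x_l = f(x_l, w_l, b_l) + x_l$

Here $x_{l}, x_{l + 1}\in\mathbb{R}^n$ are the outputs of layer $l$ and layer $l + 1$, and $w_l, b_l\in\mathbb{R}^n$ are the weight and bias of layer $l$. Note that the weight $w_l$ is a vector, not a matrix. The idea of residual learning is also brought in, which helps avoid vanishing gradients, makes deeper networks trainable, and strengthens the network's expressiveness. The special structure of the Cross Network makes the degree of the cross features grow with network depth: a Cross Network with $l$ layers has a highest polynomial degree of $l + 1$.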
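
A small worked example may help with the degree claim. The sketch below implements one crossing layer on plain Python lists (the function name cross_layer is made up for illustration) and then runs the scalar case n = 1 with w = 1 and b = 0, where the polynomial can be written out by hand.

```python
def cross_layer(x0, xl, w, b):
    """One crossing layer: x_{l+1} = x0 * (xl^T w) + b + xl."""
    s = sum(x_i * w_i for x_i, w_i in zip(xl, w))   # scalar x_l^T w_l
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]

# Scalar case (n = 1) with w = 1, b = 0, so the polynomial is easy to read:
#   x_1 = x0^2 + x0              (degree 2 after 1 layer)
#   x_2 = x0^3 + 2*x0^2 + x0     (degree 3 after 2 layers)
x0 = [2.0]
x1 = cross_layer(x0, x0, w=[1.0], b=[0.0])
x2 = cross_layer(x0, x1, w=[1.0], b=[0.0])
print(x1, x2)  # [6.0] [18.0]
```

With x0 = 2 the hand-expanded polynomials give 4 + 2 = 6 and 8 + 8 + 2 = 18, matching the output: each extra layer multiplies by x0 once more and so raises the highest degree by one.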

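Since the weight $w$ and bias $b$ of every cross layer are vectors, say of dimension $d$ (the $n$ above), a Cross Network with $L_c$ layers has $d\times L_c\times 2$ parameters in total. The Cross Network has relatively few parameters, which limits the model's expressiveness, so it needs to be paired with the Deep Network.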
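
To get a feel for how small this count is, here is a quick back-of-the-envelope comparison with a single fully connected layer of the same width. The dimension d = 1000 and the depth of 6 are arbitrary numbers chosen for illustration, and both helper functions are hypothetical.

```python
def cross_net_params(d, num_layers):
    """Each cross layer holds one weight vector and one bias vector of size d."""
    return d * num_layers * 2

def dense_layer_params(in_dim, out_dim):
    """A fully connected layer holds an in_dim x out_dim matrix plus a bias."""
    return in_dim * out_dim + out_dim

d = 1000  # hypothetical input dimension
print(cross_net_params(d, num_layers=6))   # 12000
print(dense_layer_params(d, d))            # 1001000
```

The cross parameters grow linearly in d, while one dense layer of matching width already grows quadratically.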

Here is the PyTorch implementation of CrossNet from DeepCTR-Torch:

import torch
import torch.nn as nn


class CrossNet(nn.Module):
    """The Cross Network part of Deep&Cross Network model,
    which learns both low and high degree cross feature.
      Input shape
        - 2D tensor with shape: ``(batch_size, units)``.
      Output shape
        - 2D tensor with shape: ``(batch_size, units)``.
      Arguments
        - **in_features** : Positive integer, dimensionality of input features.
        - **layer_num**: Positive integer, the cross layer number
        - **seed**: A Python integer to use as random seed.
        - **device**: str, device the parameters are moved to, e.g. ``'cpu'``.
      References
        - [Wang R, Fu B, Fu G, et al. Deep & cross network for ad click predictions[C]//Proceedings of the ADKDD'17. ACM, 2017: 12.](https://arxiv.org/abs/1708.05123)
    """

    def __init__(self, in_features, layer_num=2, seed=1024, device='cpu'):
        super(CrossNet, self).__init__()
        self.layer_num = layer_num
        self.kernels = torch.nn.ParameterList(
            [nn.Parameter(nn.init.xavier_normal_(torch.empty(in_features, 1))) for i in range(self.layer_num)])
        self.bias = torch.nn.ParameterList(
            [nn.Parameter(nn.init.zeros_(torch.empty(in_features, 1))) for i in range(self.layer_num)])
        self.to(device)

    def forward(self, inputs):
        x_0 = inputs.unsqueeze(2)
        x_l = x_0
        for i in range(self.layer_num):
            xl_w = torch.tensordot(x_l, self.kernels[i], dims=([1], [0]))
            dot_ = torch.matmul(x_0, xl_w)
            x_l = dot_ + self.bias[i] + x_l
        x_l = torch.squeeze(x_l, dim=2)
        return x_l

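Here layer_num is the number of layers in the Cross Network. Since the weights and biases are all vectors, the code represents them with PyTorch Parameter objects; each layer's weight/bias has size [in_features, 1]. For convenience, let in_features = d.

The forward method implements the forward pass of the Cross Network. inputs has size [B, d], where B is the batch size and d is the dimension of each sample, i.e. the input feature dimension.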

x_0 = inputs.unsqueeze(2)

This step expands inputs at dim=2, so x_0 now has shape [B, d, 1]. Then, inside the for loop, the code implements the formula

$x_{l + 1} = x_0 x^T_l w_l + b_l + x_l$

The first step computes $x^T_l w_l$, namely:

xl_w = torch.tensordot(x_l, self.kernels[i], dims=([1], [0]))

This calls for a quick look at torch.tensordot. It takes x_l : [B, d, 1] along dim=1 (the d axis, i.e. the features) and the layer-$l$ weight kernels[i] : [d, 1] along dim=0 (the weights), multiplies them element-wise and sums. The net effect is that tensordot of features with shape [B, d, 1] and weights with shape [d, 1] gives a result of shape [B, 1, 1].
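
To make the shape bookkeeping concrete, here is a minimal pure-Python sketch that mimics this particular contraction with nested lists. The helper name contract_dim1 and the toy values are made up for illustration; it only reproduces the arithmetic for these two shapes and is not how torch.tensordot is implemented.

```python
def contract_dim1(x_l, kernel):
    """Mimic tensordot(x_l, kernel, dims=([1], [0])) for
    x_l of shape [B, d, 1] and kernel of shape [d, 1]."""
    B, d = len(x_l), len(kernel)
    out = []
    for b in range(B):
        # Sum over the shared d axis; the trailing size-1 axes of both
        # operands survive, giving a [1, 1] block per sample.
        s = sum(x_l[b][j][0] * kernel[j][0] for j in range(d))
        out.append([[s]])
    return out  # shape [B, 1, 1]

x_l = [[[1.0], [2.0], [3.0]],   # B = 2, d = 3
       [[0.0], [1.0], [0.0]]]
kernel = [[0.5], [0.5], [1.0]]
xl_w = contract_dim1(x_l, kernel)
print(xl_w)  # [[[4.5]], [[0.5]]]
```

Each sample ends up with a single number, the dot product of its feature vector with the weight vector, wrapped in two size-1 axes.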

The second step computes $x_0 x^T_l w_l$. The previous step already produced $x^T_l w_l$, stored as xl_w in the code, so this step is:

dot_ = torch.matmul(x_0, xl_w)

At this point dot_ has the same shape as x_0, namely [B, d, 1].
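
The order of the two steps matters for cost. Computing $x^T_l w_l$ first yields a scalar per sample, so the d x d outer product $x_0 x^T_l$ never has to be built. The sketch below checks on a toy example that both orders give the same result; the function names and values are hypothetical and a single sample is used instead of a batch.

```python
def cross_term_fast(x0, xl, w):
    """x0 * (xl^T w): one dot product, then a scaling. O(d) time and memory."""
    s = sum(a * b for a, b in zip(xl, w))
    return [x0_i * s for x0_i in x0]

def cross_term_naive(x0, xl, w):
    """(x0 xl^T) w: builds the full d x d outer product first. O(d^2)."""
    outer = [[x0_i * xl_j for xl_j in xl] for x0_i in x0]
    return [sum(o_ij * w_j for o_ij, w_j in zip(row, w)) for row in outer]

x0 = [1.0, 2.0, 3.0]
xl = [2.0, 0.0, 1.0]
w = [0.5, 1.0, 2.0]
fast = cross_term_fast(x0, xl, w)
naive = cross_term_naive(x0, xl, w)
assert all(abs(f - n) < 1e-9 for f, n in zip(fast, naive))
print(fast)  # [3.0, 6.0, 9.0]
```

This is just associativity of matrix multiplication, but it is what keeps each cross layer linear in d.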

The last step adds the bias and the input itself:

x_l = dot_ + self.bias[i] + x_l

After layer_num iterations of the loop, i.e. after the features have passed through all $l$ layers, x_l in the code has shape [B, d, 1], so one last step is needed:

x_l = torch.squeeze(x_l, dim=2)

This gives the output, of size [B, d].

Deep Network

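Not much to say here. It may use Dropout, BN and the like, but in the implementation details in Section 4.2 of the paper the authors say they did not find Dropout or L2 regularization effective.

The Deep Network is formulated as: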

$h_{l + 1} = f(W_l h_l + b_l)$
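
For completeness, a single deep layer can be sketched the same way in plain Python. The activation f is taken to be ReLU here, which is what the paper uses; the weights, bias and input below are arbitrary toy values, and dense_layer is a hypothetical helper name.

```python
def relu(v):
    """Element-wise max(0, v)."""
    return [max(0.0, v_i) for v_i in v]

def dense_layer(W, b, h):
    """h_{l+1} = ReLU(W h + b), with W given as a list of rows."""
    pre = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
           for row, b_i in zip(W, b)]
    return relu(pre)

W = [[1.0, -1.0],
     [0.5, 0.5],
     [-2.0, 0.0]]
b = [0.0, 1.0, 0.5]
h = [2.0, 3.0]
print(dense_layer(W, b, h))  # [0.0, 3.5, 0.0]
```

Unlike the cross layer, the weight here is a full matrix, which is where most of the model's parameters live.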

Combination Layer

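In the final output layer, the outputs of the Cross Network and the Deep Network are first concatenated and then fed into an LR model (for details on LR, see the article on the Logistic Regression model; the derivation is so detailed that even I was moved 😂😂😂) to obtain the predicted probability.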

$p = \sigma\left([x^T_{L_1}, h^T_{L_2}]w_{logits}\right), \quad \sigma(x) = \frac{1}{1 + e^{-x}}$

The loss function is the negative log-likelihood of the LR model plus an L2 regularization term:

$\text{loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(p_i) + (1 - y_i)\log(1 - p_i)\right] + \lambda\sum_{l}\Vert w_l\Vert^2$

Optimization uses the Adam algorithm.
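
The prediction and loss formulas above can be checked numerically with a short standard-library sketch. Everything here is illustrative: the function names, the toy vectors and the value of the regularization strength are assumptions, and the optimizer itself is left out.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x_cross, h_deep, w_logits):
    """p = sigmoid([x_cross, h_deep] . w_logits)."""
    stacked = x_cross + h_deep                      # concatenation
    return sigmoid(sum(s * w for s, w in zip(stacked, w_logits)))

def log_loss(labels, probs, weights, lam):
    """Mean negative log-likelihood plus lam * sum of squared weights."""
    nll = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(labels, probs)) / len(labels)
    l2 = lam * sum(w * w for vec in weights for w in vec)
    return nll + l2

p = predict(x_cross=[1.0, -1.0], h_deep=[0.5], w_logits=[0.2, 0.2, 0.0])
print(p)  # 0.5, since the logit is exactly 0

loss = log_loss(labels=[1, 0], probs=[0.5, 0.5], weights=[[1.0, 2.0]], lam=0.1)
print(loss)  # log(2) + 0.1 * 5, roughly 1.193
```

With every prediction at 0.5 the likelihood term is log 2 regardless of the labels, so the remainder of the loss in this toy case comes entirely from the L2 penalty.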

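Summary

The Cross Network introduced in the paper learns high-order feature combinations explicitly and efficiently, and the polynomial degree of those combinations grows with network depth. Every layer's weight is a vector, so the structure is relatively simple and has far fewer parameters than a DNN. This also limits the model's expressiveness, which is why it is trained jointly with a DNN so that the two complement each other.

References

Deep & Cross Network for Ad Click Predictions: the paper
DeepCTR-Torch implements a wide range of classic CTR models; reading its source code is highly recommended.
(Paper reading) CTR prediction for recommender systems: DCN model analysis, from an in-depth paper-reading series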
