Wide & Deep and Deep & Cross, with a TensorFlow Implementation

Preface

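I recently read two Google papers, "Wide & Deep Learning" and "Deep & Cross Network". While they are still fresh, I want to compare them and write a small demo, so that I don't fumble around when I need them later.
The former recommends apps a user is likely to enjoy; the latter ranks the ads a user is likely to click. Personalized recommendation built on basic user information and behavior logs is an important step toward monetization: done well, users are happy and advertisers pay more; done badly, everyone goes hungry.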

Why Deep-Network ?

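On recommendation: my earlier posts in the FTRL series covered a linear recommendation model built on basic features and second-order (pairwise) cross features. Its strengths: the model is simple and transparent, quick to implement, and easy to adjust for bad cases. Its weaknesses are just as obvious: it cannot represent higher-order abstract features, and its coverage of higher-order cross features is incomplete. A deep network can express such abstract features, which makes up for exactly this weakness of linear models.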

Why Cross-Network ?

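Why stop at pairwise cross features? With higher-order combinations, hand-picking the crosses is tedious enough, and if every feature were crossed with every other, the number of features would go through the roof. The Cross-Network (reference 5) handles this blow-up in the number of cross features well. So the problem was never truly intractable; the experts simply had not solved it yet.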
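To put rough numbers on the blow-up, here is a minimal sketch that counts explicit cross features against the parameters of a cross network. The figure of 100 input features is a made-up example, and the parameter count assumes each cross layer holds one weight vector and one bias vector of the input dimension, as in the D&C formula later in this post.

```python
from math import comb


def explicit_cross_count(num_features, max_degree):
    """Number of distinct feature combinations of degree 2..max_degree."""
    return sum(comb(num_features, k) for k in range(2, max_degree + 1))


def cross_network_param_count(input_dim, num_layers):
    """Each cross layer owns one weight vector and one bias vector of size input_dim."""
    return num_layers * 2 * input_dim


d = 100  # hypothetical number of input features
for degree in (2, 3, 4):
    print("crosses up to degree", degree, ":", explicit_cross_count(d, degree))
print("cross network, 3 layers:", cross_network_param_count(d, num_layers=3))
```

The comparison is only indicative, since explicit crosses live in the sparse feature space while the cross network works on the stacked dense input, but it shows why enumerating crosses by hand does not scale.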
Architecture comparison
  Nothing beats a picture, so here are the diagrams: Wide and Deep Network on the left, Deep and Cross Network on the right.

  The two diagrams above lay out the architecture of each method clearly.

Feature inputs

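  1) W&D uses three groups of features:
    User-Feature: country, language, demographics.
    Contextual-Feature: device, hour of the day, day of the week.
    Impression-Feature: app age, historical statistics of an app.
  1.1) Input features of the Wide part:
    raw input features and transformed features [hand-picked cross features].
    Notice: the cross-product transformation in W&D only combines discrete features, whether they are string-valued categorical features or discrete values; continuous features play no part in it, at least in the way the W&D paper uses it.
  1.2) Input features of the Deep part: raw input + embedding
    Every non-continuous feature goes through an embedding. These are all categorical features, so this amounts to multiplying by an embedding matrix. The TensorFlow interface is tf.feature_column.embedding_column, with trainable=True by default.
    Continuous features are mapped through the cumulative distribution function P(X≤x), which squeezes them into [0,1].
    Notice: the Wide part is trained with FTRL + L1 regularization; the Deep part is trained with AdaGrad.
  The TensorFlow API for Wide & Deep is tf.estimator.DNNLinearCombinedClassifier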
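The two W&D transformations above can be sketched in plain Python. This is only an illustration of the idea, not TensorFlow's implementation: the field names, the "_X_" separator and the training values are hypothetical, and the CDF is the empirical one computed from a small sample.

```python
from bisect import bisect_right


def cross_product(sample, crossed_keys):
    """Build one crossed categorical value from several categorical fields.

    In a one-hot encoding the crossed feature is 1 only when all of its
    constituent values occur together, e.g. AND(language=en, device=phone).
    """
    return "_X_".join("%s=%s" % (key, sample[key]) for key in crossed_keys)


def make_cdf_normalizer(training_values):
    """Return a function x -> empirical P(X <= x), which lies in [0, 1]."""
    sorted_values = sorted(training_values)
    n = len(sorted_values)

    def normalize(x):
        return bisect_right(sorted_values, x) / float(n)

    return normalize


# Hypothetical impression with two categorical fields and one continuous field.
sample = {"language": "en", "device": "phone", "app_age_days": 30}

wide_feature = cross_product(sample, ["language", "device"])  # input to the Wide part
normalize_age = make_cdf_normalizer([1, 7, 30, 90, 365])      # fitted on training data
deep_feature = normalize_age(sample["app_age_days"])          # input to the Deep part
print(wide_feature, deep_feature)
```

Note that the cross only touches the two categorical fields; the continuous field goes to the Deep part alone, which matches the notice above.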
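  2) Input features and preprocessing in D&C:
    All inputs are processed uniformly, with no distinction between features meant for the Deep part and features meant for the Cross part.
    High-dimensional inputs (a feature with a very large number of possible values) get an embedding matrix that produces a lower-dimensional dense representation. The dense dimension is estimated as $6 \times (\text{category cardinality})^{1/4}$.
    Notice: the embedding in W&D and D&C is not the Word2Vec familiar from language models (which learns low-dimensional word representations from context). It merely uses a matrix W to reduce the dimension of discretized, very sparse one-hot vectors. The parameter matrix is learned with ordinary gradient descent.
    Continuous values are compressed with a log transform.
    These features are stacked, and the stacked vector is the input to both the deep network and the cross network.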
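The D&C preprocessing steps listed above can be sketched as follows. The embedding values and the raw continuous value are invented for illustration, and log(1 + x) is my assumption for the exact form of the log transform, which the text does not spell out.

```python
import math


def embedding_dim(cardinality):
    """Rule of thumb from the text: 6 * cardinality^(1/4), rounded to an int."""
    return int(round(6 * cardinality ** 0.25))


def log_compress(value):
    """Compress a non-negative continuous value; log(1 + x) is an assumption."""
    return math.log1p(value)


def one_hot_times_matrix(index, matrix):
    """Multiply a one-hot row vector by the embedding matrix the long way."""
    num_rows, num_cols = len(matrix), len(matrix[0])
    one_hot = [1.0 if row == index else 0.0 for row in range(num_rows)]
    return [sum(one_hot[r] * matrix[r][c] for r in range(num_rows))
            for c in range(num_cols)]


# Hypothetical categorical feature with 4 values, embedded into 2 dimensions.
embedding_matrix = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
embedded = one_hot_times_matrix(2, embedding_matrix)  # equals embedding_matrix[2]

# Stack the dense embedding and the compressed continuous value into x_0.
stacked_input = embedded + [log_compress(99.0)]
print(embedding_dim(10000), stacked_input)
```

The one-hot product returning exactly row 2 of the matrix is the point of the notice above: the embedding is a plain row lookup whose entries are trained by gradient descent.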
Cross explained
  The cross-network is described in detail in reference 5, but D&C uses a modified version of it.

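$x_{l} = x_{0} x_{l-1}^{T} w_{embedding} + b + x_{l-1}$

  For a single sample the shapes are $x_{0} = [d \times 1]$; $x_{l} = [d \times 1]$; $w_{embedding} = [d \times 1]$; $b = [d \times 1]$. Note that w is shared across all the cross terms of a given layer. Why shared? My guess is that it saves space on the one hand, and another possible reason is that convergence would otherwise be difficult (to be confirmed).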
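To make the formula concrete, here is a minimal single-sample sketch in plain Python, with every vector stored as a list of length d. The function names and the numbers are mine; the first function follows the formula literally, and the second relies on the observation that $x_{l-1}^{T} w$ is a scalar, so the d x d outer product never has to be built.

```python
def cross_layer(x0, x_prev, w, b):
    """One cross layer on a single sample, following the formula literally.

    Builds the d x d outer product x0 * x_prev^T, multiplies it by w,
    then adds the bias b and the residual x_prev.
    """
    d = len(x0)
    outer = [[x0[i] * x_prev[j] for j in range(d)] for i in range(d)]
    # The same vector w weights every row of the outer product.
    crossed = [sum(outer[i][j] * w[j] for j in range(d)) for i in range(d)]
    return [crossed[i] + b[i] + x_prev[i] for i in range(d)]


def cross_layer_fast(x0, x_prev, w, b):
    """Same result without the d x d matrix: x_prev^T w is just a scalar."""
    scale = sum(xj * wj for xj, wj in zip(x_prev, w))
    return [x0_i * scale + b_i + xp_i for x0_i, b_i, xp_i in zip(x0, b, x_prev)]


x0 = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = [0.0, 1.0, -1.0]

x1 = cross_layer(x0, x0, w, b)  # first layer: x_{l-1} is x_0 itself
x2 = cross_layer(x0, x1, w, b)  # reuses w and b only to keep the example short
print(x1, x2)
```

In a real network each layer would have its own w and b; they are reused here purely for brevity.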

Notes on implementing D&C in tf

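  1) Representing multi-hot features
    Use tf.feature_column.indicator_column.
    Note that an _IndicatorColumn does not support stacking an _EmbeddingColumn on top of it.
  2) Embedding
    Use tf.feature_column.embedding_column, with trainable=True by default.
    To share an embedding across features: tf.contrib.layers.shared_embedding_columns
  3) Reading data
    The parsing function for the dataset pipeline has to live inside input_fn.
    Mind the difference between tf.cast and tf.string_to_number: tf.cast converts between numeric dtypes, whereas turning a string tensor into numbers requires tf.string_to_number.
  4) tf.estimator.Estimator
    The params argument of a custom model_fn is passed in explicitly.
    Note that estimator comes with its own asynchronous update mechanism, SycOpt.
  5) Implementing the cross-network
    Compute it with broadcasting.
    Verified: tile does not affect the gradient computation for the original parameters.
  6) Embedding variable-length features
    In my experience, tf.feature_column + estimator does not handle variable-length features, only fixed-length ones.
    The only option is tf.nn.embedding_lookup_sparse.
    Sample code for a variable-length, string-valued discrete feature is attached at the end.
    Forcing variable-length features through tf.feature_column raises a dimension error when converting a SparseTensor to a Tensor, though I could not tell where inside it comes from.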
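As an illustration of item 1), here is a plain-Python sketch of what a multi-hot representation looks like for a "|"-separated feature. It only mimics the idea behind an indicator column and is not TensorFlow's implementation; the vocabulary and rows are borrowed from the appendix example, and silently dropping unknown tokens is my simplification.

```python
def multi_hot(raw_value, vocabulary, separator="|"):
    """Encode an 'a|b|c' style string as a fixed-length count vector."""
    index_of = {token: i for i, token in enumerate(vocabulary)}
    encoded = [0] * len(vocabulary)
    for token in raw_value.split(separator):
        if token in index_of:  # unknown tokens are simply dropped in this sketch
            encoded[index_of[token]] += 1
    return encoded


vocabulary = ["oscars", "brad-pitt", "awards", "film", "reviews", "matt-damon"]
rows = ["oscars|brad-pitt|awards", "oscars|film|reviews", "matt-damon|bourne"]

encoded_rows = [multi_hot(row, vocabulary) for row in rows]
for row, encoded in zip(rows, encoded_rows):
    print(row, "->", encoded)
```

However many tags a row carries, the encoded vector always has the length of the vocabulary, which is what makes the representation usable as a fixed-size dense input.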

tf_debug

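  Because the model is written with tf.estimator, internal variables cannot be inspected with print, which makes debugging a real problem. tf.estimator was designed with this in mind: it accepts externally defined hooks and thereby supports tf_debug. See multi.py below for the full code.
  The hook takes the form params['hooks'] = [tf_debug.LocalCLIDebugHook()], which is passed into the estimator and used by train or evaluate.
  As an example of inspecting internals with tf_debug, let's see what combiner=sum in tf.feature_column.embedding_column actually does.
  Input for one feature:
  1)State-gov|human 2)Self-emp-not-inc|human 3)State-gov|human
  For convenience, the embedding matrix is initialized to all ones.

  Running under debug gives the following embedding-mat variable:

  The result of processing the feature: its encoded representation and the index values (the indices on the input side of the embedding)

  The resulting embedding-vec is:

  Finding: combiner=sum looks up the embedding vectors by index and then sums them to produce the embedding result. Swapping in a randomly initialized embedding matrix confirms the same behavior.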
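The behavior observed above can be reproduced in a few lines of plain Python. This is a sketch of the lookup-then-sum idea only, not the feature_column code path; the vocabulary order and the embedding size of 4 are hypothetical, while the all-ones matrix and the three inputs follow the text.

```python
def embed_with_sum(raw_value, vocabulary, embedding_matrix, separator="|"):
    """Look up one embedding row per token and add the rows element-wise."""
    index_of = {token: i for i, token in enumerate(vocabulary)}
    total = [0.0] * len(embedding_matrix[0])
    for token in raw_value.split(separator):
        row = embedding_matrix[index_of[token]]
        total = [acc + value for acc, value in zip(total, row)]
    return total


# Hypothetical vocabulary and embedding size; the matrix is all ones as in the text.
vocabulary = ["State-gov", "Self-emp-not-inc", "human"]
embedding_size = 4
ones_matrix = [[1.0] * embedding_size for _ in vocabulary]

inputs = ["State-gov|human", "Self-emp-not-inc|human", "State-gov|human"]
results = [embed_with_sum(value, vocabulary, ones_matrix) for value in inputs]
print(results)
```

With an all-ones matrix every two-token input comes out as a vector of 2s, so the sum shows up as a token count; a matrix with distinct rows makes it visible which rows were added.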

github source code

  Deep and Cross implemented with tf.feature_column + dataset + tf.estimator.
  The dataset is the census income dataset.
  D&C test demo : https://github.com/jxyyjm/tensorflow_test/blob/master/src/deep_and_cross.py
  tf_debug test demo : https://github.com/jxyyjm/tensorflow_test/blob/master/src/multi.py
  Below are several ways to implement the cross computation in tf. The key is how tf.matmul / tf.tensordot are applied; what matters is being concise and efficient.

#!/usr/bin/python
# -*- coding:utf-8 -*-
# Written against the TensorFlow 1.x API (tf.contrib is used below).
import tensorflow as tf

def cross_op(x0, x, w, b):
  ## Literal translation of the definition: the slowest, least efficient variant ##
  ## x0 and x are [m, d]; w and b are [d, 1]; returns x0 * x^T * w + b ##
  x0 = tf.expand_dims(x0, axis=2) # m x d x 1
  x  = tf.expand_dims(x,  axis=2) # m x d x 1
  multiple = w.get_shape().as_list()[0]
  x0_broad_horizon = tf.tile(x0, [1,1,multiple])   # m x d x 1 -> m x d x d
  x_broad_vertical = tf.transpose(tf.tile(x,  [1,1,multiple]), [0,2,1]) # m x d x 1 -> m x d x d
  w_broad_horizon  = tf.tile(w,  [1,multiple])     # d x 1 -> d x d
  mid_res = tf.multiply(tf.multiply(x0_broad_horizon, x_broad_vertical), tf.transpose(w_broad_horizon)) # m x d x d, relies on broadcasting
  res = tf.reduce_sum(mid_res, axis=2) # m x d
  res = res + tf.transpose(b) # m x d + 1 x d, broadcasting again
  return res
def cross_op2(x0, x, w, b):
  ## Leans fully on broadcasting to implement the cross; also inefficient ##
  x0 = tf.expand_dims(x0, axis=2) # m x d x 1
  x  = tf.expand_dims(x,  axis=2) # m x d x 1
  dot = tf.matmul(x0, tf.transpose(x, [0, 2, 1])) # m x d x d
  mid_res = tf.multiply(dot, tf.transpose(w)) # w^T is 1 x d and broadcasts over the last axis
  res = tf.reduce_sum(mid_res, axis=2) + tf.transpose(b) # m x d + 1 x d, broadcasting again
  return res
def cross_op_single_data(x0, x, w, b):
  ## The most concise cross implementation, for a single sample ##
  ## every argument has shape [d, 1] ##
  dot = tf.matmul(x0, tf.transpose(x)) # d x d
  cros= tf.tensordot(dot, w, [[1], [0]]) + b ## each row of dot dotted with the column of w ##
  return cros
def cross_op_batch_data(x0, x, w, b):
  ## x0 and x have shape [batch, d]; same approach as the next function, computationally efficient ##
  ## w  and b have shape [d, 1]
  x0 = tf.expand_dims(x0, 2) # [batch, d, 1]
  x  = tf.expand_dims(x,  2) # [batch, d, 1]
  dot= tf.matmul(x0, tf.transpose(x, [0, 2, 1])) # [batch, d, d] = batch x {[d x 1] x [1 x d]}
  #cros = tf.tensordot(dot, w, [[1], [0]]) + b # [batch, d, 1] this is wrong: it contracts axis 1 of dot, not the last axis
  cros = tf.tensordot(dot, w, 1) + b ## this form comes from maxnet; rather neat ##
  return tf.squeeze(cros, 2)
def cross_op_None_batch(x0, x, w, b):
  ## x0 and x have shape [None, d]; uses keras.backend.batch_dot ##
  ## w  and b have shape [d, 1]
  x0 = tf.expand_dims(x0, 2) # [batch, d, 1]
  x  = tf.expand_dims(x,  2) # [batch, d, 1]
  dot= tf.contrib.keras.backend.batch_dot(x0, tf.transpose(x, [0,2,1]), [2, 1]) # [batch, d, d]
  #cros = tf.tensordot(dot, w, [[1], [0]]) + b # this is wrong
  cros = tf.tensordot(dot, w, 1) + b # contract the last axis of dot with w -> [batch, d, 1]
  return tf.squeeze(cros, 2)
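Why is the commented-out tensordot call marked "this is wrong" in the batch versions, when the same axes work in the single-sample version? Here is a small plain-Python sketch of the two contractions, written with explicit loops instead of TensorFlow; the helper names and the numbers are mine, and w is stored as a flat list of length d rather than a [d, 1] tensor.

```python
def batch_outer(x0, x):
    """[batch, d] and [batch, d] -> [batch, d, d], entry [n][i][j] = x0[n][i] * x[n][j]."""
    return [[[a * c for c in x_row] for a in x0_row] for x0_row, x_row in zip(x0, x)]


def contract_last_axis(dot, w):
    """Sum over j (the last axis), as tf.tensordot(dot, w, 1) does."""
    return [[sum(row[j] * w[j] for j in range(len(w))) for row in sample]
            for sample in dot]


def contract_axis_one(dot, w):
    """Sum over i (axis 1) instead, as axes [[1], [0]] does on a 3-D tensor."""
    d = len(w)
    return [[sum(sample[i][j] * w[i] for i in range(d)) for j in range(d)]
            for sample in dot]


x0 = [[1.0, 2.0], [3.0, 4.0]]
x = [[5.0, 6.0], [7.0, 8.0]]
w = [1.0, -1.0]

dot = batch_outer(x0, x)
right = contract_last_axis(dot, w)  # x0[n][i] * (x[n] . w), the cross term
wrong = contract_axis_one(dot, w)   # x[n][j] * (x0[n] . w), a different quantity
print(right, wrong)
```

On a single sample the outer product is 2-D, so axis 1 is the last axis and both contractions coincide; once a batch dimension is added in front, axis 1 becomes the x0 axis and the result changes.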

Reference

1) Wide & Deep Learning for Recommender Systems (2016)
2) Deep & Cross Network for Ad Click Predictions (2017)
3) https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html (Google Research blog)
4) https://github.com/tensorflow/models/tree/master/official/wide_deep (Wide & Deep GitHub code)
5) Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features (2016)
Appendix: how tf.nn.embedding_lookup_sparse handles embeddings for variable-length string features.

The input data looks like this:
csv = [
  "1,oscars|brad-pitt|awards",
  "2,oscars|film|reviews",
  "3,matt-damon|bourne",
]
The second column is the variable-length feature. It is processed as follows:

import tensorflow as tf

# Purposefully omitting "bourne" to demonstrate OOV mappings.
TAG_SET = ["oscars", "brad-pitt", "awards", "film", "reviews", "matt-damon"]
NUM_OOV = 1

def sparse_from_csv(csv):
  ids, post_tags_str = tf.decode_csv(csv, [[-1], [""]])
  table = tf.contrib.lookup.index_table_from_tensor(
      mapping=TAG_SET, num_oov_buckets=NUM_OOV, default_value=-1) ## build a lookup table here ##
  split_tags = tf.string_split(post_tags_str, "|")
  return ids, tf.SparseTensor(
      indices=split_tags.indices,
      values=table.lookup(split_tags.values), ## the index each value maps to in the table ##
      dense_shape=split_tags.dense_shape)

# Optionally create an embedding for this.
TAG_EMBEDDING_DIM = 3

ids, tags = sparse_from_csv(csv)

embedding_params = tf.Variable(tf.truncated_normal([len(TAG_SET) + NUM_OOV, TAG_EMBEDDING_DIM]))
embedded_tags = tf.nn.embedding_lookup_sparse(embedding_params, sp_ids=tags, sp_weights=None)

# Test it out
with tf.Session() as s:
  s.run([tf.global_variables_initializer(), tf.tables_initializer()])
  print(s.run([ids, embedded_tags]))

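1) This handles variable-length features, but the downside is that it cannot be folded into the tf.feature_column + tf.estimator framework; the model inputs and overall structure are left exposed. Ugly~
2) Rewriting it to share the embedding is also very easy.
Reportedly the latest tf 1.5 adds "Add support for sparse multidimensional feature columns." [applause] I'll look into it when I have time.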
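For readers without a TF 1.x environment, the appendix example can be traced in plain Python. This is a sketch of the idea rather than the TensorFlow code path: it assumes the lookup averages the selected rows (the "mean" combiner, which as far as I know is what the call above falls back to when no combiner is given), and it replaces the randomly initialized variable with a deterministic, made-up matrix so the numbers can be checked by hand.

```python
TAG_SET = ["oscars", "brad-pitt", "awards", "film", "reviews", "matt-damon"]
NUM_OOV = 1


def tag_ids(tags_field, separator="|"):
    """Map each tag to its vocabulary index; unknown tags go to the OOV bucket."""
    index_of = {tag: i for i, tag in enumerate(TAG_SET)}
    oov_id = len(TAG_SET)  # with a single OOV bucket every unknown tag lands here
    return [index_of.get(tag, oov_id) for tag in tags_field.split(separator)]


def mean_embedding(ids, embedding_params):
    """Average the embedding rows selected by ids."""
    dim = len(embedding_params[0])
    summed = [sum(embedding_params[i][c] for i in ids) for c in range(dim)]
    return [value / len(ids) for value in summed]


# Deterministic stand-in for the random variable: row r is [r, 10 * r, 1].
embedding_params = [[float(r), float(r) * 10.0, 1.0]
                    for r in range(len(TAG_SET) + NUM_OOV)]

csv_rows = ["1,oscars|brad-pitt|awards", "2,oscars|film|reviews", "3,matt-damon|bourne"]
embedded_tags = []
for line in csv_rows:
    _, tags_field = line.split(",", 1)
    embedded_tags.append(mean_embedding(tag_ids(tags_field), embedding_params))
print(embedded_tags)
```

Each row yields a vector of the same size regardless of how many tags it has, and "bourne", which is missing from TAG_SET, is routed to the extra row at index 6.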

於建民
