Introduction

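Scenario: recommend items to a user based on the user's historical behavior sequence.
Example: the user recently bought shoes and a dress, and we now want to estimate the CTR of a women's coat.
Or, in a music setting: the user has recently been listening to rap, and we want to estimate the CTR of songs from 中國新說唱 (The Rap of China) for that user.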

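The common approach is to apply average pooling to the item embeddings of the user's historical behaviors.
In practice, though, only some of those behaviors are related to the item currently being recommended. A woman who occasionally buys sneakers for her boyfriend, or a user who happens to hear one popular song, has produced behaviors that say little about what the user wants right now. If she bought high heels at some point, on the other hand, that purchase is probably more relevant to a lipstick she is considering now.
This is the point of the attention mechanism.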
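
To make the baseline concrete, here is a minimal sketch of average pooling in plain Python. The 2-d embeddings and the function name `average_pooling` are made up for illustration. Note that the pooled vector is computed without looking at the candidate item at all, so the gift purchase counts as much as everything else.

```python
def average_pooling(history):
    """Average a list of equal-length embedding vectors element-wise."""
    n = len(history)
    return [sum(column) / n for column in zip(*history)]

# Hypothetical 2-d embeddings for three past behaviors.
history = [
    [1.0, 0.0],  # shoes
    [0.8, 0.2],  # dress
    [0.0, 1.0],  # sneakers bought as a gift
]

# The same user vector is used no matter which candidate item is scored.
user_vector = average_pooling(history)
print(user_vector)
```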

Attention mechanism

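As the name suggests, the attention mechanism lets the model pay different amounts of attention to different user behaviors at prediction time: "relevant" history counts for more, and "irrelevant" history can even be ignored. This idea maps onto the model in a straightforward way.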

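Vu is the user interest representation, Vi is the embedding of a historical item, and Va is the embedding of the current (candidate) item.
wi is the relevance weight between each historical item and the current item. It is given by a relevance function of Vi and Va, wi = g(Vi, Va), and g is implemented by the Activation Unit.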
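
Read together, these definitions suggest a weighted sum of the form Vu = sum_i wi * Vi with wi = g(Vi, Va). The formula itself is not written out in the text, so treat that form as an assumption. A small sketch in plain Python, where `weighted_sum_pooling` is a hypothetical name and a dot product stands in for g (the Activation Unit is what actually plays that role):

```python
def weighted_sum_pooling(history, candidate, g):
    """Return (V_u, weights) with V_u = sum_i g(V_i, V_a) * V_i."""
    weights = [g(v_i, candidate) for v_i in history]
    dim = len(candidate)
    v_u = [sum(w * v_i[d] for w, v_i in zip(weights, history)) for d in range(dim)]
    return v_u, weights

def dot(u, v):
    # Placeholder relevance function, NOT the Activation Unit.
    return sum(a * b for a, b in zip(u, v))

history = [[1.0, 0.0], [0.0, 1.0]]   # V_1, V_2 (made-up values)
candidate = [1.0, 0.0]               # V_a

v_u, weights = weighted_sum_pooling(history, candidate, dot)
print(weights, v_u)
```

Unlike average pooling, the resulting user vector changes whenever the candidate item changes.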

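In traditional attention, given two item embeddings u and v, the usual choice is the dot product u·v or the bilinear form uWv, where W is a |u| x |v| weight matrix. In this paper Alibaba goes a step further. Look at the activation unit in the upper-right corner of the model architecture figure: it concatenates u, v and their element-wise difference into one input vector, feeds that to fully connected layers, and outputs the weight. Because the network sees the raw vectors rather than a single similarity score, less information is lost.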
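
A rough sketch of such an activation unit in plain Python, following the description above: concatenate u, v and u - v, pass the result through a small fully connected network, and read off a scalar. The layer sizes, the random initialization and the helper names (`dense`, `init`, `activation_unit`) are assumptions for illustration, not taken from the paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, weights, bias, activation=None):
    """Fully connected layer; `weights` holds one row per output unit."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(o) for o in out] if activation else out

def init(rng, n_out, n_in):
    """Random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def activation_unit(u, v, params):
    """Score how relevant history item u is to candidate item v."""
    diff = [a - b for a, b in zip(u, v)]
    x = u + v + diff                                  # concatenation, length 3 * dim
    hidden_out = dense(x, params["w1"], params["b1"], sigmoid)
    return dense(hidden_out, params["w2"], params["b2"])[0]  # unnormalized weight

rng = random.Random(0)
dim, hidden = 4, 8                                    # illustrative sizes
params = {
    "w1": init(rng, hidden, 3 * dim), "b1": [0.0] * hidden,
    "w2": init(rng, 1, hidden), "b2": [0.0],
}

u = [0.1, 0.2, 0.3, 0.4]
v = [0.4, 0.3, 0.2, 0.1]
score = activation_unit(u, v, params)
print(score)
```

The attention() function in the source code section follows the same pattern, except that it also concatenates the element-wise product and uses three dense layers.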

Source code analysis

Input processing:
look up the history embeddings
look up the song embedding and the song category embedding
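
Stripped of TensorFlow, the lookup step amounts to indexing two tables and concatenating the rows. A toy sketch, where the table contents and the helper names `embed_item` and `embed_history` are made up for illustration:

```python
# Hypothetical tiny tables; each row is one embedding vector.
item_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # 3 items, dimension 2
cate_emb = [[1.0, 1.1], [2.0, 2.1]]               # 2 categories, dimension 2
cate_list = [0, 1, 1]                             # item id -> category id

def embed_item(item_id):
    """Concatenate an item's embedding with the embedding of its category."""
    return item_emb[item_id] + cate_emb[cate_list[item_id]]

def embed_history(item_ids):
    """Embed every item in a behavior sequence the same way."""
    return [embed_item(i) for i in item_ids]

candidate = embed_item(2)          # one vector of length 4
history = embed_history([0, 1])    # T = 2 vectors of length 4
print(candidate, history)
```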

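import tensorflow as tf  # TensorFlow 1.x API (tf.get_variable, tf.layers)

# Excerpt from the model class: user_count, item_count, cate_count, cate_list and
# the self.* tensors are defined elsewhere; attention() is shown further down.
hidden_units = 128

user_emb_w = tf.get_variable("user_emb_w", [user_count, hidden_units])
# Item and category embeddings each take half the width, so that their
# concatenation has hidden_units dimensions.
item_emb_w = tf.get_variable("item_emb_w", [item_count, hidden_units // 2])
item_b = tf.get_variable("item_b", [item_count],
                         initializer=tf.constant_initializer(0.0))
cate_emb_w = tf.get_variable("cate_emb_w", [cate_count, hidden_units // 2])
cate_list = tf.convert_to_tensor(cate_list, dtype=tf.int64)  # item id -> category id

# Current item: item embedding + category embedding -> [B, H]
ic = tf.gather(cate_list, self.i)
i_emb = tf.concat([
    tf.nn.embedding_lookup(item_emb_w, self.i),
    tf.nn.embedding_lookup(cate_emb_w, ic),
    ], axis=1)
i_b = tf.gather(item_b, self.i)

# Second item batch (self.j), embedded the same way -> [B, H]
jc = tf.gather(cate_list, self.j)
j_emb = tf.concat([
    tf.nn.embedding_lookup(item_emb_w, self.j),
    tf.nn.embedding_lookup(cate_emb_w, jc),
    ], axis=1)
j_b = tf.gather(item_b, self.j)

# Behavior history: one embedding per time step -> [B, T, H]
hc = tf.gather(cate_list, self.hist_i)
h_emb = tf.concat([
    tf.nn.embedding_lookup(item_emb_w, self.hist_i),
    tf.nn.embedding_lookup(cate_emb_w, hc),
    ], axis=2)

# Attention-pooled history; self.sl is passed as keys_length
hist_i = attention(i_emb, h_emb, self.sl)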

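The attention module
Inputs:
queries: B x H, the embedding of the current item
keys: B x T x H, the embeddings of the user's historical behavior item sequence, where T is the sequence length
keys_length: the actual length of the user's behavior history, <= T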

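Note the mask operation: tf.sequence_mask(keys_length, tf.shape(keys)[1]) builds the mask, and positions beyond the user's actual sequence are set to a very small (large negative) value. After softmax their weights are effectively zero, so they do not affect the weighted sum.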
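
The effect of the mask is easy to check on a single sequence. The sketch below is a plain-Python analogue, not the TensorFlow code: `sequence_mask` and `masked_softmax` are hypothetical helpers, and the scores are made up.

```python
import math

def sequence_mask(length, max_len):
    """True for real positions, False for padding (1-d analogue of tf.sequence_mask)."""
    return [t < length for t in range(max_len)]

def masked_softmax(scores, length, pad_value=-2.0 ** 32 + 1):
    """Softmax over `scores` where positions >= length are replaced by pad_value."""
    mask = sequence_mask(length, len(scores))
    masked = [s if keep else pad_value for s, keep in zip(scores, mask)]
    m = max(masked)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

scores = [0.5, 1.5, 9.0, 9.0]                # the last two positions are padding
weights = masked_softmax(scores, length=2)
print(weights)
```

Even though the padded positions carry the largest raw scores, they end up with zero weight, and the real positions share the full probability mass.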

def attention(queries, keys, keys_length):
  '''
    queries:     [B, H]
    keys:        [B, T, H]
    keys_length: [B]
  '''
  queries_hidden_units = queries.get_shape().as_list()[-1]
  queries = tf.tile(queries, [1, tf.shape(keys)[1]])  # repeat the current item T times -> [B, H*T]
  queries = tf.reshape(queries, [-1, tf.shape(keys)[1], queries_hidden_units])  # [B, T, H]
  # Activation unit input: query, key, their difference and their element-wise product
  din_all = tf.concat([queries, keys, queries-keys, queries*keys], axis=-1)  # [B, T, 4H]
  d_layer_1_all = tf.layers.dense(din_all, 80, activation=tf.nn.sigmoid, name='f1_att', reuse=tf.AUTO_REUSE)
  d_layer_2_all = tf.layers.dense(d_layer_1_all, 40, activation=tf.nn.sigmoid, name='f2_att', reuse=tf.AUTO_REUSE)
  d_layer_3_all = tf.layers.dense(d_layer_2_all, 1, activation=None, name='f3_att', reuse=tf.AUTO_REUSE)  # [B, T, 1]
  d_layer_3_all = tf.reshape(d_layer_3_all, [-1, 1, tf.shape(keys)[1]])  # [B, 1, T]
  outputs = d_layer_3_all
  # Mask: keep padded positions from contributing to the weights
  key_masks = tf.sequence_mask(keys_length, tf.shape(keys)[1])   # [B, T]
  key_masks = tf.expand_dims(key_masks, 1) # [B, 1, T]
  paddings = tf.ones_like(outputs) * (-2 ** 32 + 1)  # very small value, so the softmax weight is close to 0
  outputs = tf.where(key_masks, outputs, paddings)  # [B, 1, T], positions beyond the sequence get the small value

  # Scale
  outputs = outputs / (keys.get_shape().as_list()[-1] ** 0.5)

  # Activation
  outputs = tf.nn.softmax(outputs)  # [B, 1, T]

  # Weighted sum
  outputs = tf.matmul(outputs, keys)  # [B, 1, H]

  return outputs
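
The tf.tile plus tf.reshape pair at the top of attention() copies the current item's embedding to every time step. For a single example the same idea can be sketched in plain Python (the helper name `tile_and_reshape` is hypothetical):

```python
def tile_and_reshape(query, seq_len):
    """Repeat a length-H query seq_len times, then split it into seq_len rows of length H."""
    h = len(query)
    tiled = query * seq_len                                     # flat list of length H * T
    return [tiled[t * h:(t + 1) * h] for t in range(seq_len)]   # T rows of length H

rows = tile_and_reshape([1, 2, 3], seq_len=2)
print(rows)
```

Each row is a copy of the query, so it can be combined element-wise with the key at the same time step.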
