Applying the Attention Mechanism in Recommendation Algorithms | An Introduction to and Brief Analysis of the Deep Interest Network (DIN)

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2020 年回看最近一兩年的 CTR(Clicked Through Rate,點擊率)預估算法論文就會發現,這兩年新提出的一系列 CTR 預估算法都能看到 attention 的影子,足以見得 attention 逐漸成爲 CTR 預估算法的一個標配。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中比較有代表性的是由蓋坤領導的阿里媽媽精準定向檢索及基礎算法團隊於 2018 年提出的 DIN(Deep Interest Network,深度興趣網絡)算法(論文下載地址:https:\/\/arxiv.org\/pdf\/1706.06978.pdf。),該算法充分利用\/挖掘用戶歷史行爲數據中的信息來提高 CTR 預估的性能。DIN 算法的提出給 CTR 預估領域帶來了新的研究思路,本文將對這一算法進行介紹,並做一些初步的解析,供大家參考。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. 論文背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在工業界 CTR 預估領域中,用戶的歷史行爲特徵(如最近一週的瀏覽商品、最近一週的點擊商品等)是一類非常重要的特徵,能夠有效地刻畫用戶的興趣和行爲偏好。如何最大程度地利用豐富的用戶歷史行爲數據進行精準的 CTR 預估一直以來都是推薦算法領域一個非常重要的研究方向。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"縱觀最近幾年 CTR 預估算法的發展,不難發現對於用戶歷史行爲的挖掘已形成一套相對固定的基本範式,即:通過 embedding 層,將高維離散特徵轉換爲固定長度的連續特徵,然後通過多個全聯接層,最後通過一個 sigmoid 單元轉化爲點擊概率,即 sparse features -> embedding vector -> MLPs -> sigmoid -> output。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比較典型的 CTR 預估算法,例如 Wide&Deep,DeepFM,xDeepFM 等,均借鑑了這一範式的核心思想。這一類方法的優點在於:通過神經網絡可以擬合高階的非線性關係,同時減少了人工特徵的工作量。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"面對衆多的用戶歷史行爲特徵,如何處理這些特徵的 embedding 向量呢?通常有兩種做法:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一種是直接把這些向量 concate 起來,這樣可以保證每個 embedding 的信息都被保留下來;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"還有一種就是利用 pooling 將多個 embedding 向量進行壓縮,這種方法無疑會造成一定程度的信息丟失,常用的 pooling 方法包括 sum-pooling 和 average-pooling,這兩種方法還有一個問題就是這些 embedding 向量的權重都是相同的,也就是認爲不同的用戶行爲特徵對 CTR 預估任務的重要性是相同的,但是事實可能並非如此。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}