In Depth | Optimizing Ctrip's Hotel Recommendation Models
1. Background

When users browse hotels online, a travel platform has to decide which hotels to recommend so as to reduce the effort users spend choosing. In the Ctrip app, several scenarios can be triggered. Figure 1 lists three typical ones: popularity ranking, smart ranking, and search-compensation recommendation.

[Image: https://static001.geekbang.org/infoq/c3/c3086d2cf6e60112aadada09befe5892.png]
Figure 1: Scenarios triggered by users

Popularity ranking: in out-of-town scenarios, when the user has not clearly expressed a need, hotels are shown ranked by popularity by default.

Smart ranking: in same-city scenarios, or when a user browsing the results above finds that their interest is not being met, they often add filter conditions — for example a landmark, to get hotels near that landmark. Compared with popularity ranking, the user has expressed more personalized needs.

Search-compensation recommendation: if the user still cannot find a satisfying hotel in the current search results, they keep paging down for more. When there are not enough hotels matching the search constraints, they hit the bottom of the list, which triggers search-compensation recommendation. Since the first two scenarios failed to satisfy the user, in the compensation scenario we need to mine the user's historical behavior more deeply and provide more personalized recommendations.

This article focuses on the algorithm-optimization work we did for the compensation-recommendation scenario: model iteration, the technical requirements that arose during iteration, and the infrastructure we built to meet them.

2. Iterating the Recommendation Model

In hotel recommendation, we want to surface the products that meet the user's needs first, reducing the effort of using the product. We use the user's TOP-click rate (editor's note: roughly, the probability that the user clicks a hotel shown in a TOP position; the higher the TOP-click hit rate, the better the experience) and the conversion hit rate (CR) as effort metrics. CR optimization is modeled as a binary classification problem, evaluated offline with AUC and NDCG.

The essence of classification is to find a decision-boundary function f(x) that separates positive samples (e.g., bookings) from negative samples (e.g., non-bookings). Here x denotes the model features; good features better characterize the difference between positive and negative samples and make learning far easier. Learning f(x) is the modeling process. In this section we describe our work on features and on modeling.

2.1 Features

The features used for recommendation fall into user-side features, item-side features, and user-item interaction features. By numeric type, they split into continuous features and discrete features. When the algorithm first launched, the model used only continuous features; since then the share of discrete features has grown steadily.

Continuous features can be static (e.g., user gender, hotel price) or dynamic, based on user behavior (e.g., how many times a user clicked a given hotel). The strength of continuous features is good generalization: one user's preference for a business district can generalize to another user with the same statistics for that district. Their weakness is a lack of memory, which leads to low discrimination. For example, in the same district's hotel list, one user clicks (A, B, C, D) while another clicks (W, X, Y, Z). Although the two behavior sequences differ, the statistical features forget which hotels were clicked and treat both users identically as "clicked 4 hotels".

Discrete features are fine-grained: device IDs, user IDs, and clicked item IDs can all serve as features. Different users with different behaviors are then well separated in feature space, so discrete features are the foundation for pushing personalization further.

The strength of discrete features is strong memory and high discrimination. We can also combine discrete features to mine deeper hotel preferences: if users who click item A also like item B, we can generate a combined discrete feature from (A, B) to learn the collaborative signal between A and B.

The weakness of discrete features is relatively poor generalization, because the feature granularity is so fine that hit rates at prediction time are low. Moreover, during training, fine-grained features acquire weight more easily than coarse-grained ones, so the coarse features that generalize well learn less information, which further degrades the model's generalization.

An example: every sample has a unique sample ID. If we use the sample ID as a feature, it fits the label perfectly during training, but at test time the model performs very poorly due to severe overfitting. This is the memorization-generalization tradeoff in designing discrete features: features as fine-grained as possible versus a test-time hit rate as high as possible.

Combining this with our business scenario, we developed a technical approach that lets discrete features generalize well online too, so the model ends up with both strong memory and strong generalization. On the engineering side, combined features make the feature space very large (e.g., in hotel recommendation it can easily reach the hundreds of millions), which poses new challenges for model training and online serving. This is the key problem our large-scale discrete DNN training framework has to solve.

2.2 Models

Our models evolved from Logistic Regression (LR) through Gradient Boosting Decision Tree (GBDT) to Deep Neural Network (DNN). At the start of a business, data and features are scarce, so LR is the usual choice. As data and feature scale grow, a GBDT built with XGBoost or LightGBM is a good way to capture gains quickly during the growth phase. When the data grows larger still, a DNN-based framework is needed to refine personalization further. Below is a brief description of the three models.

[Image: https://static001.geekbang.org/infoq/bb/bbccc8170a219c494314fbc16bb3c6c8.png]
Figure 2: Model decision boundaries

Figure 2 shows the decision boundaries of the three models.

LR

In LR, the decision-boundary function is linear.

Pros: feature importance can be read from the model weights; LR supports incremental updates; and with large-scale discrete features, the classic LR-era practice is to add L1 regularization optimized with OWLQN or coordinate descent, or to use FTRL to keep the model sparse and avoid overfitting.

Cons: the linearity assumption is too strong and caps the model's accuracy; the model is also not very extensible.

GBDT

In GBDT, the decision boundary is nonlinear; the model raises accuracy by dividing and conquering the sample space.

Pros: tree models can compute per-feature importance, providing some interpretability; accuracy is higher than LR.

Cons: no support for large-scale discrete features, no incremental updates, low extensibility.

In the hotel-recommendation scenario we tried both pointwise loss and pairwise loss, and each attempt yielded solid gains.

DNN

In a DNN, the decision boundary is highly nonlinear. Just as computers express all kinds of complex objects (audio, video, web pages) from simple AND/OR/NOT logic, each DNN layer is more expressive than those gates, and by stacking layers of neurons a DNN becomes a universal function approximator. Ideally, given enough data, a DNN can express our actual decision boundary no matter how complex it is.

DNNs also support incremental updates, flexible custom network architectures per business scenario, and large-scale discrete features; the embedding vectors learned in the discrete model can additionally be reused for vector-similarity recall. With all these advantages, DNNs are becoming the industry standard for recommendation algorithms.

Cons: features cross through multiple layers with overly complex coupling, which hurts interpretability; engineering complexity varies with the network structure chosen.

The table below summarizes the three models:

| Model | Decision boundary  | Interpretability              | Incremental updates | Large-scale discrete features | Extensibility |
| LR    | Linear             | Good (weights)                | Yes                 | Yes (L1 / FTRL)               | Low           |
| GBDT  | Nonlinear          | Some (feature importance)     | No                  | No                            | Low           |
| DNN   | Highly nonlinear   | Poor                          | Yes                 | Yes                           | High          |