Applications and Innovations of Search Technology at Alibaba Fliggy

{"type":"doc","content":[
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Search in travel scenarios first emerged to meet specific, strong user needs, such as searching for flights, train tickets and hotels. Each of these needs has its own characteristics, so traditional travel search usually builds a customized search strategy for each business line. As AI technology has advanced, users have come to expect products that are easier to use, and travel search has gradually evolved into a full-text search engine with travel-specific search strategies. This article describes how Alibaba's Fliggy applies and innovates on search technology in travel scenarios. The main topics are:"}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Background of Fliggy Search"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Infrastructure"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Recall Strategies"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reflections and Summary"}]}]}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Background of Fliggy Search"}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Fliggy Search"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1a\/1aff367dc39d0c182b2d0829a6680153.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fliggy search has two main parts: global search and vertical (industry-specific) search. In the Fliggy screenshot on the right, global search is the input box at the top. It is the single entry point for searching all content on Fliggy. The middle of the screenshot shows the entry points for vertical search. Users typically turn to vertical search for specific needs, such as hotels, flights or vacation packages. As Fliggy's business has grown and user needs have changed, traffic has gradually shifted from vertical search to global search, mainly because:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Travel needs span multiple categories. Users naturally need to book flights, hotels and various tickets. Doing all of this through separate vertical searches takes many clicks and is inconvenient."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Much of Fliggy's traffic is referred from Mobile Taobao, whose search is global, so users habitually use global search to meet their needs."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For users, global search is the most convenient option and has the shortest path."}]}]}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Fliggy Search Framework"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a3\/a309f596860a3e6c45428e45ec391557.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The framework is shown in the figure. The system first calls QP to understand the current query and to generate the queries used for recall. It then calls the HA3 inverted index through the SP paging service to retrieve recall results. After coarse ranking and weighted ranking, the LTP service re-ranks the results, and the final results are shown to the user. This article focuses on the work done in QP."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. QP"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f2\/f2b1a9985874e486d0830bfebfed9f6b.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"QP is the query understanding and recall generation service. The main challenges in this service are:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Performance: in industry, the QP stage typically takes only about 1\/10 of the total online response time. Its latency budget is therefore tight, and it must respond quickly to keep the online experience good."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Text understanding: like any other global-search QP, ours must perform conventional text understanding to provide textual relevance."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Travel-specific challenges: travel scenarios have special requirements. One example is understanding LBS and POIs to provide spatial relevance."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Feature understanding: as the business grows, we also need to understand user features, so that we can provide personalized relevance that meets users' needs."}]}]}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Infrastructure"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, I will introduce some of Fliggy's infrastructure work."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Query tagging"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f0\/f0ffd383a23e757cdde3fa3548283bed.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tagging is a basic task in QP: it labels the destination and the intent in a query. Take “北京自由行” (Beijing independent travel) as an example. “北京” (Beijing) is the user's destination and “自由行” (independent travel) is the user's intent. The user wants an independent-travel product rather than something like a group tour, and probably prefers flight + hotel bundles or packages with no shopping stops."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This work is organized into the following layers:"}]},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data layer: mine a tagging lexicon offline."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Algorithm layer: tag queries online with algorithms such as tag disambiguation and CRF."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Application layer: applications built on tagging, such as query term dropping and query rewriting."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because of online performance constraints, we rely mainly on offline mining. Below, item POI mining, which is particularly important for us internally, serves as an example of our offline tagging-mining work."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Item POI Mining"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"① QueryTagging"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3e\/3e2220e2edeade017d863398fd0166b8.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An item's title may mention some attractions, and its detail page also contains a great deal of information. We therefore mine valuable information from this content to expand the lexicon. The attraction POIs in the figure, for example, can serve as index terms for recall. However, the detail page is unstructured HTML, which makes extracting POI entities quite difficult."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"② Modeling Approach"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/75\/75283662dc40613356296b0b802ea345.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We treat this as a typical sequence labeling problem. We trained a CRF++ model on manually labeled data, using features such as word features, numeric features and category features. We later upgraded to a Template-based model for training the NER model. This lets us run sequence labeling offline over large volumes of text. In the end, we achieved over 99% precision and over 95% recall. Many vacation items that previously had no mined POIs now carry POI features, so they can better serve downstream POI-based retrieval."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Synonym Mining"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3a\/3a138efcf0c99022f624caf30610ce21.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the travel industry, there are four types of synonyms:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Transliterations: for example, Disney can be written in Chinese in different ways (“迪斯尼” vs. “迪士尼”)."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Chinese vs. English terms: some users search in English and others in Chinese, while a merchant's title may be in English."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Containment: for example, “普吉” (Phuket) and “普吉島” (Phuket Island), where the POI “普吉” may be a sub-POI of the larger POI “普吉島”."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Typos: for example, users type “國色天香” when the correct name, as shown in the figure, is “國色天鄉”."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We wanted a single general model that handles all of these synonym relations."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c1\/c1d5dc5f17874968fac4dc20a67d84ec.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our approach starts from user click behavior. We concatenate each query with the titles of the items clicked for it, so that words in the query and the title share a context. We then train a word2vec skip-gram model to obtain a vector for every word. Based on semantic similarity, we generate the top 20 candidates for each word and cast the task as binary classification."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For feature engineering, we use edit distance (in both Chinese and English), co-occurrence counts, whether one term contains the other, cosine similarity and similar signals."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Positive samples come from manual labeling. Negatives are drawn at random after sorting candidates by edit distance in descending order. We then train LR and XGBoost models for binary classification on the labeled samples."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, the results go through one more round of manual review. Synonyms have a broad impact, so mining them purely by algorithm may not work well online. For the same reason, we did not use a complex model; a model that is good enough suffices. With manual labels on the order of ten thousand samples, precision reaches 94%."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. Spelling Correction"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/46\/46aa2b280cd63e5627a456a758fa6aff.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"① Background"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The typos mentioned above are word-level errors, but errors can also span the whole query. Word-level correction alone cannot meet user needs, so we also need correction logic that works on the full query."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"QP has strict performance requirements. The seq2seq approach common in industry works well but does not meet our performance targets. We can use seq2seq offline to mine high-frequency corrections, but it is hard to apply seq2seq to online correction."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"② Approach"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We use a traditional hidden Markov model, a statistical approach that meets our online performance requirements. Splitting errors into homophones and visually similar characters also makes the model highly interpretable."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Homophones: the pinyin of every Chinese character can be looked up in a pinyin table. This makes it easy to build homophone sets and then estimate homophone generation probabilities statistically."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Visually similar characters: these are harder to obtain, because it is difficult to judge whether two characters look alike. We address this using glyph images and character structure."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a4\/a40eb0a5ded4f541b09369ab354537a7.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"③ Image-Based Similarity"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The most direct image-based approach is CNN-based image matching. However, it usually fails to meet our performance requirements. We therefore adopted a simple but effective method: computing directly on the glyph images of two possibly similar characters. In a standard font library, we observed two properties of visually similar characters:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, when the glyphs of 鳥 and 烏 from a font library are compared directly, they overlap heavily. Glyphs in a font library are highly standardized, so similarity can be computed this way. Our image-based method therefore compares the two characters' glyphs from the font library point by point."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, again taking 鳥 and 烏: for each point of 鳥, we find the nearest point of 烏 and use that distance for the point. Averaging these distances over all points gives a distance-based similarity between the two characters."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By computing these measures offline over pairs of character images, we obtain sets of visually similar characters."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"④ Structure-Based Similarity"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also compute similarity from character structure. Cangjie, Zhengma and Four-Corner codes encode a character by its shape and components, so two visually similar characters usually have similar codes. By computing the sequence similarity between their codes and applying a threshold, we obtain sets of characters with similar shapes."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Recall Strategies"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, I will introduce some of Fliggy's recall techniques:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1c\/1cd9640acf14d156c48c2b9e707428b2.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Travel recall resembles general search recall in some ways and differs in others. The main challenges are:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There is a gap between user queries and item descriptions."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are only about a million travel items, and they are partitioned by city, so queries easily return no results."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Optimizing recall can easily cause false recalls."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Travel is a low-frequency activity, so user behavior is sparse and training samples are scarce."}]}]}]},
{"type":"paragraph",
"attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"鑑於這種情況,我們對用戶的召回分成了以下四種召回方式:經典召回(同義詞挖掘、相似query改寫、商品POI挖掘)、LBS召回、向量召回、個性化召回(I2I&U2I以及向量模型),來滿足用戶的需求。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 經典召回"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fe\/fe185c1a87a85add145db1b0fc5fa65f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"剛剛已經介紹過同義詞挖掘和商品POI挖掘,這裏主要介紹下相似query改寫。以“上海迪士尼樂園門票”爲例,其實標準的商品是“上海迪士尼度假區”,而“黃山風景區”的標準商品其實是“黃山”。在這樣的情況下,如果我們直接創建搜索,可能召回的效果比較差。因而,我們會進行一些相似query挖掘,來滿足這種query和title GAP的情況。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Learning To Rewrite:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/87\/87969ef1a2ceb9afc8e33e52bba6d074.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們思路是使用多路改寫產生候選集合,然後用learning to Rank 選取top 
K結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先假設用戶在篩選中輸入了query,這個query是比較相似的。因爲用戶在篩選中是想要獲得他想要的結果。如果用戶第一個query,沒有得到想要的結果,用戶會進行一些改寫。就相當於用戶幫助我們完成了一次改寫,我們從中可以學到用戶改寫的信息。這裏我們是用類似word2vec的模型實現的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外,從query相似度來看,我從文本上也可以獲得一個相似的query文本。這裏我們採用的是doc2vec模型,來獲得文本相似性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,通過query和title點擊,可以訓練一個雙塔結構的語義相似度模型,來獲得query和title相似性的特徵。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過這三種方式,我們可以獲得想要的相似query改寫的候選。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於候選,通過一些人工標註及線上的埋點信息,來獲得原query和候選query相似的標註。這樣我們就可以訓練一個模型來進行相似query的排序工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終,我們線上使用的模型是PS-SMART 模型。加上規則過濾之後,準確率可以達到99%。可以影響線上36%的PV,對一次UV的無結果率可以相對降低18%。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
航旅特色召回:LBS召回"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3b\/3b31796996fa3117dd3b68bc3f517737.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於用戶是在旅行場景下搜索,用戶天然會需要LBS 相關的信息。如果是差旅用戶,可能會定阿里巴巴園區附近的酒店,如果是旅遊用戶,可能會定黃山風景區附近的酒店。這就需要識別用戶想要的商品大概在什麼樣的LBS範圍內。解決的方法是通過對query中用戶POI的識別,獲取用戶的經緯度,進行召回上的限制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"建模過程:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/17\/17494670d512bc3fa737f9f4f7dd5e34.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先會對query進行常規的分詞,然後在POI專用的倒排索引庫進行檢索,獲得候選POI。接下來對候選POI query進行特徵計算,計算出文本相似性、embedding相似距離,以及用戶當前位置輸入後,與歷史點擊的商品地點的距離做特徵。然後用特徵構建模型算出一個分數,通過一定的閾值得到結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,我們的準確率可以達到95%,GMV和成交都得到了一定的提升。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 
深度召回:向量召回"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"① 背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9a\/9ae35476b1482a9edad37e2916140570.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面提到的都是一些簡單的文本召回,以及LBS召回等偏傳統的方法。前面說過,我們的商品按照目的地切換後,還是很稀疏,還會存在無召回的情況。對於這種情況,我們想到引入向量召回的方式進行補充召回。可以覆蓋改寫沒有的情況,可以召回一些原來不能召回的產品。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"② 向量召回整體架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/44\/44ba55f448bab0a5e8c3d2ee926587ed.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"向量召回架構如上圖。在線通過對query 進行embedding。離線通過HA3引擎,把所有的item embedding存儲到HA3引擎中。最後,SP通過從QP獲得query embedding,進行HA3檢索,獲得需要的商品。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"③ 
模型結構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/59\/59eea8a5740ab43cb6973e7c4266f213.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"模型結構,如上所示:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"query側:通過對query的文本,進行卷積層特徵抽取。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"商品側:我們主要的工作在這裏,除了文本上對用戶目的地的需求,對商品類目的需求也是比較關注的。所以在商品特徵上,使用了商品title文本的卷積特徵,以及目的地類目id 的特徵。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對這三個特徵,我們沒有使用簡單的concat,而是使用了tensor fusion進行三個向量的外積,可以讓特徵更好的融合。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,通過全鏈接層進行特徵抽取,計算向量內積。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於損失函數,我們使用的large margin loss。對於學的足夠充分的case 
,就丟棄掉,不再進行學習,讓模型更快的達到我們想要的效果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"④ 樣本選擇"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/77\/7741a0374127fc55e0b13dc6e01970b9.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在樣本選擇上,我們對正負樣本也做了一些探索。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集團內通用的方法:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"正樣本:query下用戶點擊的商品"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"負樣本:未點擊的商品"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣的方法更適合在排序上使用,而不太適合召回。以左圖爲例,用戶點擊了“上海迪士尼度假區”,未點擊的是下面的商品,雖然可能是由於商品的標題標準化比較低,用戶未點擊,但不能說它是不相關的商品。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的方法:"}]},{"type":"bulletedlist","content":[{"type":"listitem
","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"正樣本:和集團一樣,使用點擊的商品"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"負樣本:隨機選取的樣本作爲負樣本"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用隨機選擇有兩方面:一是在全量商品中,進行隨機選擇;二是在一個類目或者目的地下,進行隨機選擇。這樣可以提升訓練的難度,達到我們想要的效果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"⑤ 模型產出與使用方式"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b2\/b21c0f4d59d99dc6083c9126273d7677.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終產出的分數,也給排序使用了,作爲排序的一個特徵,取得了不錯的效果,可以排在第4位。另外,線上召回可以讓無結果率降低32.7%。同時,擴充了1.7倍的相似query。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. 
個性化召回"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/99\/99ef39ee07bbc6b66646d7edfccafc21.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"爲什麼做個性化召回?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲在旅行場景下,會存在一些泛需求搜索。比如搜杭州,我們會對杭州所有的商品和酒店進行召回。這樣大量的召回會給後面的排序造成很大的壓力,沒辦法根據用戶的query排出一個用戶想要的item。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外,還有一種情況是用戶搜索的意圖不是很明確,可能會存在一些無結果的情況。對於這種情況,傳統的文本相似性、深度召回都無法召回的情況下,可以嘗試個性化的方式,給用戶推薦一些商品,直接展示在搜索結果中,提供補充,來提升用戶體驗。實踐證明,用戶也會對這類商品進行點擊和購買。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的方案有兩種方式:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"引入推薦的召回結果,在此基礎上進行相關性粗排,得到個性化召回"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"構建了個性化專用的向量召回模型,來得到更好的個性化召回結果"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9d\/9dcfda3a442e6e86fc88d9fe9d7ee14c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"no
ne"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Overall, the recall pool is split into two channels, personalized recall and text recall:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Personalized recall: candidates are obtained from recommendation methods such as retargeting, i2i (item-to-item), lbs2i (location-based), and attribute-to-item."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Text relevance filtering: text relevance filters (such as keyword hits and vector cosine similarity) remove recommended items that are clearly unrelated to the user's current query. Users therefore see results that are relevant to the query and also expanded from their own behavior via i2i."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Personalized recall model:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/73\/739c48e541e0d64dcdaae074c238af9d.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the user side, features are extracted from user profile attributes and the user's query. We also introduce the user's behavior sequence for personalization: the items the user recently viewed, clicked, added to cart, and purchased while searching are fed into the model as item sequences. The user profile and query feature vectors then attend over this historical behavior sequence, extracting the item features most relevant to the current search so that the model can serve the user's current need."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"con
tent":[{"type":"text","text":"On the item side, we likewise use item features such as title, destination, and category as item-side inputs and encode them into a vector."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the upper layers, we use the tensor fusion mentioned earlier to fuse the features so that the different feature types combine more effectively."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Model optimization:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In deep vector recall, text features are extracted with a convolutional model. In this personalized model, however, we did not use convolution and simply concatenated word vectors instead. Experiments showed that convolution learned strong text features but left the overall personalization effect weak, which is not what we wanted. We therefore deliberately weakened the text features to highlight the additional retrieval gains contributed by the personalization features."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary and Reflections"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ad\/ad4f7e7f81040ee0485a2db47f7ad7ba.png","alt":"image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, here are some reflections on our work:"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Query & User Planner"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We still call it QP for now. Going forward, we hope to upgrade it to a Query & User Planner that incorporates more user features and adds stronger personalized search capabilities."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
Interpretability Upgrade"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We want to make search more interpretable, rather than simply recalling directly with text matching or deep vectors. Our goal is to understand user intent along more dimensions and at a finer granularity, and to express it directly as human-readable intent."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also want to predict user behavior. For example, when a user searches for Hangzhou, items inferred from historical clicks may still fail to meet the user's needs. For such queries, we hope to predict which attractions the user wants to visit; when a user searches for hotels, we hope to predict the intended destination and so better meet the user's needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"That's all for today's talk. Thank you."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the speaker:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lin Rui (Alibaba alias: Yingzhuo) received his bachelor's and master's degrees from Harbin Institute of Technology. He has worked on NLP and search algorithms at Baidu and Alibaba, and currently leads the research, development, and optimization of NLP and search recall algorithms for Fliggy Search."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reprinted from: DataFunTalk (ID: datafuntalk)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/vh7g1HzfZJxc-Xna-ocTpg","title":"xxx","type":null},"content":[{"type":"text","text":"Applications and Innovations of Search Technology at Alibaba Fliggy"}]}]}]}