深度解讀展會場景智能推薦搭建之路 | 會展雲技術解讀

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/af/af9f6637b50b09be60b00a42f3812d5e.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雲妹導讀:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在《會展雲技術解讀》專題中,我們已經推出了"},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?__biz=MzU1OTgxMTg2Nw==&mid=2247494756&idx=1&sn=3628afb8c6b0053d7e62b2ac94e643c3&scene=21#wechat_redirect","title":null},"content":[{"type":"text","text":"安全篇"}]},{"type":"text","text":"與"},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?__biz=MzU1OTgxMTg2Nw==&mid=2247494818&idx=1&sn=4b294480df8370df767388ecaa9988ff&scene=21#wechat_redirect","title":null},"content":[{"type":"text","text":"設計篇"}]},{"type":"text","text":",分別介紹瞭如何應對雲上會展最嚴保障要求與線上展覽中基於服務設計的方法,本篇文章雲妹將繼續爲大家帶來會展雲中非常重要的一環——智能推薦系統。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在如今的互聯網產品中智能推薦可謂無處不在,它可以根據用戶每個人的性別、年齡、愛好等維度塑造靜態用戶畫像,和用戶每一次點擊、點贊、評論、收藏等行爲數據形成的動態用戶畫像相結合,來結合挖掘用戶深層次興趣需求維度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們常見的有新聞推薦和電商場景的商品推薦,展會場景推薦系統與之不同的是,它需要滿足"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"參展商"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"採購商"},{"type":"text","text":"和"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"個人用戶"},{"type":"text","text":"各方需求,尤其是像前不久舉辦的永不落幕的雲上服貿會,首次採用線上+線下結合的模式,將服貿會影響輻射週期從集中的一週拉長至一整年,參展商、採購商以及正在尋找商機有需求的個人用戶都可以隨時隨地瀏覽雲上服貿會尋找有價值的商機。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服貿會註冊展商近萬家,涉及展品數量龐大,涉及"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"200多"},{"type":"text","text":"個子行業。如何讓線上用戶從大量的展商信息中快速找到自己想要的商機?如何保持有效商機的持續獲取?這些問題是提升觀展體驗和逛展效率的關鍵行動。在這個過程中,京東智聯雲機器學習團隊承擔了"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"雲上服貿會智能推薦功能的開發"},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5d/5d2bcd5d36fc7d1ea760b802d07f50c0.webp","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從上圖可以看到,整個服貿會智能推薦系統包括四個模塊的功能,同時服務官網2D店鋪和手機APP端,可以做到用戶級別的個性化推薦。針對服貿會的"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"展商"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"展臺"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"展品"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"項目"},{"type":"text","text":"四項重要信息,智能推薦系統有對應的"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"展商推薦"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"展臺推薦"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"展品推薦"},{"type":"text","text":"和"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"項目發佈推薦"},{"type":"text","text":"四個模塊。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中,展商、展臺和展品推薦三個模塊的功能引入了採購商和個人的用戶畫像、興趣標籤和行爲等維度數據進行精準匹配。比較難實現的是項目發佈的推薦,因爲除了要考慮用戶畫像和興趣標籤等維度數據外,考慮到項目的及時性和強目的性,還需要高權重的引入內容維度的數據做推薦。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本次智能推薦功能落地過程中除了對於如何更精準的實現項目發佈的推薦外,還有3大難題:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"整個雲上服貿,智能推薦呈現的內容承載近80%的用戶“第一眼”,所以如何在第一時間給用戶帶來最佳的精準推薦是一個比較棘手的問題。加上本次是第一屆雲上服貿會,沒有歷史信息可以使用。怎麼做才能最大價值用戶的第一時間流量是整個項目期間持續思考的問題;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"雖然整個雲上服貿會註冊參展商不如電商平臺的量級大,但是在9月5日-9月9日線下展會期間是同樣需要面對高併發和性能的挑戰,好的架構系統設計是扛得住檢驗的堅實防線;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"解決了“第一眼”的推薦,如何做好第二眼、第三眼……的推薦,除了做好用戶畫像,在參展內容刻畫的不斷探索。"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同時,我們也在不斷思考:對於“永不落寞的服貿會”如何持續做好後續的推薦?不同於互聯網產品新聞推薦和電商場景的商品推薦,展會場景的推薦如何做出滿足各方(參展商、採購商和個人用戶)需求的推薦之路?"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e6/e68a1cf97d4d5e2fc5161ee727f337a0.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前市場上面向C端的用戶產品如頭條、淘寶和各家音樂APP等爲了做好推薦過程的冷啓動可謂各顯神通——通過多途徑儘可能地獲取數據,比如首次註冊的“關聯微博/微信/QQ”賬戶;比如詢問用戶的偏好和感興趣內容;比如基礎的用戶信息收集(性別、年齡、地域和所屬行業等)。無論通過被動的信息獲取還是主動的用戶意向選擇,都旨在補全對用戶的認知,加上對分發內容精細化粒度的標籤特徵提取,可以實現冷啓動的個性化推薦。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而冷啓動推薦的另一個較高的門檻是:"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"對用戶場景和行爲動機的深層理解,足夠的知識庫沉澱。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但云上服貿會的智能推薦場景,以上兩條路似乎都不怎麼好走。由於是首次參與展會類場景的推薦,即使看似推薦的產品和京東的商品相似,有展臺和展品的劃分,但是用戶的羣體畫像和逛展意圖有天壤之別。好在京東智聯雲長期以來一直賦能於ToB的業務,沉澱了對B端企業採購場景的認知。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"尤其在2020年年初的疫情期間,爲了給企業和政府提供高效的防疫裝備採買,京東智聯雲推出了"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"“應急資源信息發佈平臺”"},{"type":"text","text":"。提供"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"採購"},{"type":"text","text":"和"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"發佈"},{"type":"text","text":"供需信息的通道的同時,還爲平臺用戶提供了基於"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"供需訴求"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"地理位置"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"產品匹配度與數量"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"生產力"},{"type":"text","text":"和"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"運輸效率"},{"type":"text","text":"等多維度的精準推薦。這些沉澱的供需場景知識剛好可以應用於本次雲上服貿會的推薦。另一方面,我們對於用戶畫像和分發內容畫像的理解和補全也做了很多功課,最終確保本次雲上服貿會智能推薦的功能到成功亮相。"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d4/d41f7b64bae5018751316cd3f909bee7.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"面對高併發下的高性能要求,我們設計了基於Caffeine和Redis的多重緩存架構,接下來將從兩個方面來介紹:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1,"},{"type":"text","marks":[{"type":"strong"}],"text":"技術選型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲什麼使用Caffeine+Redis?Redis不用多說,大家都太熟悉了。這裏重點介紹下Caffeine,Caffeine是一個基於"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Java8開發的提供了近乎最佳命中率的高性能緩存庫。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏有人又會產生疑問,爲什麼不用Guava Cache呢?這種大家更熟悉基於LRU(The Least Recently Used)算法實現的本地化緩存難道不好嗎?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然Guava Cache在過去應用更廣泛,性能也還不錯,但在日新月異的今天,總是會有更優秀、性能更好的緩存框架出現——就像Caffeine。另外再補充下,從Spring5(SpringBoot2)開始也使用Caffeine來取代Guava Cache。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲什麼Caffeine的性能更好?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先從淘汰算法說起,Guava Cache使用的是LRU。LRU實現比較簡單,日常使用時也有着不錯的命中率,它可以"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"有效的保護熱點數據,"},{"type":"text","text":"但對於偶發或週期性的訪問,會導致偶發數據被保留,而真正的熱點數據被淘汰,大大降低緩存命中率。爲此Caffeine使用了Window TinyLFU算法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在講Window TinyLFU前,還需要再簡單介紹下LFU。LFU算法解決了"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"LRU對於突發或週期性訪問導致真實熱點數據淘汰的問題,"},{"type":"text","text":"但短時間對於某些數據的高頻訪問,會導致這些數據長時間駐留在內存中,進而在觸發淘汰時,新加入的熱點數據被錯誤的淘汰掉,最終導致命中率的下降。另外LFU還需要維護訪問頻次,每次訪問都需要更新,造成巨大的資源開銷。Window TinyLFU實際上吸取了LRU和LFU的優點,又規避了各自的缺點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"具體做法是:首先Window TinyLFU維護了一個近期訪問記錄的頻次信息,作爲一個過濾器,當新記錄來時,只有滿足TinyLFU要求的記錄纔可以被插入緩存。爲了解決資源的高消耗問題,它通過"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"4-bit CountMinSketch"},{"type":"text","text":"實現,這個算法類似於布隆過濾器,可以用很小的空間來存放大量的訪問頻次數據。這個設計給予每個數據項積累熱度的機會,而不是立即過濾掉。這避免了持續的未命中,特別是在突然流量暴漲的的場景中,一些短暫的重複流量就不會被長期保留。爲了刷新歷史數據,一個時間衰減進程被週期性或增量的執行,給所有計數器減半。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85c7ed3c61cb1695ca7e2d2f84e63182.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而對於長期保存的數據,W-TinyLFU使用了Segmented LRU(縮寫 SLRU)策略。在初始階段,一個數據項會被存儲在probationary segment中,在後續被訪問時,它會被移到protected segment中。當protected segment內存不夠時,有的數據會被淘汰回probationary segment,這也可能再次觸發probationary segment的淘汰。這套機制確保了訪問間隔小的熱點數據被保存,而重複訪問少的冷數據則被回收。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ea/ea001867d057a38668c8f16aed208937.webp","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除此以外,在caffeine中讀寫都是通過異步操作,將事件提交至隊列實現的,而隊列的數據結構使用的是RingBuffer(高性能無鎖隊列Disruptor用的就是RingBuffer),所有的寫操作共享同一個RingBuffer;而讀取時,這塊的設計思想是類似於Striped64,每一個讀線程對應一個RingBuffer,從而避免競爭。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"下面是官方性能測試對比:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1、讀(100%)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5f/5fdafb77fb00fcd965a7ffa061e249ea.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2、讀 (75%) / 寫 (25%)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/39/39ddbf5d982a2a3402c73c74bf003376.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3、寫 (100%)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/21/218c5bfad8594faf77b2fbd115b510ea.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2,"},{"type":"text","marks":[{"type":"strong"}],"text":"多級緩存設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis作爲常用的緩存,雖然性能非常優秀,但隨着數據量的增長,數據結構的複雜,在疊加高併發場景時,不管是網絡IO的消耗,還是Redis單節點的瓶頸,都會對整個調用鏈的性能造成不可忽視的影響。所以我們既需要Caffeine作爲JVM級別的緩存,也需要Redis作爲我們的二級緩存,這種多級的緩存設計才能最終滿足我們的需要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Java世界中,我們最常用的就是基於Spring Cache來實現應用緩存,但Spring Cache僅支持單一緩存來源,並無法滿足多級緩存的場景。因此我們需要"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"通過實現CacheManager接口來定義自己的多級緩存CacheManager,同時還需要實現自己的Cache類(繼承AbstractValueAdaptingCache),"},{"type":"text","text":"在這裏面將CaffeineCache和RedisTemplate類以及相關的一些策略配置注入進去,這樣我們就可以自己實現想要的get、put方法:多級緩存的讀和寫。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在數據一致性的設計上,這塊主要依賴於"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Redis的發佈訂閱模式,"},{"type":"text","text":"也就是將所有的更新、刪除都通過該模式通知其他節點去清理本地緩存,當然因爲CAP的關係,這種設計是無法保證數據的強一致性的,所以我們也只能儘可能的去保證數據的最終一致性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/04/04cd79fb5c9bf6b2057726adfa362c4a.webp","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/25/25c312be60cd2072277ceea952405a0f.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在會展雲中,我們採用了"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"用戶畫像"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"信息畫像"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"關鍵詞匹配"},{"type":"text","text":"等技術實現個性化推薦。其中,用戶畫像是通過用戶的註冊信息、興趣標籤、瀏覽偏好等數據進行構建。信息畫像包括了展商畫像、展臺畫像、展品畫像和項目畫像四部分,前三部分各自構建又互相利用了對方的信息,如展商的收藏、瀏覽等數據會添加該企業對應展臺和展品的數據,展品的行業信息需要從展商畫像中獲取,三部分數據融合建模,從而構建了更加豐富的畫像。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"關鍵詞匹配技術主要應用於"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"行業名稱和交易類型關鍵詞的匹配,通過該技術可以將不標準的信息規範化。"},{"type":"text","text":"該系統還針對冷啓動場景進行了優化,當用戶和信息數據不足時,系統可以根據僅有的用戶註冊信息和參展商的行業信息進行匹配,並考慮信息的熱度進行排序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本次服貿會實現了對數十萬用戶提供個性化推薦服務,針對新註冊的用戶和新發布的信息也可以通過冷啓動方案快速實現智能推薦。推薦系統採用了通用的"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"召回"},{"type":"text","text":"和"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"排序架構"},{"type":"text","text":",召回部分將採用"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"協同過濾"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"矩陣分解"},{"type":"text","text":"等模型,可以快速從海量數據中粗篩出候選集;排序部分採用更復雜且準確率較高的"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"深度學習"},{"type":"text","text":"模型,如業界常用的Wide&Deep、DeepFM等先進模型,實現對候選集每個信息的精準排序,爲服貿會的用戶和參展商提供準確和穩定的服務。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fd/fdf02deb34f8eec1a9b96ff2aa737272.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在模型選擇上,我們使用DIN(Deep Interest Network)模型。在正式介紹模型之前,先來介紹一下Attention機制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Attention機制是"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"模仿人類注意力而提出的一種解決問題的辦法,"},{"type":"text","text":"簡單地說就是從大量信息中快速篩選出高價值信息,即一種將內部經驗和外部感覺對齊從而增加部分區域的觀察精細度的機制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如人的視覺在處理一張圖片時,會通過快速掃描全局圖像,獲得需要重點關注的目標區域,也就是注意力焦點。然後對這一區域投入更多的注意力資源,以獲得更多所需要關注的目標的細節信息,並抑制其它無用信息。圖1中對Attention機制進行了圖示,其中亮白色區域表示更關注的區域。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/56/56aeb3c330e2b2954c046ff0d507dd29.webp","alt":null,"title":"▲圖1 注意力機制直觀展示圖▲","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Attention機制的具體計算過程見圖2。對目前大多數Attention方法進行抽象,可以將其歸納爲兩個過程、三個階段:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一個過程是根據query和key計算權重係數:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)第一個階段根據query和key計算兩者的相似性或者相關性;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)第二個階段對第一階段的原始分值進行歸一化處理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二個過程根據權重係數對value進行加權求和:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/75/75332b7e3badc0ffc54f307a30cc85f2.webp","alt":null,"title":"▲圖2 三階段計算Attention過程▲","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用候選參展商品和用戶歷史行爲之間的相關性計算出一個權重,這個權重就代表了“注意力”的強弱。DIN設計了局部激活單元,激活單元會計算候選參展商品與用戶最近N個歷史行爲商品的相關性權重,然後將其作爲加權係數對N個行爲商品的embedding向量做sum pooling。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶的興趣由加權後的embedding來體現。權重是根據候選參展商品和歷史行爲共同決定的,同一候選商品對不同用戶歷史行爲的影響是不同的,與候選商品相關性高的歷史行爲會獲得更高的權重。可以看到,激活單元是一個多層網絡,輸入爲用戶畫像embedding向量、信息畫像embedding向量以及二者的叉乘。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DIN模型大致分爲以下五個部分:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Embedding Layer:"},{"type":"text","text":"原始數據是高維且稀疏的0-1矩陣,emdedding層用於將原始高維數據壓縮成低維矩陣;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Pooling Layer :"},{"type":"text","text":"由於不同的用戶有不同個數的行爲數據,導致embedding矩陣的向量大小不一致,而全連接層只能處理固定維度的數據,因此利用Pooling Layer得到一個固定長度的向量;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Concat Layer:"},{"type":"text","text":"經過embedding layer和pooling layer後,原始稀疏特徵被轉換成多個固定長度的用戶興趣的抽象表示向量,然後利用concat layer聚合抽象表示向量,輸出該用戶興趣的唯一抽象表示向量;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"MLP:"},{"type":"text","text":"將concat layer輸出的抽象表示向量作爲MLP的輸入,自動學習數據之間的交叉特徵;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Loss:"},{"type":"text","text":"損失函數一般採用Logloss;"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DIN認爲用戶的興趣不是一個點,而是一個多峯的函數。一個峯就表示一個興趣,峯值的大小表示興趣強度。那麼針對不同的候選參展商品,用戶的興趣強度是不同的,也就是說隨着候選商品的變化,用戶的興趣強度不斷在變化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總的來說,DIN通過"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"引入attention機制,"},{"type":"text","text":"針對不同的商品構造不同的用戶抽象表示,從而實現了在數據維度一定的情況下,更精準地捕捉用戶當前的興趣。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上,是我們爲本次服貿會智能推薦板塊提供的技術支持和思考,本次服貿會作爲首屆"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"“永不落幕”"},{"type":"text","text":"服貿會,同樣,我們在技術之路的深耕和追逐的腳步一刻也不敢懈怠,不斷思考持續探索,不忘初心未來可期。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"推薦閱讀:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?__biz=MzU1OTgxMTg2Nw==&mid=2247494818&idx=1&sn=4b294480df8370df767388ecaa9988ff&scene=21#wechat_redirect","title":""},"content":[{"type":"text","text":"基於服務設計的線上展覽 | 會展雲技術解讀"}]}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?__biz=MzU1OTgxMTg2Nw==&mid=2247494756&idx=1&sn=3628afb8c6b0053d7e62b2ac94e643c3&scene=21#wechat_redirect","title":""},"content":[{"type":"text","text":"多重安全保障護航雲上會展 | 會展雲技術解讀"}]}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?__biz=MzU1OTgxMTg2Nw==&mid=2247494488&idx=1&sn=a100e4e684c83fe643c6b192eabd7134&scene=21#wechat_redirect","title":""},"content":[{"type":"text","text":"讓黑產無處遁形:京東智聯雲推出風險識別服務"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"歡迎點擊"},{"type":"text","text":"【"},{"type":"link","attrs":{"href":"https://www.jdcloud.com/cn/cloudexpo/all?utm_source=PMM_infoQ&utm_medium=NAutm_campaign=ReadMoreutm_term=NA","title":""},"content":[{"type":"text","text":"京東智聯雲"}]},{"type":"text","text":"】"},{"type":"text","marks":[{"type":"strong"}],"text":",瞭解京東會展雲服務"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"更多精彩技術實踐與獨家乾貨解析"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"歡迎關注【京東智聯雲開發者】公衆號"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/77/77b9f9bae21f5a6033857fbf27a4b901.jpeg?x-oss-process=image/resize,p_80/auto-orient,1","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章