干货 | 携程酒店推荐模型优化
{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当用户在线上浏览酒店时,作为旅行平台,如何挑选更合适的酒店推荐给用户,降低其选择的费力度,是需要考虑的一个问题。在携程APP中,一般会触发多种场景。在Figure 1中,我们列举了几种典型的场景:欢迎度排序,智能排序和搜索补偿推荐。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c3\/c3086d2cf6e60112aadada09befe5892.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure1:用户触发的场景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"欢迎度排序"},{"type":"text","text":":在异地场景下,当用户没有明确表达自己需求的情况下,默认按照欢迎度排序展示酒店;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"智能排序"},{"type":"text","text":":在同城或者当用户在浏览上述排序结果的过程中,发现自己的兴趣并没有得到满足的时候,往往会添加筛选条件。比如添加地标选项,从而获得该地标附近的相关酒店。相比欢迎度排序,用户表达了更多的个性化需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"搜索补偿推荐排序"},{"type":"text","text":":如果用户在当前搜索的展示结果中,还是没有找到满足自己需要的酒店,就会不断往下翻页查看更多的搜索结果。而当满足搜索限制条件的酒店数目不足时,会触底。这个时候就会触发搜索补偿推荐。前面两个场景都没有满足用户需求,所以在搜索补偿场景中,我们需要深入挖掘用户历史行为,提供更加个性化的推荐内容。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文将主要介绍我们在补偿推荐场景中所做的算法优化工作。包含模型迭代、模型迭代过程中遇到的技术需求以及针对技术需求所做的一些基建等。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、推荐模型的迭代"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在酒店推荐的场景中,我们需要把满足用户需求的产品优先曝光给用户,减少其使用产品的费力度。我们以用户在酒店的TOP点击(编者注:可以简单理解为用户点击排在TOP位置酒店的概率,TOP点击命中率越高,用户体验越好)和转化命中率(CR)作为费力度指标;CR优化问题被建模成二分类问题,离线采取AUC和NDCG作为模型效果评估标准。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分类问题的本质就是要找到一条决策边界函数f(x),来把正样本(比如成单)和负样本(比如没有成单)数据分开。这里,x就是模型特征,好的特征能更好地表征正负样本的差异,让函数学习事半功倍;学习函数f(x)的过程就是建模过程。在本节中,我们分别介绍下我们在特征和建模方面所尝试的工作。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 特征"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"推荐特征使用的特征可以分为:用户侧特征、物品侧特征以及用户和物品的交互特征。从特征的数值特性上,又可以分成:连续值特征和离散值特征。算法刚开始接入的时候,我们的模型只有连续值特征。后经历了从完全连续值特征,到离散特征比重越来越重的过程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"连续值特征既有静态的(比如用户的性别,酒店价格等),也有基于用户行为的动态的(比如用户点击某个酒店的次数)。"},{"type":"text","marks":[{"type":"strong"}],"text":"连续值特征的优点是具备良好的泛化能力。"},{"type":"text","text":"一个用户对一个商圈的偏好可以泛化到另外一个对这个商圈有相同统计特性的用户身上。"},{"type":"text","marks":[{"type":"strong"}],"text":"连续值特征的缺点是缺乏记忆能力导致区分度不高。"},{"type":"text","text":"比如:在同一个商圈酒店列表中,一个用户点击了(A,B,C,D),而另外一个用户点击了(W,X,Y,Z)。虽然两个用户的行为序列不同,但是统计值特征忘记了用户具体点击了哪些酒店,认为两个用户都是点击了4家酒店。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"离散值特征是细粒度的特征:设备ID,用户ID,用户点击的item ID都可以做特征。这样一来不同的人,有不同的行为就可以在特征上有很好的区分度,因此离散值特征是模型可以把千人千面做得更进一步的基础。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"离散特征的优点就是记忆力强区分度高。"},{"type":"text","text":"我们还可以让离散特征通过特征组合的方式,挖掘用户对于酒店更深层次的兴趣偏好:比如点击item A的人也喜欢item B,我们可以直接基于(A,B)生成一个组合离散特征,来学习A和B的协同信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"离散特征的缺点是泛化能力相对较弱。"},{"type":"text","text":"这是因为特征粒度太细,预测的时候命中率会偏低。同时在模型训练的时候,细粒度的特征相对粗粒度的特征更容易获得权重,让泛化能力强的粗粒度的特征学到的信息更少,进一步的恶化了模型的泛化能力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"举一个例子来说:每个样本都有一个唯一的样本id,如果我们以样本id为特征,那么我们模型训练的时候,样本id特征可以完美的拟合label;但实际测试中,会发现因为模型严重过拟合而效果非常差。这是设计离散特征需要考虑的特征记忆性和泛化性的tradeoff,也就是特征尽可能细和特征在测试数据中命中率尽可能高的tradeoff。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们结合业务场景,探索出技术方案:可以让离散值特征在线上也有良好的泛化能力,从而最终让模型兼具较强的记忆能力和泛化能力。在工程方面:因为组合特征的存在,我们特征空间会很大(e.g. 现在酒店推荐场景特征可以轻松到亿级别),对模型训练和在线serving工程提出新的挑战。这也是我们在推进大规模离散DNN训练框架要解决的关键问题。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们模型经历了从Logistic Regression (LR),Gradient BoostingDecision Tree(GBDT)到Deep Neural Network(DNN)的迭代过程。在业务开始阶段,数据量和特征量都比较少,通常会采用LR模型。随着算法的迭代,数据量和特征规模越来越多的时候,基于XGBOOST或者LightGBM构建GBDT模型是业务成长期快速拿到收益的好的选择。当数据量越来越大的时候,需要基于DNN的框架来把个性化模型做的更细。下面对三种模型的特点做一些简单的介绍。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/bb\/bbccc8170a219c494314fbc16bb3c6c8.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure2:模型决策边界"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Figure 2中,展示了三个模型的决策边界:"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"LR"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在LR里,决策边界函数是线性的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"模型的优点:可以通过模型的权重大小,解释特征的重要性;同时LR支持增量更新;在引入大规模离散特征的情况下,业界在LR时代的经典做法是对LR加L1正则并通过OWLQN或者Coordinate Descent的方式进行优化,也可以通过FTRL算法让模型稀疏避免过拟合。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"模型的缺点:线性决策边界这个假设太强,会让模型的精度受到限制;另外,模型的可扩展性程度低。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"GBDT"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在GBDT中,决策边界是非线性的;模型通过将样本空间分而治之的方式,来提高模型精度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"模型的优点:树模型可以计算每个特征的重要性程度,来获得一些可解释性;同时模型比LR有更高的精度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"模型的缺点:不支持大规模的离散特征,不支持增量更新;模型可扩展性程度低。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们在酒店推荐场景中,尝试了pointwise loss和pairwise loss,每次尝试都获得了不少的提升。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"DNN"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在DNN中,决策边界是高度非线性的。我们知道计算机通过与或非这种简单的逻辑,可以表达各种复杂的对象:音频,视频,网页等。而DNN每一层网络比与或非更加复杂,DNN通过多层神经元叠加,成为一个万能函数逼近器。在理想情况下,只要有足够的数据量,不论我们实际的决策边界如何复杂,我们都可以通过DNN来表达。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同时DNN,支持增量更新,支持根据业务场景进行灵活定制各种网络结构,支持大规模离散DNN,在离散模型中学习出来的Embedding向量还可以用在向量相似召回里面。正因为有这么多的好处,DNN正在成为业界推荐算法的标配。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这个模型的缺点是:特征经过不同层交叉,交互耦合关系过于复杂,而导致可解释性不好;工程复杂度在我们用不同结构的时候所有不同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面的表总结了三个模型的特点:"}]},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.