详解微软大规模稀疏模型MEB:参数高达1350亿,可显著提升搜索相关性

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最近,像GPT-3这样基于Transformer的深度学习模型在机器学习领域受到了很多关注。这些模型可以很好地理解语义关系,"},{"type":"link","attrs":{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/bing-delivers-its-largest-improvement-in-search-experience-using-azure-gpus\/","title":"","type":null},"content":[{"type":"text","text":"帮助微软必应搜索引擎大幅提升了体验"}]},{"type":"text","text":",并在SuperGLUE学术基准测试上"},{"type":"link","attrs":{"href":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark\/","title":"","type":null},"content":[{"type":"text","text":"超越了人类水平"}]},{"type":"text","text":"。但是,这些模型可能无法捕获查询和文档术语之间更细微的、超出单纯语义的关系。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/45\/50\/45d1d06eebfb4174bb7e3b7d1bf64550.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在这篇博文中我们将介绍“Make Every feature Binary”(让每个特征都成为一个二进制特征,MEB),这是一种大规模稀疏模型,是我们生产Transformer模型的一种补充,可以提升微软客户使用"},{"type":"link","attrs":{"href":"https:\/\/www.microsoft.com\/en-US\/ai\/ai-at-scale","title":"","type":null},"content":[{"type":"text","text":"大规模AI"}]},{"type":"text","text":"时的搜索相关性。为了让搜索更加准确和动态,MEB更好地利用了大数据的力量,并允许输入特征空间拥有超过2000亿个二进制特征,这些特征反映了搜索查询和文档之间的微妙关系。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"为什么“Make Every feature Binary”可以改进搜索结果?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"之所以MEB可以有效改善基于Transformer的深度学习模型的搜索相关性,一个原因是它可以将单个事实映射到特征,从而让MEB能够更细致地理解一个个事实。例如,许多深度神经网络(DNN)语言模型在填写下面这句话的空白时可能会过度泛化:“(blank) can fly”。由于大多数DNN训练案例的结果是“birds can fly”,因此DNN语言模型可能只会用“birds”这个词来填补空白。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MEB会将每个事实分配给一个特征,从而避免这种情况,因此它可以通过分配权重来区分不同动物的飞行能力,比如企鹅和海雀。它可以针对区分鸟或任何实体、物体的每一个特征执行这种操作。MEB与Transformer模型搭配使用,可以将分类级别提升到全新的水平。它对上题给出的答案会是“鸟类可以飞,但鸵鸟、企鹅和其他这些鸟类除外(birds can fly, except ostriches, penguins, and these other birds)。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"随着规模的增加,还有一个元素可以用来提升数据的使用效率。必应中的网页结果排名是一个机器学习问题,它通过学习大量用户数据来改进结果。传统上一种利用点击数据的方法是为每个曝光的查询\/文档对提取数千个手工数字特征,并训练一个梯度提升决策树(GBDT)模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然而,由于特征表示和模型容量有限,即使是最先进的GBDT训练器"},{"type":"link","attrs":{"href":"https:\/\/github.com\/Microsoft\/LightGBM","title":"","type":null},"content":[{"type":"text","text":"LightGBM"}]},{"type":"text","text":"在数亿行数据后也会收敛。此外,这些手工制作的数字特征本质上往往非常粗糙。例如,它们可以捕获查询中给定位置的术语在文档中出现的次数,但关于给定术语具体含义的信息在这一表示中丢失了。此外,这种方法中的特征并不能一直准确地说明搜索查询中的词序等内容。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了释放海量数据的力量,并找到能够更好地反映查询和文档之间关系的特征表示,MEB接受了来自必应搜索积累三年,超过5000亿个查询\/文档对的训练。输入特征空间有超过2000亿个二进制特征。对于"},{"type":"link","attrs":{"href":"http:\/\/www.eecs.tufts.edu\/~dsculley\/papers\/ad-click-prediction.pdf","title":"","type":null},"content":[{"type":"text","text":"FTRL"}]},{"type":"text","text":",最新版本是具有90亿个特征和超过1350亿个参数的稀疏神经网络模型。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"使用微软提供的最大通用模型发现隐藏的意图"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MEB业已投入生产环境,所有区域和语言的必应搜索100%都用上了它。它是我们在微软提供的最大的通用模型。它展示了一种出色的能力,可以记住这些二进制特征所代表的事实,同时以一种持续的方式从大量数据中可靠地学习。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们凭经验发现,对如此庞大规模的数据进行训练是大型稀疏神经网络的独特能力。将相同的必应日志输入一个LightGBM模型,并使用传统数字特征(例如BM25和其他类型的查询和文档匹配特征)进行训练时,使用的数据量超过一个月后模型质量就不再提高了。这表明这个模型的容量不足以从更多数据中受益。相比之下,MEB是在积累三年的数据上训练的,我们发现为它添加更多数据后它还能继续学习,这表明模型容量会随着数据增加而增长。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"与基于Transformer的深度学习模型相比,MEB模型还展示了有趣的超越语义关系的学习能力。在查看MEB学习的主要特征时,我们发现它可以学习到查询和文档之间的隐藏意图。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/0a\/47\/0a55e1a26b53e83db8dd3663e470ba47.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"表1:MEB模型学习的示例"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如,MEB了解到“Hotmail”与“Microsoft Outlook”两个词密切相关,尽管它们在语义上并不接近。MEB发现了这些词之间微妙的关系:Hotmail是一种免费的基于Web的电子邮件服务,由微软提供,后来更名为Microsoft Outlook。类似地,它了解到“Fox31”和“KDVR”之间有很强的联系,因为KDVR是位于科罗拉多州丹佛市的电视频道的呼号,该频道的运营品牌叫Fox31。同样,这两个短语之间并没有明显的语义联系。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"更有趣的是,MEB可以识别单词或短语之间的负面关系,找出那些用户不希望在查询中看到的内容。例如,搜索“baseball”的用户通常不会点击谈论“hockey”的页面,即使它们都是流行的运动。类似地,用户搜索“瑜伽”的时候不会去点击包含“歌舞”的文档。理解这些负面关系有助于搜索引擎忽略不相关的搜索结果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MEB学到的这些关系与基于Transformer的DNN模型学到的关系有很好的互补性。搜索相关性的提升对用户体验的改善是非常明显的。在我们的生产级Transformer模型上引入MEB带来了以下收益:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"头部"},{"type":"text","marks":[{"type":"strong"},{"type":"strong"},{"type":"strong"}],"text":"搜索结果的点击率"},{"type":"text","marks":[{"type":"strong"}],"text":"("},{"type":"text","marks":[{"type":"strong"},{"type":"strong"},{"type":"strong"}],"text":"CTR"},{"type":"text","marks":[{"type":"strong"}],"text":")增加了近2%"},{"type":"text","text":"。用户无需向下滚动页面即可找到相关结果。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"手动查询重构"},{"type":"text","marks":[{"type":"strong"},{"type":"strong"},{"type":"strong"}],"text":"行为"},{"type":"text","marks":[{"type":"strong"}],"text":"减少了1%以上"},{"type":"text","text":"。用户需要手动重新制定查询内容,意味着他们不喜欢他们在原始查询中找到的结果。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"分页点击量减少了1.5%以上"},{"type":"text","text":"。用户需要点击“下一页”按钮,意味着他们没有在第一页找到他们想要的东西。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"MEB如何训练数据并大规模提供特征"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"模型结构"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如图1所示,MEB模型由一个二进制特征输入层、一个特征嵌入层、一个池化层和两个密集层组成。输入层包含90亿个特征,由49个特征组生成,每个二进制特征编码为一个15维嵌入向量。在对每组sum-pooling和concatenation之后,向量通过两个密集层产生一个点击概率估计。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/32\/f6\/32c0fd2149f6261a528f1c2d6f20bdf6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"图1:MEB是一个稀疏神经网络模型,由一个接受二进制特征的输入层、一个将每个二进制特征转换为15维向量的特征嵌入层、一个sum-pooling层应用于全部49个特征组并通过concatenation以产生一个735维的向量,然后通过两个密集层来产生一个点击概率。此图中展示的特征是从示例查询“Microsoft Windows”和文档"},{"type":"link","attrs":{"href":"https:\/\/www.microsoft.com\/en-us\/windows","title":"","type":null},"content":[{"type":"text","text":"https:\/\/www.microsoft.com\/en-us\/windows"}],"marks":[{"type":"size","attrs":{"size":10}}]},{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"生成的,如图2中所述。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"训练数据和统一特征为二进制"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MEB使用了来自必应的三年搜索日志作为训练数据。对于每次必应搜索曝光(impression),我们使用启发式方法来确定用户是否对他们单击的文档感到满意。我们将这些“感到满意”的文档标记为正样本。同一曝光中的其他文档被标记为负样本。对于每个查询和文档对,我们从查询文本、文档URL、标题和正文文本中提取二进制特征。这些特征被输入到一个稀疏神经网络模型中,以最小化模型预测的点击概率和实际点击标签之间的交叉熵损失。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"特征设计和大规模训练是MEB成功的关键所在。MEB特征是在查询和文档之间非常具体的术语级别或N-gram级别的关系上定义的,传统的数字特征无法捕获这些信息,因为后者只关心查询和文档之间的匹配计数。(N-grams是N项的序列。)为了充分挖掘这个大规模训练平台的力量,所有的特征都被设计为二进制特征,可以很容易地用一致的方式覆盖人工制作的数字特征和直接从原始文本中提取的特征。这样做可以让MEB在一条路径上进行端到端的优化。当前的生产模型使用三种主要类型的特征,如下所述。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"查询和文档N-gram对特征"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"N-gram对特征是基于必应搜索日志中查询和文档字段的N-gram组合生成的。如图2所示,来自查询文本的N-gram将与来自文档URL、标题和正文文本的N-gram结合形成N-gram对特征。更长的N-gram(对于更高的N值)能够捕捉更丰富和更细微的概念。然而,随着N的增加,处理它们的成本呈指数级增长。在我们的生产模型中,N设置为1和2(分别为unigrams和bigrams)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们还通过组合整个查询文本和文档字段来生成特征。例如,特征“Query_Title_Microsoft Windows_Explore Windows 10 OS Computer Apps More Microsoft”是从query=“Microsoft Windows”和document title=“Explore Windows 10 OS Computer Apps More Microsoft”生成的特征。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"桶化数字特征的单热编码"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"数字特征首先会分桶,然后通过应用单热(one-hot)编码将其转换为二进制格式。在图2所示的示例中,数字特征“QueryLength”可以采用1到MaxQueryLength之间的任何整数值。我们为此特征定义了MaxQueryLength存储桶,以便“Microsoft Windows”这个查询具有等于1的二进制特征QueryLength_2。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"分类特征的单热编码"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分类(Categorical)特征可以通过单热编码,以一种直接的方式转换为二进制特征。例如,UrlString是一个分类特征,每个唯一的URL字符串文本都是一个不同的类别。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/48\/44\/4897771f2aec1f562abe485fa23e3544.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"图2:上面是MEB特征外观的一个示例。左侧展示了一个示例查询文档对,其中查询文本、文档标题、URL和片段作为特征提取的输入。右侧展示了MEB生成的一些典型特征。例如,“Microsoft Windows”这个查询和文档标题“Explore Windows 10 OS, Computers, Apps, & More | Microsoft”生成了一个Query x Title特征“Query:Microsoft Windows_Title:Explore Windows 10 OS Computer Apps More Microsoft”。由于“Microsoft Windows”这个查询包含两个术语,因此生成了二进制特征“QueryLength_2”。查询词和文档标题词的每个组合都可以生成一个Query unigram x Title unigram特征的列表,例如“QTerm:Microsoft_TitleTerm:Explore”等。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"持续训练支持万亿查询\/文档对,每天刷新"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了在如此巨大特征空间上完成训练,我们利用了由"},{"type":"link","attrs":{"href":"https:\/\/about.ads.microsoft.com\/en-us\/h\/a\/microsoft-advertising?ef_id=a34847914f4c184c15292760a4561a7d:G:s&OCID=AID2200059_SEM_a34847914f4c184c15292760a4561a7d:G:s&s_cid=US-ACQ-PPC-src_BNG-sub_prosp-cat_Brand_mt_b&msclkid=a34847914f4c184c15292760a4561a7d","title":"","type":null},"content":[{"type":"text","text":"微软广告"}]},{"type":"text","text":"团队构建的内部大型培训平台Woodblock。它是一种用于训练大型稀疏模型的分布式、大规模、高性能解决方案。Woodblock建立在TensorFlow之上,填补了通用深度学习框架与对数十亿稀疏特征的工业需求之间的空白。通过对I\/O和数据处理的深度优化,它可以使用CPU和GPU集群在数小时内训练数千亿个特征。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"就算用上了Woodblock管道,用包含近一万亿个查询\/文档对的必应搜索三年累积日志训练MEB也很难一蹴而就。相反,我们采用了一种持续训练方法,模型每次都会在之前几个月的数据基础上再加入新一个月的数据继续训练。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"更重要的是,即使在必应中实现后,模型也会每天使用最新的每日点击数据刷新训练数据集,如图3所示。为了避免过时特征的负面影响,一个自动过期策略会检查每个特征的时间戳,并过滤掉过去500天内未出现的特征。经过持续的训练,模型的日常更新部署得以完全自动化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/6d\/0d\/6d719b104ebd6bc710597287af3d250d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"图3:上面是一个流程图,说明了MEB是如何每天刷新的。生产MEB模型每天都使用最新的单日必应搜索日志数据进行持续训练。在新模型部署并在线提供服务之前,会从模型中删除过去500天内未出现的陈旧特征。这可以保持特征的新鲜度并有效利用模型容量。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"使用必应ObjectStore平台服务超大模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MEB稀疏神经网络模型加载到内存时占用720GB的空间。在流量高峰期,系统需要维持每秒3500万次特征查找,因此无法从一台机器上服务MEB模型。相比之下,我们利用了必应的自研"},{"type":"link","attrs":{"href":"https:\/\/nam06.safelinks.protection.outlook.com\/?url=https%3A%2F%2Fwww.microsoft.com%2Fen-us%2Fresearch%2Fblog%2Fevolution-bings-objectstore%2F&data=04%7C01%7CJunyan.Chen%40microsoft.com%7C1df927f6a1524ee57fec08d941123d34%7C72f988bf86f141af91ab2d7cd011db47%7C0%7C0%7C637612368881862856%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=27jmcZg77iOd5O5Na0HCI1XZZ%2B2gSyc%2FDogfPaQIZI0%3D&reserved=0","title":"","type":null},"content":[{"type":"text","text":"ObjectStore"}]},{"type":"text","text":"服务来托管和服务MEB模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ObjectStore是一个多租户、分布式键值存储,支持数据和计算托管。MEB的特征嵌入层在ObjectStore中实现为一个表查找操作,每个二进制特征哈希用作检索其在训练时产生的嵌入的键。池化层和密集层部分的计算量更大,在一个承载用户定义函数的ObjectStore Coproc(一个接近数据的计算单元)中执行。MEB将计算和数据服务分离到不同的分片中。每个计算分片占用一部分用于神经网络处理的生产流量,每个数据分片托管一部分模型数据,如图4所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/5e\/65\/5e189f425d082a9f84c6a269e1789265.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"图4:计算分片中的ObjectStore Coproc与数据分片通信,以检索特征嵌入并运行神经网络。数据分片存储特征嵌入表,并支持来自每个Coproc调用的查找请求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于在ObjectStore上运行的大多数负载都是专门做存储查找的,因此将MEB计算分片和内存中数据分片放在一起,可以让我们最大限度地利用运行在多租户集群中的ObjectStore的计算和内存资源。由于分片分布在多台机器上,我们还能够精细控制每台机器上的负载,以便在MEB中实现个位数毫秒的服务延迟。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"支持更快的搜索,更好地理解内容"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们发现像MEB这样非常大的稀疏神经网络可以学习到基于Transformer的神经网络无法理解的细微关系,从而成为后者的有效补充。这种对搜索语言的更深入理解能力为整个搜索生态系统带来了一系列显著好处:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于搜索相关性的提升,必应用户能够更快找到内容并完成任务,减少重构查询内容或翻到第1页之后的需要。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于MEB可以更好地理解内容,发布商和网站管理员可以获得更多访问其资产的流量,并且他们可以专注于满足客户,而不是花时间寻找有助于提升他们排名的正确关键字。一个具体的例子是产品品牌重塑,MEB模型可以自动学习新旧名称之间的关系,就像它对“Hotmail”和“Microsoft Outlook”所做的那样。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你使用DNN为你的业务提供动力,我们建议你尝试使用大型稀疏神经网络来为这些模型提供补充。如果你有大量的用户交互历史流,并且可以轻松构建简单的二进制特征,我们尤其推荐这样做。如果你沿着这条路走下去,我们建议你应该让模型尽可能接近实时地更新。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MEB只是我们团队创建有影响力的尖端技术以提高规模和效率,进而改进搜索体验的一个例子。如果你对搜索和推荐的大规模建模感兴趣,我们的Core Search & AI团队正在招人!你可以在微软职业"},{"type":"link","attrs":{"href":"https:\/\/careers.microsoft.com\/us\/en\/search-results?keywords=%23semanticsearch%23","title":"","type":null},"content":[{"type":"text","text":"网站"}]},{"type":"text","text":"上找到我们当前的职位空缺。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"作者介绍"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.microsoft.com\/en-us\/research\/people\/junyanch\/","title":"","type":null},"content":[{"type":"text","text":"Junyan Chan"}]},{"type":"text","text":"是微软搜索和人工智能领域的首席应用科学经理。她领导的团队专注于对必应网络搜索中的基本问题进行排名。团队利用了最先进的NLP和机器学习技术来改进Web相关性模型,并为微软用户带来更满意的搜索体验。他们的工作包括了通过超大规模深度学习模型、稀疏神经网络DNN、LightGBM等技术实施数据创新、特征工程和模型改进。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.microsoft.com\/en-us\/research\/people\/fdubut\/","title":"","type":null},"content":[{"type":"text","text":"Frédéric Dubut"}]},{"type":"text","text":"与微软的工程和数据科学团队合作,管理必应有机搜索排名的产品团队。他们的工作涵盖搜索、个性化、实验和机器学习运维。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.microsoft.com\/en-us\/research\/people\/jasol\/","title":"","type":null},"content":[{"type":"text","text":"Jason(Zengzhong)Li"}]},{"type":"text","text":"是微软WebXT平台的合作伙伴组工程经理。他的工作重点是大规模低延迟分布式服务系统,包括k-v存储、倒排索引服务、向量索引和深度学习模型推理。他也对稀疏密集索引和近似最近邻搜索等信息检索算法感兴趣。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.microsoft.com\/en-us\/research\/people\/ranganm\/","title":"","type":null},"content":[{"type":"text","text":"Rangan Majumder"}]},{"type":"text","text":"是微软的搜索和人工智能副总裁。他们的使命是通过减少用户的信息需求与图像、文档和视频中的所有知识之间的摩擦,让世界变得更智能、更高效。他们应用最先进的语言理解、视觉理解和多模态理解来构建更好的搜索、个人助理和生产力应用程序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"原文链接:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/make-every-feature-binary-a-135b-parameter-sparse-neural-network-for-massively-improved-search-relevance\/","title":"","type":null},"content":[{"type":"text","text":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/make-every-feature-binary-a-135b-parameter-sparse-neural-network-for-massively-improved-search-relevance\/"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章