基于腾讯云Elasticsearch搭建QQ邮箱全文检索

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"导语"},{"type":"text","text":" | 随着用户邮件数量越来越多,邮件搜索已是邮箱的基本功能。QQ 邮箱于 2008 年推出的自研搜索引擎面临着存储机器逐渐老化,存储机型面临淘汰的境况。因此,需要搭建一套新的全文检索服务,迁移存储数据。本文将介绍 QQ 邮箱全文检索的架构、实现细节与搜索调优。文章作者:干胜,腾讯后台研发工程师。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、重构背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"QQ 邮箱的全文检索服务于2008年开始提供,使用中文分词算法和倒排索引结构实现自研搜索引擎。设计有二级索引,热数据存放于正排索引支持实时检索,冷数据存放于倒排索引支持分词搜索。在使用旧全文检索过程中存在以下问题:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"机器老化、磁盘损坏导致丢数据;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"业务逻辑复杂,代码庞大晦涩,难以维护;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用定制化kv存储,已无人维护;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不存储原文,无法实现原生高亮;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"未索引超大附件名。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"旧的全文检索在使用中长期存在上述问题,恰逢旧的存储机器裁撤,借此机会重构 QQ 邮箱的全文检索后台服务。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、新全文检索架构"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Elasticsearch 是一个分布式的搜索引擎,支持存储、搜索和数据分析,有良好的扩展性、稳定性和可维护性,在搜索引擎排名中蝉联第一。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ES 的底层存储引擎是 Lucene,ES 在 Lucene 的基础上提供分布式集群的能力以确保可靠性、提供 REST API 以确保可用性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章