Building QQ Mail Full-Text Search on Tencent Cloud Elasticsearch

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Introduction"},{"type":"text","text":" | As users accumulate more and more mail, search has become a basic mailbox feature. The in-house search engine that QQ Mail launched in 2008 now runs on aging storage machines whose hardware models are being phased out, so a new full-text search service had to be built and the stored data migrated. This article covers the architecture, implementation details, and search tuning of QQ Mail full-text search. Author: Gan Sheng (幹勝), backend R&D engineer at Tencent."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. Background of the Rebuild"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"QQ Mail has offered full-text search since 2008, built on an in-house search engine that uses a Chinese word-segmentation algorithm and an inverted-index structure. It was designed with a two-level index: hot data is kept in a forward index to support real-time search, and cold data is kept in an inverted index to support tokenized search. The old full-text search had the following problems in use:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Aging machines and disk failures caused data loss;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The business logic was complex and the codebase large and obscure, making it hard to maintain;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It relied on a customized KV store that nobody maintained any longer;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The original text was not stored, so native highlighting could not be implemented;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Names of oversized attachments were not indexed."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These problems had persisted in the old full-text search for a long time. With the old storage machines being decommissioned, we took the opportunity to rebuild the QQ Mail full-text search backend service."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. New Full-Text Search Architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Elasticsearch is a distributed search engine that supports storage, search, and data analytics. It offers good scalability, stability, and maintainability, and has consistently ranked first in search engine rankings."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ES uses Lucene as its underlying storage engine. On top of Lucene, ES adds distributed clustering for reliability and a REST API for ease of use."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}