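Preface
This article walks through practical techniques for full-text search and field-level search in Elasticsearch, with a complete example for each. It is meant as study and hands-on reference material.
0. Introduction
To illustrate the different query types in Elasticsearch, we will search a collection of book documents with the following fields:
1. title: the book title;
2. authors: the authors;
3. summary: a short summary;
4. publish_date: the publication date;
5. num_reviews: the number of reviews.
First, let's create a new index and load some documents with the bulk API.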
PUT /bookdb_index
{ "settings": { "number_of_shards": 1 }}
POST /bookdb_index/book/_bulk
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
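1. Basic Match Query
1.1 Full-text search
There are two ways to run a full-text search:
1) Use the search API with the query passed as a URL parameter.
Example: the following runs a full-text search for "guide".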
GET /bookdb_index/book/_search?q=guide
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.28168046,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
],
"summary": "A distibuted real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.24144039,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
],
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher": "manning"
}
}
]
2) Use the full Elasticsearch query DSL, with a JSON document as the request body.
The result is the same as with method 1).
{
"query": {
"multi_match" : {
"query" : "guide",
"fields" : ["_all"]
}
}
}
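Explanation: multi_match is used in place of match as a convenient shorthand for running the same query against several fields. The fields property lists the fields to query; here we query every field of the document through _all.
1.2 Searching a specific field
Both APIs also let you specify which fields to search. For example, to search for books with the words "in action" in the title:
1) URL search
As follows: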
GET /bookdb_index/book/_search?q=title:in action
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.6259885,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
],
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.5975345,
"_source": {
"title": "Elasticsearch in Action",
"authors": [
"radu gheorge",
"matthew lee hinman",
"roy russo"
],
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"publish_date": "2015-12-03",
"num_reviews": 18,
"publisher": "manning"
}
}
]
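2) DSL search
The full-body DSL, however, gives you more flexibility for building complex queries (as we will see later) and for controlling what is returned. In the example below we specify the number of results, the offset (useful for pagination), the document fields to return, and highlighting.
Number of results: size;
Offset: from;
Fields to return: _source;
Highlighting: highlight.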
POST /bookdb_index/book/_search
{
"query": {
"match" : {
"title" : "in action"
}
},
"size": 2,
"from": 0,
"_source": [ "title", "summary", "publish_date" ],
"highlight": {
"fields" : {
"title" : {}
}
}
}
[Results]
"hits": {
"total": 2,
"max_score": 0.9105287,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.9105287,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
},
"highlight": {
"title": [
"Elasticsearch <em>in</em> <em>Action</em>"
]
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.9105287,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
},
"highlight": {
"title": [
"Solr <em>in</em> <em>Action</em>"
]
}
}
]
}
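Note: for multi-word queries, the match query lets you specify the "and" operator instead of the default "or" operator.
You can also set the minimum_should_match option to tune the relevance of the returned results.
Details can be found in the Elasticsearch guide.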
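As a rough illustration of the two options mentioned in the note, the Python sketch below builds match query bodies that set operator or minimum_should_match. The helper name `build_match_query` is an assumption made for this example; only the resulting JSON structure is what Elasticsearch sees.

```python
import json

def build_match_query(field, text, operator=None, minimum_should_match=None):
    """Build a `match` query body, optionally tightening multi-word matching."""
    options = {"query": text}
    if operator is not None:
        options["operator"] = operator  # "and" requires every term to match
    if minimum_should_match is not None:
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}

# Require both "in" and "action" to appear in the title.
strict_body = build_match_query("title", "in action", operator="and")
# Require at least 2 of the 3 terms to match.
relaxed_body = build_match_query("title", "solr in action", minimum_should_match=2)
print(json.dumps(strict_body, indent=2))
```

The printed JSON can be used as the request body of POST /bookdb_index/book/_search.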
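2. Multi-field Search
As we have already seen, to query more than one document field (for example, to search for the same query string in both the title and the summary), use the multi_match query.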
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query" : "elasticsearch guide",
"fields": ["title", "summary"]
}
}
}
[Results]
"hits": {
"total": 3,
"max_score": 0.9448582,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.9448582,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
],
"summary": "A distibuted real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.17312013,
"_source": {
"title": "Elasticsearch in Action",
"authors": [
"radu gheorge",
"matthew lee hinman",
"roy russo"
],
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"publish_date": "2015-12-03",
"num_reviews": 18,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.14965448,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
],
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher": "manning"
}
}
]
}
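Note: the third hit (document 4) matches because the word "guide" appears in its summary.
3. Boosting
Because we are searching across multiple fields, we may want to boost the score of one of them. In the example below, we boost the summary field by a factor of 3 to increase its importance, which in turn raises the relevance of document 4.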
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query" : "elasticsearch guide",
"fields": ["title", "summary^3"]
}
},
"_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.31495273,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.14965448,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.13094766,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
}
]
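Note: boosting does not simply multiply the computed score by the boost factor. The boost actually applied goes through normalization and some internal optimizations. See the Elasticsearch guide for more.
4. Bool Query
The AND / OR / NOT operators can be used to fine-tune a search query so that it returns more relevant or more specific results.
In the search API this is done with the bool query.
The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR).
For example, to search for a book with "Elasticsearch" OR "Solr" in the title, AND authored by "clinton gormley", but NOT authored by "radu gheorge":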
POST /bookdb_index/book/_search
{
"query": {
"bool": {
"must": {
"bool" : { "should": [
{ "match": { "title": "Elasticsearch" }},
{ "match": { "title": "Solr" }} ] }
},
"must": { "match": { "authors": "clinton gormely" }},
"must_not": { "match": {"authors": "radu gheorge" }}
}
}
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.3672021,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
],
"summary": "A distibuted real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher": "oreilly"
}
}
]
Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries.
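To make the boolean logic concrete, here is a small in-memory Python sketch that evaluates the same conditions against the four sample books. It is a simplification under stated assumptions: `tokens` is a rough stand-in for the analyzer, `match` models only the default "or" behaviour of a match query, and no scoring is involved.

```python
import re

def tokens(value):
    """Lowercase word tokens of a string or list of strings (rough analyzer stand-in)."""
    text = " ".join(value) if isinstance(value, list) else value
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match(doc, field, text):
    """Default `match` behaviour: true if any query term occurs in the field."""
    return bool(tokens(text) & tokens(doc[field]))

def bool_query(doc):
    """(title: Elasticsearch OR Solr) AND authors: clinton gormley, NOT authors: radu gheorge."""
    title_ok = match(doc, "title", "Elasticsearch") or match(doc, "title", "Solr")
    must = title_ok and match(doc, "authors", "clinton gormley")
    must_not = match(doc, "authors", "radu gheorge")
    return must and not must_not

books = {
    1: {"title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"]},
    2: {"title": "Taming Text: How to Find, Organize, and Manipulate It",
        "authors": ["grant ingersoll", "thomas morton", "drew farris"]},
    3: {"title": "Elasticsearch in Action",
        "authors": ["radu gheorge", "matthew lee hinman", "roy russo"]},
    4: {"title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"]},
}
matching_ids = [doc_id for doc_id, doc in books.items() if bool_query(doc)]
```

Only document 1 satisfies all three conditions, which agrees with the single hit returned above.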
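5. Fuzzy Queries
Fuzzy matching can be enabled on match and multi_match queries to catch spelling errors. The degree of fuzziness is specified as a Levenshtein distance from the original term.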
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query" : "comprihensiv guide",
"fields": ["title", "summary"],
"fuzziness": "AUTO"
}
},
"_source": ["title", "summary", "publish_date"],
"size": 1
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.5961596,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
}
]
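Note: a fuzziness value of "AUTO" is equivalent to specifying 2 when the term is longer than 5 characters. However, about 80% of human misspellings have an edit distance of 1, so setting fuzziness to 1 may improve overall search performance. For more information, see the "Typos and Misspellings" chapter of Elasticsearch: The Definitive Guide.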
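The sketch below shows, in plain Python, what an edit distance is and how "AUTO" picks the allowed distance from the term length. It is only an illustration: it implements the classic Levenshtein distance, whereas Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, and the function names are made up for this example.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def auto_fuzziness(term):
    """Edit distance allowed by "AUTO" for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

misspelled, indexed = "comprihensiv", "comprehensive"
within_reach = levenshtein(misspelled, indexed) <= auto_fuzziness(misspelled)
```

The misspelled query term is two edits away from the indexed term (one substitution and one missing letter), which is within the distance "AUTO" allows for a 12-character term.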
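6. Wildcard Query
A wildcard query lets you specify a pattern to match instead of an entire term.
- ? matches any single character;
- * matches zero or more characters.
For example, to find all records with an author whose name begins with the letter "t":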
POST /bookdb_index/book/_search
{
"query": {
"wildcard" : {
"authors" : "t*"
}
},
"_source": ["title", "authors"],
"highlight": {
"fields" : {
"authors" : {}
}
}
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
]
},
"highlight": {
"authors": [
"zachary <em>tong</em>"
]
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 1,
"_source": {
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"authors": [
"grant ingersoll",
"thomas morton",
"drew farris"
]
},
"highlight": {
"authors": [
"<em>thomas</em> morton"
]
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
]
},
"highlight": {
"authors": [
"<em>trey</em> grainger",
"<em>timothy</em> potter"
]
}
}
]
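The highlights above show that the pattern is matched against individual terms (first and last names), not against the whole author string. A minimal Python sketch of that behaviour follows, using the standard `fnmatch` module as a stand-in for the wildcard matcher. Note that `fnmatch` also understands `[seq]` character sets, which go beyond the ? and * wildcards described here, and the helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

def author_terms(authors):
    """Split author names into lowercase terms, roughly as an analyzed field would."""
    return [t for name in authors for t in re.findall(r"[a-z0-9]+", name.lower())]

def wildcard_hits(authors, pattern):
    """Return the terms that match the whole pattern."""
    return [t for t in author_terms(authors) if fnmatchcase(t, pattern)]

definitive_guide = wildcard_hits(["clinton gormley", "zachary tong"], "t*")
solr_in_action = wildcard_hits(["trey grainger", "timothy potter"], "t*")
es_in_action = wildcard_hits(["radu gheorge", "matthew lee hinman", "roy russo"], "t*")
```

The third call returns an empty list, which is consistent with document 3 being absent from the results.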
7. Regexp Query
A regexp query lets you specify more complex patterns than a wildcard query.
For example:
POST /bookdb_index/book/_search
{
"query": {
"regexp" : {
"authors" : "t[a-z]*y"
}
},
"_source": ["title", "authors"],
"highlight": {
"fields" : {
"authors" : {}
}
}
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
]
},
"highlight": {
"authors": [
"<em>trey</em> grainger",
"<em>timothy</em> potter"
]
}
}
]
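The pattern t[a-z]*y must match an entire term, which is why "thomas" and "tong" are not hits even though they start with "t". The Python sketch below imitates this with `re.fullmatch`. It is an approximation: Elasticsearch uses Lucene's regular expression syntax, which differs from Python's in places, although this particular pattern means the same in both.

```python
import re

def regexp_hits(authors, pattern):
    """Return the lowercase terms that the pattern matches in full."""
    compiled = re.compile(pattern)
    terms = [t for name in authors for t in name.lower().split()]
    return [t for t in terms if compiled.fullmatch(t)]

solr_hits = regexp_hits(["trey grainger", "timothy potter"], "t[a-z]*y")
taming_hits = regexp_hits(["grant ingersoll", "thomas morton", "drew farris"], "t[a-z]*y")
```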
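8. Match Phrase Query
The match phrase query requires that all the terms in the query string be present in the document, in the order given in the query string, and close to each other.
By default the terms must be exactly adjacent, but you can specify a slop value, which indicates how far apart the terms may be while the document is still considered a match.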
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query": "search engine",
"fields": ["title", "summary"],
"type": "phrase",
"slop": 3
}
},
"_source": [ "title", "summary", "publish_date" ]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.22327082,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.16113183,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
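Note: in the example above, a non-phrase query would normally score document _id 1 higher and list it before document _id 4, because its field is shorter.
As a phrase query, however, the proximity of the terms is taken into account, so document _id 4 scores better.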
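To see why both documents match with a slop of 3, the following Python sketch estimates the smallest slop at which a phrase matches a text. It is a simplified model and an assumption rather than the exact Lucene implementation: each term position is shifted back by the term's index in the phrase, and the required slop is taken as the spread of the shifted positions. The tokenizer is a rough stand-in for the standard analyzer.

```python
import re
from itertools import product

def tokenize(text):
    """Lowercase alphanumeric tokens; 'real-time' becomes 'real', 'time'."""
    return re.findall(r"[a-z0-9]+", text.lower())

def min_slop(text, phrase):
    """Smallest slop at which `phrase` matches `text`, or None if a term is missing."""
    words = tokenize(text)
    terms = tokenize(phrase)
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    if any(not p for p in positions):
        return None
    best = None
    for combo in product(*positions):
        # Shift each position back by the term's index in the phrase.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best

doc1 = "A distibuted real-time search and analytics engine"
doc4 = "Comprehensive guide to implementing a scalable search engine using Apache Solr"
slop_doc1 = min_slop(doc1, "search engine")
slop_doc4 = min_slop(doc4, "search engine")
```

Under this model document 4 needs no slop at all, while document 1 needs a slop of 2 because two words sit between "search" and "engine". Both fit within the slop of 3 used in the query, and the closer match in document 4 is in line with its higher score.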
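9. Match Phrase Prefix Query
The match phrase prefix query provides search-as-you-type, a relatively simple form of autocomplete, at query time without any preparation of the data.
Like the match_phrase query, it accepts a slop parameter that makes word order and relative position less strict. It also accepts a max_expansions parameter, which limits the number of terms matched in order to reduce resource usage.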
POST /bookdb_index/book/_search
{
"query": {
"match_phrase_prefix" : {
"summary": {
"query": "search en",
"slop": 3,
"max_expansions": 10
}
}
},
"_source": [ "title", "summary", "publish_date" ]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.5161346,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.37248808,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
Note: search-as-you-type at query time has a performance cost.
A better solution is search-as-you-type at index time.
For more, see the Completion Suggester API or Edge-Ngram filters.
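The idea behind index-time search-as-you-type with edge n-grams is to store every prefix of each term when the document is indexed, so that a partial word such as "en" becomes an ordinary exact lookup. The Python sketch below illustrates that idea with a toy in-memory index. The function names and the min_gram / max_gram defaults are assumptions for illustration, not Elasticsearch settings copied from a real mapping.

```python
def edge_ngrams(term, min_gram=1, max_gram=10):
    """Return the prefixes of `term` whose length is between min_gram and max_gram."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]

def index_prefixes(text, min_gram=1, max_gram=10):
    """Map every prefix to the set of full terms it came from (toy inverted index)."""
    index = {}
    for term in text.lower().split():
        for gram in edge_ngrams(term, min_gram, max_gram):
            index.setdefault(gram, set()).add(term)
    return index

prefix_index = index_prefixes("scalable search engine")
candidates = prefix_index.get("en", set())
```

The trade-off is a larger index in exchange for cheaper queries, since the prefix expansion is paid once at index time instead of on every keystroke.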
10. Query String
The query_string query provides a concise shorthand syntax for running multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries.
In the example below, we run a fuzzy search for the terms "search algorithm" where one of the book's authors is "grant ingersoll" or "tom morton". We search all fields but apply a boost of 2 to the summary field.
POST /bookdb_index/book/_search
{
"query": {
"query_string" : {
"query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
"fields": ["_all", "summary^2"]
}
},
"_source": [ "title", "summary", "authors" ],
"highlight": {
"fields" : {
"summary" : {}
}
}
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.14558059,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"authors": [
"grant ingersoll",
"thomas morton",
"drew farris"
]
},
"highlight": {
"summary": [
"organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging, information extraction, and summarization"
]
}
}
]
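11. Simple Query String
The simple_query_string query is a version of the query_string query that is better suited to a single search box exposed to users,
because it replaces AND / OR / NOT with + / | / - respectively, and it discards invalid parts of a query instead of throwing an exception when the user makes a mistake.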
POST /bookdb_index/book/_search
{
"query": {
"simple_query_string" : {
"query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
"fields": ["_all", "summary^2"]
}
},
"_source": [ "title", "summary", "authors" ],
"highlight": {
"fields" : {
"summary" : {}
}
}
}
----------------------------------------
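12. Term/Terms Query (exact-value search)
The examples in sections 1 to 11 above are full-text searches. Sometimes we are more interested in structured search, where we want exact matches.
In the example below, we search for all books in our index published by Manning Publications, using the term and terms queries.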
POST /bookdb_index/book/_search
{
"query": {
"term" : {
"publisher": "manning"
}
},
"_source" : ["title","publish_date","publisher"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 1.2231436,
"_source": {
"publisher": "manning",
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1.2231436,
"_source": {
"publisher": "manning",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.2231436,
"_source": {
"publisher": "manning",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
}
]
The terms query accepts multiple values to match.
{
"query": {
"terms" : {
"publisher": ["oreilly", "packt"]
}
}
}
13. Term Query - Sorted
Results of a term query, like those of any other query, can easily be sorted. Multi-level sorting is also allowed.
POST /bookdb_index/book/_search
{
"query": {
"term" : {
"publisher": "manning"
}
},
"_source" : ["title","publish_date","publisher"],
"sort": [
{ "publish_date": {"order":"desc"}},
{ "title": { "order": "desc" }}
]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": null,
"_source": {
"publisher": "manning",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
},
"sort": [
1449100800000,
"in"
]
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": null,
"_source": {
"publisher": "manning",
"title": "Solr in Action",
"publish_date": "2014-04-05"
},
"sort": [
1396656000000,
"solr"
]
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": null,
"_source": {
"publisher": "manning",
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
},
"sort": [
1358985600000,
"to"
]
}
]
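The sort arrays in the results can look puzzling at first. The first value is the publish_date expressed as milliseconds since the Unix epoch, and the second appears to be a single term from the analyzed title. The Python sketch below reproduces both values, assuming dates are taken at midnight UTC and that a descending sort on a multi-term field uses the largest term. The helper names are hypothetical and the tokenizer only approximates the analyzer.

```python
import re
from datetime import datetime, timezone

def epoch_millis(date_string):
    """Milliseconds since the Unix epoch for a yyyy-MM-dd date at midnight UTC."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)

def desc_sort_term(title):
    """Largest analyzed term of the title (assumes a descending sort picks the max)."""
    return max(re.findall(r"[a-z0-9]+", title.lower()))

first_hit_sort = [epoch_millis("2015-12-03"), desc_sort_term("Elasticsearch in Action")]
```

This also explains why sorting on an analyzed text field rarely gives the order a reader expects; sorting by title usually calls for a non-analyzed copy of the field.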
14. Range Query
Another example of structured search is the range query. In the example below, we search for books published in 2015.
POST /bookdb_index/book/_search
{
"query": {
"range" : {
"publish_date": {
"gte": "2015-01-01",
"lte": "2015-12-31"
}
}
},
"_source" : ["title","publish_date","publisher"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1,
"_source": {
"publisher": "oreilly",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1,
"_source": {
"publisher": "manning",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
}
]
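Note: range queries work on date, number, and string fields.
15. Filtered Query (removed in Elasticsearch 5.0; can be skipped)
A filtered query lets you filter the results of a query. In the example below, we query for books with "Elasticsearch" in the title or summary, but restrict the results to those with 20 or more reviews.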
POST /bookdb_index/book/_search
{
"query": {
"filtered": {
"query" : {
"multi_match": {
"query": "elasticsearch",
"fields": ["title","summary"]
}
},
"filter": {
"range" : {
"num_reviews": {
"gte": 20
}
}
}
}
},
"_source" : ["title","summary","publisher", "num_reviews"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.5955761,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"publisher": "oreilly",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide"
}
}
]
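Note: a filtered query does not require a query to filter on. If no query is specified, a match_all query is run, which returns every document in the index, and the filter is then applied to it.
In practice, the filter runs first, reducing the surface area that needs to be queried. In addition, the filter is cached after its first use, which makes it very efficient.
Update: the filtered query has been removed in Elasticsearch 5.x in favor of the bool query. Below is the same example rewritten with a bool query. The returned results are exactly the same.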
POST /bookdb_index/book/_search
{
"query": {
"bool": {
"must" : {
"multi_match": {
"query": "elasticsearch",
"fields": ["title","summary"]
}
},
"filter": {
"range" : {
"num_reviews": {
"gte": 20
}
}
}
}
},
"_source" : ["title","summary","publisher", "num_reviews"]
}
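16. Multiple Filters (no longer supported in 5.x; can be skipped)
Multiple filters can be combined with a bool filter.
In the next example, the filter requires that returned results have at least 20 reviews, must not have been published before 2015, and should be published by oreilly.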
POST /bookdb_index/book/_search
{
"query": {
"filtered": {
"query" : {
"multi_match": {
"query": "elasticsearch",
"fields": ["title","summary"]
}
},
"filter": {
"bool": {
"must": {
"range" : { "num_reviews": { "gte": 20 } }
},
"must_not": {
"range" : { "publish_date": { "lte": "2014-12-31" } }
},
"should": {
"term": { "publisher": "oreilly" }
}
}
}
}
},
"_source" : ["title","summary","publisher", "num_reviews", "publish_date"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.5955761,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"publisher": "oreilly",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
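Since the filtered query is gone in 5.x, the multiple-filter request above can be rewritten in the same way as in section 15: the scoring query moves into must and the filter moves into filter of a bool query. The Python sketch below performs that rewrite mechanically. The function name `filtered_to_bool` is hypothetical, and the sketch is untested against a live cluster; in particular, how a should clause behaves inside a filter differs between Elasticsearch versions, so check minimum_should_match before relying on it.

```python
import json

def filtered_to_bool(query):
    """Rewrite {"filtered": {"query": Q, "filter": F}} as a `bool` query.

    The scoring query moves to `must`; the filter moves to `filter`.
    When no query is given, only the filter clause is emitted.
    """
    filtered = query["filtered"]
    bool_body = {}
    if "query" in filtered:
        bool_body["must"] = filtered["query"]
    if "filter" in filtered:
        bool_body["filter"] = filtered["filter"]
    return {"bool": bool_body}

legacy = {
    "filtered": {
        "query": {"multi_match": {"query": "elasticsearch", "fields": ["title", "summary"]}},
        "filter": {
            "bool": {
                "must": {"range": {"num_reviews": {"gte": 20}}},
                "must_not": {"range": {"publish_date": {"lte": "2014-12-31"}}},
                "should": {"term": {"publisher": "oreilly"}},
            }
        },
    }
}
rewritten = filtered_to_bool(legacy)
print(json.dumps({"query": rewritten}, indent=2))
```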
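17. Function Score: Field Value Factor
There may be cases where you want the value of a particular field to factor into the relevance score. A typical scenario is boosting a document's relevance based on its popularity.
In our example, we want to boost the more popular books, judged by their number of reviews. This can be done with the field_value_factor function score.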
POST /bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match" : {
"query" : "search engine",
"fields": ["title", "summary"]
}
},
"field_value_factor": {
"field" : "num_reviews",
"modifier": "log1p",
"factor" : 2
}
}
},
"_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.44831306,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.3718407,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"num_reviews": 23,
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.046479136,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"num_reviews": 18,
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.041432835,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"num_reviews": 12,
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
}
]
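Note 1: we could have run a regular multi_match query and sorted by the num_reviews field, but then we would lose the benefit of relevance scoring.
Note 2: a number of additional parameters (such as modifier, factor, and boost_mode) adjust how strongly the original relevance score is boosted.
See the Elasticsearch guide for details.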
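As a rough sketch of the arithmetic behind field_value_factor, the Python code below multiplies the field value by factor, applies the modifier, and then multiplies the query score by the result, which is the default boost_mode. With the settings used above, the function value for a book is log10(1 + 2 * num_reviews). The function names are made up for this example, and only a few modifiers are modelled.

```python
import math

def field_value_factor(value, factor=1.0, modifier="none"):
    """Apply `factor` and then `modifier` to a field value."""
    scaled = factor * value
    modifiers = {
        "none": lambda x: x,
        "log": math.log10,                     # common logarithm
        "log1p": lambda x: math.log10(1 + x),  # add 1, then common logarithm
        "ln": math.log,                        # natural logarithm
        "sqrt": math.sqrt,
    }
    return modifiers[modifier](scaled)

def boosted_score(query_score, num_reviews):
    """Default boost_mode is multiply: query score times the function value."""
    return query_score * field_value_factor(num_reviews, factor=2, modifier="log1p")

boost_20_reviews = field_value_factor(20, factor=2, modifier="log1p")
boost_23_reviews = field_value_factor(23, factor=2, modifier="log1p")
```

The logarithm keeps the effect gentle: going from 20 to 23 reviews raises the multiplier only slightly, so a strong text match can still outrank a more popular book.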
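18. Function Score: Decay Functions
Suppose that, instead of boosting incrementally by the value of a field, you have an ideal value to target and want the boost to decay the further a document is from it. This is typically useful for price ranges, numeric field ranges, and date ranges. In our example, we search for books on "search engines" published around June 2014.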
POST /bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match" : {
"query" : "search engine",
"fields": ["title", "summary"]
}
},
"functions": [
{
"exp": {
"publish_date" : {
"origin": "2014-06-15",
"offset": "7d",
"scale" : "30d"
}
}
}
],
"boost_mode" : "replace"
}
},
"_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.27420625,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"num_reviews": 23,
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.005920768,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.000011564,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"num_reviews": 12,
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.0000059171475,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"num_reviews": 18,
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
}
]
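The shape of the exp decay function can be sketched in a few lines of Python. Within offset of the origin a document keeps the full value of 1.0; beyond that the value falls off exponentially and reaches decay (0.5 unless configured otherwise) at a distance of offset + scale. The code below is an illustration of that formula on whole days, with hypothetical names, and it is not meant to reproduce the exact scores listed above.

```python
import math
from datetime import date

def exp_decay(value, origin, offset_days, scale_days, decay=0.5):
    """Exponential decay on the distance (in days) between two dates.

    Within `offset_days` of the origin the result is 1.0; at
    `offset_days + scale_days` it equals `decay`.
    """
    distance = abs((value - origin).days)
    lam = math.log(decay) / scale_days
    return math.exp(lam * max(0, distance - offset_days))

origin = date(2014, 6, 15)
near = exp_decay(date(2014, 6, 20), origin, offset_days=7, scale_days=30)
far = exp_decay(date(2015, 2, 7), origin, offset_days=7, scale_days=30)
```

Because boost_mode is set to replace in the request, the decay value stands in for the text relevance score entirely, which is why books far from June 2014 end up with scores close to zero.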
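19. Function Score: Script Scoring
When the built-in scoring functions do not meet your needs, you can specify a Groovy script to use for scoring.
In our example, we want a script that takes publish_date into account before deciding how much weight to give the number of reviews. Newer books may not have as many reviews yet, so they should not be penalized for that.
The scoring script looks like this: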
publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value
if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
my_score = Math.log(2.5 + num_reviews)
} else {
my_score = Math.log(1 + num_reviews)
}
return my_score
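For readers who do not use Groovy, here is the same logic as a small Python function. It is a port for illustration only and cannot be used as an Elasticsearch script; the function name and the use of `datetime.date` values in place of epoch milliseconds are choices made for this sketch.

```python
import math
from datetime import date

def script_score(publish_date, num_reviews, threshold):
    """Python port of the scoring script: newer books get a larger log offset."""
    if publish_date > threshold:
        return math.log(2.5 + num_reviews)
    return math.log(1 + num_reviews)

threshold = date(2015, 7, 30)
newer = script_score(date(2015, 12, 3), 18, threshold)  # published after the threshold
older = script_score(date(2015, 2, 7), 20, threshold)   # published before the threshold
```

Adding 2.5 instead of 1 inside the logarithm gives recently published books a small head start, which matters most when the review count is still low.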
To use the scoring script dynamically, we pass it through the script_score parameter:
POST /bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match" : {
"query" : "search engine",
"fields": ["title", "summary"]
}
},
"functions": [
{
"script_score": {
"params" : {
"threshold": "2015-07-30"
},
"script": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
}
}
]
}
},
"_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": {
"total": 4,
"max_score": 0.8463001,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.8463001,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.7067348,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"num_reviews": 23,
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.08952084,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"num_reviews": 18,
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.07602123,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"num_reviews": 12,
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
}
]
}
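Note 1: to use dynamic scripting, it must be enabled for your Elasticsearch instance in the config/elasticsearch.yml file. It is also possible to use scripts that are already stored on the Elasticsearch server. See the Elasticsearch reference docs for more information.
Note 2: JSON cannot contain embedded newline characters, so semicolons are used to separate statements.
Original author: Tim Ojo, Aug. 05, 16 · Big Data Zone
Original article: https://dzone.com/articles/23-useful-elasticsearch-example-queries
Summary
I first came across this article in Elastic Daily, issue 7. On opening it I was pleasantly surprised: this was exactly the hands-on search material I had been looking for. So I immediately spent 3 hours organizing and translating it.
The original title mentions 23 techniques, probably because the original author counted the subsections as well. I made some adjustments for consistent formatting; the content here covers at least as much as the original.
I will keep applying and refining these techniques in day-to-day development.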