Elasticsearch Series, Part 4: Search in Detail (Search API, Query DSL)

Original article: https://www.cnblogs.com/leeSmall/p/9206641.html

I. Search API

1. Search API endpoints

Search the twitter index for documents whose user field is kimchy:

GET /twitter/_search?q=user:kimchy

Search the tweet and user mapping types of the twitter index for documents whose user field is kimchy:

GET /twitter/tweet,user/_search?q=user:kimchy

Search the kimchy and elasticsearch indices for documents whose tag field is wow:

GET /kimchy,elasticsearch/_search?q=tag:wow

Search all indices for documents whose tag field is wow:

GET /_all/_search?q=tag:wow
GET /_search?q=tag:wow

Note: a search endpoint can span multiple indices and multiple mapping types. Search parameters can be passed either as URI query parameters or in the request body.

2. URI Search

A URI search passes the query and its options as URI parameters, which makes it a quick way to run a simple query.

GET /twitter/_search?q=user:kimchy

For the available parameters, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html

3. Understanding the search result

5. Special uses of query parameters

If we only want to know how many documents match a query, set size to 0:

GET /bank/_search?q=city:b*&size=0

If we only want to know whether any document matches a query, also set terminate_after to 1:

GET /bank/_search?q=city:b*&size=0&terminate_after=1

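Comparing the two results: the first query returns the total number of matching documents; the second only needs to know whether any document matches, so each shard stops as soon as it has collected one document.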

6. Request body Search

A request body search defines the query as JSON in the request body. The request can be sent with either GET or POST.

GET /twitter/_search
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

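Available parameters:

timeout: how long to wait for the search; when it expires the request returns the hits collected so far, even if the search is incomplete;
from: offset of the first hit to return, default 0;
size: number of hits to return (the page size), default 10;
request_cache: set to true or false to enable or disable caching of the search result for requests where size is 0; when not set, the index-level setting applies;
terminate_after: maximum number of documents to collect on each shard; once it is reached, query execution on that shard stops early. If set, the response contains a boolean terminated_early field indicating whether execution actually terminated early. Not set by default;
search_type: how the search is executed, either dfs_query_then_fetch or query_then_fetch; default query_then_fetch;
batched_reduce_size: number of shard results to reduce (merge) at once on the coordinating node. Use it as a safeguard to lower the memory overhead per search request when the number of shards involved may be large.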

6.1 Defining the query with the query element

The query element defines the query using the Query DSL.

GET /_search
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

6.2 Controlling what is returned

6.2.1 Source filtering: selecting parts of the _source field

GET /_search
{
    "_source": false,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Selecting fields with wildcard patterns

GET /_search
{
    "_source": [ "obj1.*", "obj2.*" ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

GET /_search
{
    "_source": "obj.*",
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Specifying what to include and what to exclude

GET /_search
{
    "_source": {
        "includes": [ "obj1.*", "obj2.*" ],
        "excludes": [ "*.description" ]
    },
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
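
The includes/excludes behaviour above can be sketched in plain Python. This is only a simplified model, not how Elasticsearch implements source filtering: the helper names flatten and filter_source are hypothetical, the result is a flat dict keyed by dotted path rather than a nested document, and pattern matching is approximated with the standard fnmatch module.

```python
from fnmatch import fnmatchcase


def flatten(doc, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def filter_source(doc, includes=(), excludes=()):
    """Keep leaves matching any include pattern and no exclude pattern."""
    kept = {}
    for path, value in flatten(doc):
        if includes and not any(fnmatchcase(path, p) for p in includes):
            continue
        if any(fnmatchcase(path, p) for p in excludes):
            continue
        kept[path] = value
    return kept


doc = {
    "obj1": {"name": "a", "description": "x"},
    "obj2": {"id": 1},
    "other": 5,
}
print(filter_source(doc, ["obj1.*", "obj2.*"], ["*.description"]))
```

Excludes win over includes here: obj1.description matches an include pattern but is still dropped.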

6.2.2 stored_fields: choosing which stored fields to return

GET /_search
{
    "stored_fields" : ["user", "postDate"],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Note: * can be used to return all stored fields.

6.2.3 docvalue_fields: returning the doc value representation of fields

GET /_search
{
    "query" : {
        "match_all": {}
    },
    "docvalue_fields" : ["test1", "test2"]
}

6.2.4 version: returning the version of each document

GET /_search
{
    "version": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

6.2.5 explain: returning the score explanation for each document

GET /_search
{
    "explain": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

6.2.6 script_fields: returning values computed by a script for each hit

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "test1": {
      "script": {
        "lang": "painless",
        "source": "doc['balance'].value * 2"
      }
    },
    "test2": {
      "script": {
        "lang": "painless",
        "source": "doc['age'].value * params.factor",
        "params": {
          "factor": 2
        }
      }
    }
  }
}

Note: in the script, doc refers to the current document; doc['age'].value reads the value of its age field.

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "ffx": {
      "script": {
        "lang": "painless",
        "source": "doc['age'].value * doc['balance'].value"
      }
    },
    "balance*2": {
      "script": {
        "lang": "painless",
        "source": "params['_source'].balance*2"
      }
    }
  }
}

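Notes:

params['_source'] reads the value from the _source field.

The official documentation recommends doc['field'], because reading doc values is faster than loading and parsing _source for every hit.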

6.2.7 min_score: excluding documents below a minimum score

GET /_search
{
    "min_score": 0.5,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

6.2.8 post_filter: filtering the hits after the query has matched documents and the aggregations have been computed

Example: in a single request, find the shirts of brand gucci with color red, and at the same time get a facet count of gucci shirts per color.

Create the index and define its mapping:

PUT /shirts
{
    "mappings": {
        "_doc": {
            "properties": {
                "brand": { "type": "keyword"},
                "color": { "type": "keyword"},
                "model": { "type": "keyword"}
            }
        }
    }
}

Index documents (similar to inserting rows into a database table) and refresh immediately:

PUT /shirts/_doc/1?refresh
{
    "brand": "gucci",
    "color": "red",
    "model": "slim"
}
PUT /shirts/_doc/2?refresh
{
    "brand": "gucci",
    "color": "green",
    "model": "seec"
}

Run the query:

GET /shirts/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "gucci" } 
      }
    }
  },
  "aggs": {
    "colors": {
      "terms": { "field": "color" } 
    }
  },
  "post_filter": { 
    "term": { "color": "red" }
  }
}

Result:

{
  "took": 109,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": [
      {
        "_index": "shirts",
        "_type": "_doc",
        "_id": "1",
        "_score": 0,
        "_source": {
          "brand": "gucci",
          "color": "red",
          "model": "slim"
        }
      }
    ]
  },
  "aggregations": {
    "colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "green",
          "doc_count": 1
        },
        {
          "key": "red",
          "doc_count": 1
        }
      ]
    }
  }
}
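
The order of operations is the important part: the aggregation sees every document matched by the query, while post_filter only narrows the returned hits. A minimal in-memory sketch of that order, using the two shirts indexed above (the function search_with_post_filter is hypothetical and stands in for the request, it is not an Elasticsearch API):

```python
from collections import Counter

shirts = [
    {"brand": "gucci", "color": "red", "model": "slim"},
    {"brand": "gucci", "color": "green", "model": "seec"},
]


def search_with_post_filter(docs, brand, color):
    # 1. Main query: everything of the requested brand.
    matched = [d for d in docs if d["brand"] == brand]
    # 2. Aggregation: computed over all matched documents.
    buckets = Counter(d["color"] for d in matched)
    # 3. post_filter: applied last, affects the hits only.
    hits = [d for d in matched if d["color"] == color]
    return hits, dict(buckets)


hits, buckets = search_with_post_filter(shirts, "gucci", "red")
print(hits)
print(buckets)
```

This mirrors the response above: one hit (the red shirt), but two colour buckets.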

6.2.9 sort

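Hits can be sorted by one or more fields. The special name _score sorts by relevance score and _doc sorts by index order. By default hits are sorted by relevance score, highest first.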

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    },
    {
      "balance": {
        "order": "asc"
      }
    },
    "_score"
  ]
}

Note:

order can be asc or desc. If omitted it defaults to asc, except for _score, which defaults to desc.

Each hit in the result carries the values it was sorted by:

 "hits": {
    "total": 1000,
    "max_score": null,
    "hits": [
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "549",
        "_score": 1,
        "_source": {
          "account_number": 549,
          "balance": 1932, "age": 40, "state": "OR"
        },
        "sort": [
          40,
          1932,
          1
        ]
      }

Sorting on multi-valued fields

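Fields whose value is an array or otherwise multi-valued can be sorted too. The mode parameter selects which value is used for sorting: min, max, sum, avg or median.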

PUT /my_index/_doc/1?refresh
{
   "product": "chocolate",
   "price": [20, 4]
}

POST /_search
{
   "query" : {
      "term" : { "product" : "chocolate" }
   },
   "sort" : [
      {"price" : {"order" : "asc", "mode" : "avg"}}
   ]
}
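
To make the effect of mode concrete, the sketch below reduces a multi-valued field to the single value that would be used as the sort key. The helper sort_value is hypothetical; it only mirrors the arithmetic, using the price array [20, 4] from the document indexed above.

```python
from statistics import median

# One reducer per sort mode.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "median": median,
}


def sort_value(values, mode):
    """Collapse a multi-valued field to one sort key."""
    return MODES[mode](values)


price = [20, 4]
for mode in MODES:
    print(mode, sort_value(price, mode))
```

With mode avg, as in the request above, the chocolate document sorts with the key 12.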

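Missing values: documents that lack the sort field

missing can be _last, _first, or a custom value to use as the sort value for documents without the field. The default is _last.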

GET /_search
{
    "sort" : [
        { "price" : {"missing" : "_last"} }
    ],
    "query" : {
        "term" : { "product" : "chocolate" }
    }
}

Sorting by geo distance

Official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#geo-sorting

GET /_search
{
    "sort" : [
        {
            "_geo_distance" : {
                "pin.location" : [-70, 40],
                "order" : "asc",
                "unit" : "km",
                "mode" : "min",
                "distance_type" : "arc"
            }
        }
    ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

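Parameters:

_geo_distance: the keyword for sorting by distance
pin.location: a field of type geo_point
distance_type: how the distance is computed, arc (spherical) or plane (planar)
unit: the unit of the distance values, e.g. km or m; default m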

Script-based sorting: sorting by a value computed by a script

GET /_search
{
    "query" : {
        "term" : { "user" : "kimchy" }
    },
    "sort" : {
        "_script" : {
            "type" : "number",
            "script" : {
                "lang": "painless",
                "source": "doc['field_name'].value * params.factor",
                "params" : {
                    "factor" : 1.1
                }
            },
            "order" : "asc"
        }
    }
}

6.3.0 Field collapsing

Use collapse to collapse the hits on the values of a field, keeping only the top-sorted document per value:

GET /bank/_search
{
    "query": {
        "match_all": {}
    },
    "collapse" : {
        "field" : "age" 
    },
    "sort": ["balance"] 
}
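
Conceptually, collapsing keeps the first document of each group in sort order. A small sketch under that assumption, with made-up account data (the function collapse and the sample accounts are illustrative only):

```python
def collapse(docs, field, sort_field):
    """Return the top-sorted document for each distinct value of `field`."""
    ordered = sorted(docs, key=lambda d: d[sort_field])
    seen = set()
    top_hits = []
    for doc in ordered:
        if doc[field] not in seen:
            seen.add(doc[field])
            top_hits.append(doc)
    return top_hits


accounts = [
    {"id": 1, "age": 30, "balance": 500},
    {"id": 2, "age": 30, "balance": 100},
    {"id": 3, "age": 40, "balance": 300},
]
print([d["id"] for d in collapse(accounts, "age", "balance")])
```

Only one account per age survives, and it is the one with the lowest balance because the sort is ascending.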

Advanced collapsing

GET /bank/_search
{
    "query": {
        "match_all": {}
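    },
    "collapse" : {
        "field" : "age",
        "inner_hits": {
            "name": "details",
            "size": 5,
            "sort": [{ "balance": "asc" }]
        },
        "max_concurrent_group_searches": 4
    },
    "sort": ["balance"]
}

Notes:

inner_hits expands each collapsed group;
name is a name of your own choosing for the inner hits;
size is the number of documents to return per group;
sort orders the documents within each group;
max_concurrent_group_searches is the number of concurrent requests allowed when retrieving the inner hits of the groups.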

Returning several top-N views of each group through inner_hits:

GET /twitter/_search
{
    "query": {
        "match": {
            "message": "elasticsearch"
        }
    },
    "collapse" : {
        "field" : "user", 
        "inner_hits": [
            {
                "name": "most_liked",  
                "size": 3,
                "sort": ["likes"]
            },
            {
                "name": "most_recent", 
                "size": 3,
                "sort": [{ "date": "asc" }]
            }
        ]
    },
    "sort": ["likes"]
}

Notes:

most_liked: the tweets of each user, sorted by likes

most_recent: the tweets of each user, sorted by date

6.3.1 Pagination

from and size

GET /_search
{
    "from" : 0, "size" : 10,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

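Note: the heap memory and time a search request consumes are proportional to from + size. The deeper the page, the higher the cost. To keep deep paging from causing an OOM or seriously hurting performance, Elasticsearch requires that from + size not exceed the index setting index.max_result_window, which defaults to 10,000.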

What if you need deep paging beyond the index.max_result_window limit?

search_after retrieves the hits that come after a given document and can be used for deep paging.

First request, fetching the first page:

GET twitter/_search
{
    "size": 10,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "sort": [
        {"date": "asc"},
        {"_id": "desc"}
    ]
}

Requests for the following pages:

GET twitter/_search
{
    "size": 10,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "search_after": [1463538857, "654323"],
    "sort": [
        {"date": "asc"},
        {"_id": "desc"}
    ]
}

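Note: search_after requires the query to specify a sort, and the combination of sort values must be unique per document (ideally include the _id field in the sort). The value passed to search_after is the sort value of the last hit of the previous page. When search_after is used, from must be 0 or -1.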
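
The paging loop can be sketched without a cluster. The code below simulates search_after over an in-memory list sorted by date ascending and _id descending, as in the requests above. The functions comes_after, search_page and scan_all and the sample documents are hypothetical; a real client would send the sort values of the last hit back to Elasticsearch instead.

```python
def comes_after(doc, after):
    """True if doc sorts strictly after the (date, _id) pair `after`."""
    date, doc_id = after
    if doc["date"] != date:
        return doc["date"] > date      # date is sorted ascending
    return doc["_id"] < doc_id         # _id is sorted descending


def search_page(docs, size, search_after=None):
    # Two stable sorts give: date ascending, then _id descending.
    ordered = sorted(docs, key=lambda d: d["_id"], reverse=True)
    ordered = sorted(ordered, key=lambda d: d["date"])
    if search_after is not None:
        ordered = [d for d in ordered if comes_after(d, search_after)]
    return ordered[:size]


def scan_all(docs, size):
    """Fetch every page by feeding the last hit's sort values back in."""
    results, after = [], None
    while True:
        page = search_page(docs, size, after)
        if not page:
            break
        results.extend(page)
        last = page[-1]
        after = (last["date"], last["_id"])
    return results


docs = [
    {"_id": "a", "date": 1},
    {"_id": "b", "date": 1},
    {"_id": "c", "date": 2},
    {"_id": "d", "date": 3},
    {"_id": "e", "date": 3},
]
print([d["_id"] for d in scan_all(docs, size=2)])
```

Because the (date, _id) pair is unique per document, no hit is skipped or repeated across pages, which is why the note above asks for a unique sort combination.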

6.3.2 Highlighting

Prepare the data:

PUT /hl_test/_doc/1
{
  "title": "lucene solr and elasticsearch",
  "content": "lucene solr and elasticsearch for search"
}

Search with highlighting:

GET /hl_test/_search
{
  "query": {
    "match": {
      "title": "lucene"
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    }
  }
}

Result:

{
  "took": 113,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "hl_test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "lucene solr and elasticsearch",
          "content": "lucene solr and elasticsearch for search"
        },
        "highlight": {
          "title": [
            "<em>lucene</em> solr and elasticsearch"
          ]
        }
      }
    ]
  }
}

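Highlighting multiple fields: by default only fields the query matched on are highlighted; setting require_field_match to false highlights the other listed fields as well.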

GET /hl_test/_search
{
  "query": {
    "match": {
      "title": "lucene"
    }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "title": {},
      "content": {}
    }
  }
}

Result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "hl_test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "lucene solr and elasticsearch",
          "content": "lucene solr and elasticsearch for search"
        },
        "highlight": {
          "title": [
            "<em>lucene</em> solr and elasticsearch"
          ],
          "content": [
            "<em>lucene</em> solr and elasticsearch for search"
          ]
        }
      }
    ]
  }
}

Note:

The highlighted fragments are returned in the highlight element of each hit.

Specifying the highlight tags

GET /hl_test/_search
{
  "query": {
    "match": {
      "title": "lucene"
    }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "title": {
        "pre_tags":["<strong>"],
        "post_tags": ["</strong>"]
      },
      "content": {}
    }
  }
}

Result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "hl_test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "lucene solr and elasticsearch",
          "content": "lucene solr and elasticsearch for search"
        },
        "highlight": {
          "title": [
            "<strong>lucene</strong> solr and elasticsearch"
          ],
          "content": [
            "<em>lucene</em> solr and elasticsearch for search"
          ]
        }
      }
    ]
  }
}

For the full set of highlighting options, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html

6.3.3 Profile: for debugging and optimization

For a slow query we want to know why it is slow and where the time goes. Adding profile to the request returns the detailed execution steps and timing information.

GET /twitter/_search
{
  "profile": true,
  "query" : {
    "match" : { "message" : "some number" }
  }
}

For an explanation of the profile output, see:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html

7. Count API: counting matching documents

PUT /twitter/_doc/1?refresh
{
    "user": "kimchy"
}

GET /twitter/_doc/_count?q=user:kimchy

GET /twitter/_doc/_count
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Result:

{
    "count" : 1,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
    }
}

8. Validate API

Checks whether a query is valid and shows the low-level query that is generated from it.

GET twitter/_validate/query?q=user:foo

8.1 Validating a query

GET twitter/_doc/_validate/query
{
  "query": {
    "query_string": {
      "query": "post_date:foo",
      "lenient": false
    }
  }
}

Result:

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  }
}

8.2 Getting an explanation of the query

GET twitter/_doc/_validate/query?explain=true
{
  "query": {
    "query_string": {
      "query": "post_date:foo",
      "lenient": false
    }
  }
}

Result:

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "twitter",
      "valid": true,
      "explanation": """+MatchNoDocsQuery("unmapped field [post_date]") #MatchNoDocsQuery("Type list does not contain the index type")"""
    }
  ]
}

8.3 Using rewrite for a more detailed explanation than explain

GET twitter/_doc/_validate/query?rewrite=true
{
  "query": {
    "more_like_this": {
      "like": {
        "_id": "2"
      },
      "boost_terms": 1
    }
  }
}

Result:

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "twitter",
      "valid": true,
      "explanation": """+(MatchNoDocsQuery("empty BooleanQuery") -ConstantScore(MatchNoDocsQuery("empty BooleanQuery"))) #MatchNoDocsQuery("Type list does not contain the index type")"""
    }
  ]
}

8.4 Getting the query explanation on every shard

GET twitter/_doc/_validate/query?rewrite=true&all_shards=true
{
  "query": {
    "match": {
      "user": {
        "query": "kimchy",
        "fuzziness": "auto"
      }
    }
  }
}

Result:

{
  "valid": true,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "explanations": [
    {
      "index": "twitter",
      "shard": 0,
      "valid": true,
      "explanation": """MatchNoDocsQuery("unmapped field [user]")"""
    },
    {
      "index": "twitter",
      "shard": 1,
      "valid": true,
      "explanation": """MatchNoDocsQuery("unmapped field [user]")"""
    },
    {
      "index": "twitter",
      "shard": 2,
      "valid": true,
      "explanation": """MatchNoDocsQuery("unmapped field [user]")"""
    }
  ]
}

Official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-validate.html

9. Explain API

Returns the score explanation for a query and a specific document, and tells whether that document matches the query.

GET /twitter/_doc/0/_explain
{
      "query" : {
        "match" : { "message" : "elasticsearch" }
      }
}

Official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html

10. Search Shards API

Shows which indices, shards and nodes a search request would be executed against.

GET /twitter/_search_shards

To see which shards and nodes a search with given routing values would run on:

GET /twitter/_search_shards?routing=foo,baz

Result:

{
  "nodes": {
    "qkmtovyLRPWjXcfDTryNwA": {
      "name": "qkmtovy",
      "ephemeral_id": "sxgsvzsORraAnN7PIlMYpg",
      "transport_address": "127.0.0.1:9300",
      "attributes": {}
    }
  },
  "indices": {
    "twitter": {}
  },
  "shards": [
    [
      {
        "state": "STARTED",
        "primary": true,
        "node": "qkmtovyLRPWjXcfDTryNwA",
        "relocating_node": null,
        "shard": 1,
        "index": "twitter",
        "allocation_id": {
          "id": "8S88pnUkSSy8kiCcwBgb9Q"
        }
      }
    ]
  ]
}

11. Search Template

Register a template:

POST _scripts/<templatename>
{
    "script": {
        "lang": "mustache",
        "source": {
            "query": {
                "match": {
                    "title": "{{query_string}}"
                }
            }
        }
    }
}

Search using the template:

GET _search/template
{
    "id": "<templatename>",
    "params": {
        "query_string": "search for these words"
    }
}

Result:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 38,
    "successful": 38,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

For details, see the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-template.html

II. Query DSL

Official introduction: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

Introduction to Query DSL

1. What is a DSL?

DSL stands for Domain Specific Language.

Elasticsearch provides a full Query DSL based on JSON to define queries.

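A query is built from two kinds of clauses:

Leaf query clauses
Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. They can be used on their own.
Compound query clauses
Compound query clauses combine multiple leaf or compound queries in a logical fashion into a single query.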

2. Query and filter context

How a query clause behaves depends on whether it is used in query context or filter context.

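Query context
A clause used in query context answers the question "How well does this document match this query?" Besides deciding whether the document matches, it also computes a score expressing how well it matches. Query context is introduced by the query element.
Filter context
Filter context is introduced by the filter element or by must_not inside a bool query. A clause used in filter context answers the question "Does this document match this query?" and takes no part in relevance scoring.
Frequently used filters are cached automatically by Elasticsearch to speed up queries.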

Example:

GET /_search
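{
  "query": {
    "bool": {
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}

Note: the must clauses run in query context and contribute to the score; the filter clauses run in filter context and only include or exclude documents. A document is returned only if it satisfies both, i.e. the result is the intersection of the two.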

Tip: put the conditions that should affect the score of matching documents into query context, and use filter context for all other clauses.
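
The difference between the two contexts can be illustrated with a toy model. In the sketch below, must clauses return a score and filter clauses return only yes or no. The scorer term_count simply counts occurrences of a word; it is a stand-in chosen for illustration and is not how Elasticsearch scores documents. All names and sample documents are hypothetical.

```python
def term_count(field, term):
    """Toy scorer: how many times `term` occurs in `field` (0 = no match)."""
    return lambda doc: doc[field].lower().split().count(term)


def bool_search(docs, must, filters):
    hits = []
    for doc in docs:
        # Filter context: pass or fail, never changes the score.
        if not all(check(doc) for check in filters):
            continue
        # Query context: every must clause has to match and adds to the score.
        scores = [score(doc) for score in must]
        if all(s > 0 for s in scores):
            hits.append((sum(scores), doc["id"]))
    return sorted(hits, reverse=True)


docs = [
    {"id": 1, "title": "search search basics", "status": "published"},
    {"id": 2, "title": "search tips", "status": "published"},
    {"id": 3, "title": "search search search", "status": "draft"},
]
result = bool_search(
    docs,
    must=[term_count("title", "search")],
    filters=[lambda d: d["status"] == "published"],
)
print(result)
```

Document 3 would have the highest toy score but is removed by the filter, and the filter has no influence on the scores of documents 1 and 2.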

Query types

1. Match all query: matches all documents

GET /_search
{
    "query": {
        "match_all": {}
    }
}

Its inverse, match_none, matches no documents:

GET /_search
{
    "query": {
        "match_none": {}
    }
}

2. Full text queries

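Full text queries search analyzed (text) fields. The query text is analyzed with the analyzer of the queried field to build the query. They cover phrase, fuzzy, prefix and proximity search scenarios.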

Official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html

3. match query

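The standard query for full text search, including options for fuzzy and phrase-style matching on a field. match queries accept text, numerics and dates, analyze them, and build a boolean query from the result. Use operator to set how the terms are combined (or or and, default or), and minimum_should_match to set how many should (or) clauses must match. A specific analyzer for the query text can be set with analyzer.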

GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}

Note: message is the field name.

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html

Example:

Create the index and the data:

PUT /ftq/_doc/1
{
  "title": "lucene solr and elasticsearch",
  "content": "lucene solr and elasticsearch for search"
}

PUT /ftq/_doc/2
{
  "title": "java spring boot",
  "content": "lucene is writerd by java"
}

Query 1: inspect the rewritten query

GET ftq/_doc/_validate/query?rewrite=true
{
  "query": {
    "match": {
      "title": "lucene java"
    }
  }
}

Result 1:

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "ftq",
      "valid": true,
      "explanation": "title:lucene title:java"
    }
  ]
}

Query 2:

GET ftq/_search
{
  "query": {
    "match": {
      "title": "lucene java"
    }
  }
}

Result 2:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "title": "java spring boot",
          "content": "lucene is writerd by java"
        }
      },
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "lucene solr and elasticsearch",
          "content": "lucene solr and elasticsearch for search"
        }
      }
    ]
  }
}

Query 3: specifying the operator

GET ftq/_search
{
  "query": {
    "match": {
      "title": {
        "query": "lucene java",
        "operator": "and"
      }
    }
  }
}

Result 3:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Fuzzy query with a maximum edit distance of 2:

GET ftq/_search
{
  "query": {
    "match": {
      "title": {
        "query": "ucen elatic",
        "fuzziness": 2
      }
    }
  }
}

Fuzzy query result:

{
  "took": 280,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.14384104,
    "hits": [
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.14384104,
        "_source": {
          "title": "lucene solr and elasticsearch",
          "content": "lucene solr and elasticsearch for search"
        }
      }
    ]
  }
}

Require at least two terms to match:

GET ftq/_search
{
  "query": {
    "match": {
      "content": {
        "query": "ucen elatic java",
        "fuzziness": 2,
        "minimum_should_match": 2
      }
    }
  }
}

Result:

{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.43152314,
    "hits": [
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.43152314,
        "_source": {
          "title": "java spring boot",
          "content": "lucene is writerd by java"
        }
      }
    ]
  }
}

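max_expansions sets the maximum number of terms the fuzzy match may expand to; the default is 50. For example, if 100 terms in the inverted index fuzzily match ucen, only the first 50 are used.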
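
To see why the fuzzy queries above match what they match, it helps to compute edit distances by hand. The sketch below implements the plain Levenshtein distance (insertions, deletions, substitutions). Note the assumption: Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which this simple version does not, so treat edit_distance as an illustrative helper rather than the exact measure used by the engine.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


for query_term, indexed_term in [
    ("ucen", "lucene"),
    ("elatic", "elastic"),
    ("elatic", "elasticsearch"),
]:
    print(query_term, indexed_term, edit_distance(query_term, indexed_term))
```

ucen is two edits away from lucene, so it matches with fuzziness 2. elatic is far more than two edits away from the indexed term elasticsearch, so in the title query only the ucen part contributes to the hit.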

4. match_phrase query

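The match_phrase query runs a phrase search on a field. You can specify the analyzer and slop, the number of positions by which terms may be moved for the phrase to still match.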

Phrase query 1:

GET ftq/_search
{
  "query": {
    "match_phrase": {
      "title": "lucene solr"
    }
  }
}

Result 1:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "title": "lucene solr and elasticsearch",
          "content": "lucene solr and elasticsearch for search"
        }
      }
    ]
  }
}

Phrase query 2:

GET ftq/_search
{
  "query": {
    "match_phrase": {
      "title": "lucene elasticsearch"
    }
  }
}

Result 2:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Specifying slop for the query:

GET ftq/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "lucene elasticsearch",
        "slop": 2
      }
    }
  }
}

Result:

{
  "took": 2174,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.27517417,
    "hits": [
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.27517417,
        "_source": {
          "title": "lucene solr and elasticsearch",
          "content": "lucene solr and elasticsearch for search"
        }
      }
    ]
  }
}
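
Why does slop 2 make "lucene elasticsearch" match? One way to reason about it: shift each term's position in the document by its position in the phrase, and the spread of those shifted positions is the slop required. The sketch below follows that idea as a simplified model; it tokenizes on whitespace, ignores analysis, and the function required_slop is hypothetical rather than an Elasticsearch or Lucene API.

```python
from itertools import product


def required_slop(phrase, text):
    """Smallest slop at which `phrase` matches `text`, or None if a term is absent."""
    tokens = text.lower().split()
    terms = phrase.lower().split()
    # All positions at which each phrase term occurs in the text.
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if any(not p for p in positions):
        return None
    best = None
    for combo in product(*positions):
        # Subtract each term's offset in the phrase; an exact phrase gives equal values.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        span = max(shifted) - min(shifted)
        best = span if best is None else min(best, span)
    return best


title = "lucene solr and elasticsearch"
print(required_slop("lucene solr", title))
print(required_slop("lucene elasticsearch", title))
```

Under this model "lucene solr" needs slop 0 and "lucene elasticsearch" needs slop 2, which agrees with the three results above: no hit with the default slop of 0, one hit with slop 2.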

5. match_phrase_prefix query

match_phrase_prefix extends match_phrase with prefix matching on the last term of the phrase.

GET /_search
{
    "query": {
        "match_phrase_prefix" : {
            "message" : "quick brown f"
        }
    }
}

Limit the number of terms the prefix may expand to:

GET /_search
{
    "query": {
        "match_phrase_prefix" : {
            "message" : {
                "query" : "quick brown f",
                "max_expansions" : 10
            }
        }
    }
}

6. multi_match query

To run a text search across several fields, use multi_match. It builds on match and supports querying multiple fields.

Query 1:

GET ftq/_search
{
  "query": {
    "multi_match" : {
      "query":    "lucene java", 
      "fields": [ "title", "content" ] 
    }
  }
}

Result 1:

{
  "took": 1973,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "title": "java spring boot",
          "content": "lucene is writerd by java"
        }
      },
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "lucene solr and elasticsearch",
          "content": "lucene solr and elasticsearch for search"
        }
      }
    ]
  }
}

Query 2: wildcards in field names

GET ftq/_search
{
  "query": {
    "multi_match" : {
      "query":    "lucene java", 
      "fields": [ "title", "cont*" ] 
    }
  }
}

Result 2:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "title": "java spring boot",
          "content": "lucene is writerd by java"
        }
      },
      {
        "_index": "ftq",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "lucene solr and elasticsearch",
          "content": "lucene solr and elasticsearch for search"
        }
      }
    ]
  }
}

Query 3: boosting the relevance score of a field

GET ftq/_search?explain=true
{
  "query": {
    "multi_match" : {
      "query":    "lucene elastic", 
      "fields": [ "title^5", "content" ] 
    }
  }
}

7. Common terms query

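The common terms query handles common (high-frequency) words.

Q1: What are stop words? Why apply stop word removal at index time?

Stop words are very common words that carry little meaning on their own, such as "the" or "a". Removing them at index time makes indexing more efficient, because stop words do not need to be indexed.
Q2: If stop word removal is applied at index time, which terms do the following two queries actually search for?
the brown fox: brown fox
not happy: happy

Q3: Does removing stop words at index time affect search precision? What happens if stop words are not removed? How can both concerns be reconciled, keeping search precise while keeping it fast?

Removing stop words at index time does affect precision (not happy becomes happy), while keeping them costs indexing efficiency. Reconciling the two relies on the tf-idf relevance model.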

7.1 A brief introduction to the tf-idf relevance model

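tf (term frequency): how often a term occurs in a document.

For example, if "World Cup" occurs 3 times in document A, its term frequency in A could be defined as 3. But compare a 3000-word article that mentions "World Cup" 3 times with a 150-word article that also mentions it 3 times: which one is more about the World Cup? The raw count alone is not accurate enough, so the proportion is used instead:

$$\mathrm{tf}_{t,d} = \frac{\text{number of occurrences of } t \text{ in } d}{\text{total number of terms in } d}$$

Q: Does a larger tf always mean the term is more relevant?

No. A word that occurs very often may simply be a common word that carries little meaning.

Note: tf does not have to be computed this way; other definitions are possible.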

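df (document frequency): the number of documents that contain a term. The larger the df, the more common the term. Which terms end up being high-frequency terms?

Q1: Does a larger df mean the term is more important or less important in the document collection?

Less important.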

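Q2: If term t has a high tf in a document and is also important in the document collection, does that mean the document is more relevant to the term? Example: only 3 documents in the whole collection contain "World Cup", and document A mentions it several times.

Yes. A term that is frequent in the document and rare in the collection is strong evidence of relevance, and this combination is what tf-idf measures.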

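Q3: How can the importance of term t in the document collection be expressed as a number? Can df be used?

Not directly, because df grows as importance falls. Its inverse is used instead.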

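idf (inverse document frequency): expresses how important a term is within the document collection. Start from total number of documents / df: the smaller df is, the more important the term. That ratio can get very large, so take its natural logarithm to map it into a smaller range:

$$\mathrm{idf}_t = \ln\frac{N}{\mathrm{df}_t + 1}$$

Note: N is the total number of documents; the +1 avoids division by zero (the case where term t does not occur in the collection at all).

tf-idf relevance model:

$$\text{tf-idf}_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{idf}_t$$

Note: the tf-idf value is the term frequency ($\mathrm{tf}_{t,d}$) multiplied by the inverse document frequency ($\mathrm{idf}_t$).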
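
A small worked example may help. The sketch below computes tf as a proportion, idf as the natural logarithm of N / (df + 1) with the +1 mentioned above, and multiplies the two. The corpus and the function names are made up for illustration, and real engines use more refined formulas, so this is a sketch of the model described here and not the scoring Elasticsearch actually performs.

```python
import math


def tf(term, tokens):
    """Share of the document's tokens that are `term`."""
    return tokens.count(term) / len(tokens)


def idf(term, corpus):
    """ln(N / (df + 1)), where df is the number of documents containing `term`."""
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / (df + 1))


def tf_idf(term, tokens, corpus):
    return tf(term, tokens) * idf(term, corpus)


corpus = [
    "the world cup final the world cup".split(),
    "the weather is nice".split(),
    "the stock market fell".split(),
    "the cat sat on the mat".split(),
]

for term in ("cup", "the"):
    print(term, round(tf_idf(term, corpus[0], corpus), 4))
```

Both terms occur twice in the first document, yet cup gets a clearly higher weight because it appears in only one document, while the appears in all of them. With this particular smoothing, a term present in every document even gets a negative idf.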

7.2 Common terms query

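The common terms query separates common (high-frequency) terms from the rest. cutoff_frequency sets the document-frequency threshold that splits the terms of the query text into high-frequency and low-frequency terms. Low-frequency terms matter more than high-frequency ones, so the low-frequency terms are searched first and a relevance score is computed for every document they match. The high-frequency terms are then searched; they match many documents, but scores are computed only for documents that also matched the low-frequency terms (this preserves precision while greatly improving performance), and are added to the low-frequency score to give the document score. In effect the search is: must contain the low-frequency terms, plus optionally the high-frequency terms.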

Think about it: with this approach, what happens if the user types only high-frequency terms, such as "to be or not to be"? What would you want to happen?

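Optimization: if all terms are high-frequency, run an and query over them.
Further optimization: let the user choose and/or for the high-frequency terms and and/or for the low-frequency terms separately, or specify the minimum number of terms that must match.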
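
The splitting step can be sketched as follows. Given per-term document frequencies, the code divides the query terms at the cutoff and describes which group is required and which is optional. The functions split_by_frequency and plan_query, the dict layout they return, and the frequencies are all hypothetical; they only model the behaviour described above.

```python
def split_by_frequency(terms, doc_freq, total_docs, cutoff_frequency):
    """Return (low_frequency_terms, high_frequency_terms)."""
    # A cutoff of 1 or more is an absolute document count, below 1 a fraction.
    if cutoff_frequency >= 1:
        threshold = cutoff_frequency
    else:
        threshold = cutoff_frequency * total_docs
    low = [t for t in terms if doc_freq.get(t, 0) <= threshold]
    high = [t for t in terms if doc_freq.get(t, 0) > threshold]
    return low, high


def plan_query(terms, doc_freq, total_docs, cutoff_frequency):
    low, high = split_by_frequency(terms, doc_freq, total_docs, cutoff_frequency)
    if low:
        # Low-frequency terms decide which documents match; high-frequency terms only add score.
        return {"required": low, "required_operator": "or", "optional": high}
    # Only high-frequency terms: require all of them.
    return {"required": high, "required_operator": "and", "optional": []}


doc_freq = {"this": 9000, "is": 9500, "bonsai": 3, "cool": 5,
            "to": 9800, "be": 9700, "or": 9600, "not": 9000}
print(plan_query(["this", "is", "bonsai", "cool"], doc_freq, 10000, 0.001))
print(plan_query(["to", "be", "or", "not"], doc_freq, 10000, 0.001))
```

With 10,000 documents and a cutoff of 0.001, any term found in more than about 10 documents counts as high-frequency, so bonsai and cool drive the match while this and is only influence the score.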

Example 1:

GET /_search
{
    "query": {
        "common": {
            "message": {
                "query": "this is bonsai cool",
                "cutoff_frequency": 0.001
            }
        }
    }
}

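Note:

cutoff_frequency: a value of 1 or greater is an absolute number of documents; a value in the range [0, 1) is a fraction of the total number of documents. Here, terms whose document frequency exceeds 0.1% are treated as high-frequency terms.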

Example 2:

GET /_search
{
    "query": {
        "common": {
            "body": {
                "query": "nelly the elephant as a cartoon",
                "cutoff_frequency": 0.001,
                "low_freq_operator": "and"
            }
        }
    }
}

Note: setting low_freq_operator to and requires all low-frequency terms to match.

Available parameters: minimum_should_match (high_freq, low_freq), low_freq_operator (default "or"), high_freq_operator (default "or"), boost and analyzer

Example 3:

GET /_search
{
    "query": {
        "common": {
            "body": {
                "query": "nelly the elephant as a cartoon",
                "cutoff_frequency": 0.001,
                "minimum_should_match": 2
            }
        }
    }
}

Example 4:

GET /_search
{
    "query": {
        "common": {
            "body": {
                "query": "nelly the elephant not as a cartoon",
                "cutoff_frequency": 0.001,
                "minimum_should_match": {
                    "low_freq" : 2,
                    "high_freq" : 3
                }
            }
        }
    }
}

8. query_string query

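The query_string query lets us write a query string directly in Lucene query syntax. When Elasticsearch receives the request, a query parser parses the string and builds the corresponding query. Using it requires knowing the Lucene query syntax.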

Example 1: querying a single field

GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "this AND that OR thus"
        }
    }
}

Example 2: querying multiple fields, with a wildcard in the field name

GET /_search
{
    "query": {
        "query_string" : {
            "fields" : ["content", "name.*^5"],
            "query" : "this AND that OR thus"
        }
    }
}

For the parameters that can accompany query, such as default_field and fields, and for the query string syntax, see:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

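9. Query string syntax (query parser syntax)

Term:

A single term: computer
A phrase: "Lenovo laptop"

Field:

Prefix a term with the field name and a colon.
Example: name:"Lenovo laptop" AND type:computer
If name is the default field, this can be written as: "Lenovo laptop" AND type:computer
If the query string is: type:computer PC phone
Note: only the first term is a value of type; the other two are searched in the default field.

Term modifiers: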

10. simple_query_string query

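The simple_query_string query also takes a query string, but differs from query_string in two ways: it supports a smaller, simpler syntax, and if the query string contains errors it ignores the invalid parts instead of throwing an exception. That makes it better suited to input typed by end users.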

Example:

GET /_search
{
  "query": {
    "simple_query_string" : {
        "query": "\"fried eggs\" +(eggplant | potato) -frittata",
        "fields": ["title^5", "body"],
        "default_operator": "and"
    }
  }
}

For the syntax, see:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html

11. Term level queries

Official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html

11.1 Term query

The term query finds documents whose given field contains the exact term.

Example 1:

POST _search
{
  "query": {
    "term" : { "user" : "Kimchy" } 
  }
}

Example 2: with a boost

GET _search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "status": {
              "value": "urgent",
              "boost": 2
            }
          }
        },
        {
          "term": {
            "status": "normal"
          }
        }
      ]
    }
  }
}

11.2 Terms query

The terms query finds documents whose given field contains any of the given terms.

GET /_search
{
    "query": {
        "terms" : { "user" : ["kimchy", "elasticsearch"]}
    }
}

The terms query can also fetch its terms from another document (terms lookup), similar to in (select term from other) in SQL.

Example 1: terms lookup

PUT /users/_doc/2
{
    "followers" : ["1", "3"]
}

PUT /tweets/_doc/1
{
    "user" : "1"
}

GET /tweets/_search
{
  "query": {
    "terms": {
      "user": {
        "index": "users",
        "type": "_doc",
        "id": "2",
        "path": "followers"
      }
    }
  }
}

Result:

{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "tweets",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user": "1"
        }
      }
    ]
  }
}

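Parameters of the terms lookup:

index: the index to fetch the term values from
type: the mapping type of the document to fetch the term values from
id: the ID of the document to fetch the term values from
path: the field in that document that holds the term values
routing: a custom routing value to use when retrieving the document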

11.3 range query

Range query example 1:

GET _search
{
    "query": {
        "range" : {
            "age" : {
                "gte" : 10,
                "lte" : 20,
                "boost" : 2.0
            }
        }
    }
}

Range query example 2: date math

GET _search
{
    "query": {
        "range" : {
            "date" : {
                "gte" : "now-1d/d",
                "lt" :  "now/d"
            }
        }
    }
}
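
The two bounds above select all of yesterday: /d rounds a gte or lt bound down to the start of the day. The Python sketch below resolves both expressions for a fixed clock, assuming UTC (the default time zone for date math); start_of_day and yesterday_window are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

def start_of_day(moment):
    """Round down to midnight, the effect of /d on gte and lt bounds."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)

def yesterday_window(now):
    """Resolve gte "now-1d/d" and lt "now/d" for the given `now`."""
    gte = start_of_day(now - timedelta(days=1))
    lt = start_of_day(now)
    return gte, lt

# A fixed clock keeps the example repeatable.
now = datetime(2018, 6, 21, 15, 30, tzinfo=timezone.utc)
gte, lt = yesterday_window(now)
print(gte.isoformat(), lt.isoformat())
```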

Range query example 3: custom date format

GET _search
{
    "query": {
        "range" : {
            "born" : {
                "gte": "01/01/2012",
                "lte": "2013",
                "format": "dd/MM/yyyy||yyyy"
            }
        }
    }
}
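
In the format value, || separates alternative patterns that are tried in order, so "01/01/2012" is parsed with the first pattern and "2013" with the second. The Python sketch below illustrates only that fallback; the strptime patterns are assumed equivalents of the Java-style ones, and how Elasticsearch rounds a partial date such as "2013" is not modelled.

```python
from datetime import datetime

# Assumed strptime equivalents of the patterns "dd/MM/yyyy" and "yyyy".
FORMATS = ["%d/%m/%Y", "%Y"]

def parse_with_fallback(value, formats=FORMATS):
    """Try each format in order and return the first successful parse."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"{value!r} matches none of {formats}")

print(parse_with_fallback("01/01/2012"))
print(parse_with_fallback("2013"))
```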

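Range query parameters:

gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
boost: boost value of the query, default 1.0

Date rounding in range queries: when a bound uses date-math rounding (for example 2014-11-18||/M), gte and lt bounds round down, while gt and lte bounds round up.

For the date-math rules, see: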

https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math

11.4 exists query

Finds documents in which the specified field has a non-null value, comparable to column is not null in SQL.

GET /_search
{
    "query": {
        "exists" : { "field" : "user" }
    }
}

To find documents in which the field has no value, wrap exists in a bool must_not clause:

GET /_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "user"
        }
      }
    }
  }
}
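
What counts as "has a value" is worth spelling out: null, an empty array and an array holding only nulls do not count, whereas an empty string does. The Python sketch below mirrors those rules on plain dictionaries; has_value is a hypothetical helper and the sample documents are made up for illustration.

```python
def has_value(source, field):
    """True when `field` holds at least one non-null value."""
    value = source.get(field)
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)

docs = {
    "a": {"user": "jane"},
    "b": {"user": ""},            # an empty string still counts as a value
    "c": {"user": ["jane", None]},
    "d": {"user": None},
    "e": {"user": []},
    "f": {"user": [None]},
    "g": {"foo": "bar"},          # field missing altogether
}

exists = sorted(k for k, d in docs.items() if has_value(d, "user"))
missing = sorted(k for k, d in docs.items() if not has_value(d, "user"))
print(exists, missing)
```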

11.5 prefix query

Finds documents whose field contains a term that starts with the given prefix.

Example 1:

GET /_search
{
  "query": {
    "prefix" : { "user" : "ki" }
  }
}

Example 2: with boost

GET /_search
{
  "query": {
    "prefix" : { "user" :  { "value" : "ki", "boost" : 2.0 } }
  }
}

11.6 wildcard query

Supported wildcards: ? matches any single character, * matches zero or more characters.

Example 1:

GET /_search
{
    "query": {
        "wildcard" : { "user" : "ki*y" }
    }
}

Example 2: with boost

GET /_search
{
  "query": {
    "wildcard": {
      "user": {
        "value": "ki*y",
        "boost": 2
      }
    }
  }
}
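
The pattern is matched against whole indexed terms, not against substrings of the original text. A small Python sketch of that matching rule, translating the two wildcards into a regular expression; wildcard_to_regex and wildcard_match are hypothetical helpers, not part of any Elasticsearch client.

```python
import re

def wildcard_to_regex(pattern):
    """Translate ? and * into a compiled regex; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def wildcard_match(pattern, term):
    """The pattern has to cover the entire term."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

terms = ["kimchy", "kiy", "kimchys", "akimchy"]
print([t for t in terms if wildcard_match("ki*y", t)])
```

Patterns that begin with * or ? force a scan over many terms, so they are best avoided on large indices.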

11.7 regexp query

Finds documents whose field contains a term matching the given regular expression.

Example 1:

GET /_search
{
    "query": {
        "regexp":{
            "name.first": "s.*y"
        }
    }
}

Example 2: with boost

GET /_search
{
    "query": {
        "regexp":{
            "name.first":{
                "value":"s.*y",
                "boost":1.2
            }
        }
    }
}
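
Lucene regular expressions are always anchored: the pattern must match the entire term, so s.*y does not match a term that merely contains such a substring. The Python sketch below shows the anchoring with re.fullmatch; it assumes the pattern uses only syntax that Python and Lucene share, and regexp_match is a hypothetical helper.

```python
import re

def regexp_match(pattern, term):
    """Anchored match: the pattern must cover the entire term."""
    return re.fullmatch(pattern, term) is not None

terms = ["sally", "sy", "sandy", "sallys", "easy"]
print([t for t in terms if regexp_match("s.*y", t)])
```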

For the regular expression syntax, see:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax

11.8 fuzzy query

Finds documents whose field contains a term within a given edit distance of the search value.

Example 1:

GET /_search
{
    "query": {
       "fuzzy" : { "user" : "ki" }
    }
}

Example 2: with explicit fuzziness, prefix_length and max_expansions

GET /_search
{
    "query": {
        "fuzzy" : {
            "user" : {
                "value": "ki",
                "boost": 1.0,
                "fuzziness": 2,
                "prefix_length": 0,
                "max_expansions": 100
            }
        }
    }
}
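
The fuzziness value is a maximum edit distance, and prefix_length is the number of leading characters that must match unchanged. The Python sketch below shows how these two parameters decide a match, using the optimal string alignment variant of Damerau-Levenshtein distance as an approximation; edit_distance and fuzzy_match are hypothetical helpers, and max_expansions is not modelled.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute, adjacent transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

def fuzzy_match(value, term, fuzziness=2, prefix_length=0):
    """True when `term` shares the required prefix and is within `fuzziness` edits."""
    if term[:prefix_length] != value[:prefix_length]:
        return False
    return edit_distance(value, term) <= fuzziness

terms = ["ki", "kim", "kiwi", "ik", "kimchy"]
print([t for t in terms if fuzzy_match("ki", t)])
```

With fuzziness 2 the value "ki" is too far from "kimchy" (four insertions), which is why such a short value rarely finds longer terms.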

11.9 type query (filter by mapping type)

GET /_search
{
    "query": {
        "type" : {
            "value" : "_doc"
        }
    }
}

11.10 ids query (look up documents by ID)

GET /_search
{
    "query": {
        "ids" : {
            "type" : "_doc",
            "values" : ["1", "4", "100"]
        }
    }
}

12. Compound queries

Official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/compound-queries.html

12.1 Constant Score query

Wraps another query and assigns every matching document the same constant score, equal to the boost value.

GET /_search
{
    "query": {
        "constant_score" : {
            "filter" : {
                "term" : { "user" : "kimchy"}
            },
            "boost" : 1.2
        }
    }
}

12.2 Bool query

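The bool query combines several query clauses into one query using boolean logic. Available clause types:

must: the clause must match and contributes to the score
filter: the clause must match but is ignored for scoring
should: the clause should match; each match raises the score
must_not: the clause must not match; ignored for scoring

Example: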

POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user" : "kimchy" }
      },
      "filter": {
        "term" : { "tag" : "tech" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tag" : "wow" } },
        { "term" : { "tag" : "elasticsearch" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}

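Note: a document may match one, both, or neither of the should clauses. minimum_should_match sets how many of them have to match, so with the value 1 used above at least one of the two is required.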
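
How the four clause types combine into a match decision can be sketched in a few lines of Python. This is a simplified model that ignores scoring and requires minimum_should_match to be passed explicitly; term, in_range and bool_match are hypothetical helpers, and the three sample documents are made up for illustration.

```python
def term(field, value):
    """Leaf clause: true when `field` holds `value` (scalar or list of values)."""
    def clause(doc):
        stored = doc.get(field, [])
        stored = stored if isinstance(stored, list) else [stored]
        return value in stored
    return clause

def in_range(field, gte, lte):
    """Leaf clause: true when `field` exists and gte <= value <= lte."""
    return lambda doc: field in doc and gte <= doc[field] <= lte

def bool_match(doc, must=(), filters=(), must_not=(), should=(), minimum_should_match=0):
    """Decide whether `doc` matches; scoring is not modelled."""
    if not all(clause(doc) for clause in (*must, *filters)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match

# Mirrors the bool example above.
query = dict(
    must=[term("user", "kimchy")],
    filters=[term("tag", "tech")],
    must_not=[in_range("age", 10, 20)],
    should=[term("tag", "wow"), term("tag", "elasticsearch")],
    minimum_should_match=1,
)

docs = {
    "a": {"user": "kimchy", "tag": ["tech", "wow"], "age": 30},
    "b": {"user": "kimchy", "tag": ["tech"], "age": 30},         # no should clause matches
    "c": {"user": "kimchy", "tag": ["tech", "wow"], "age": 15},  # excluded by must_not
}
matches = sorted(doc_id for doc_id, doc in docs.items() if bool_match(doc, **query))
print(matches)
```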
