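Introduction to Elasticsearch Query DSL Syntax

1. Introduction to Query DSL

1.1 DSL

As more and more parameters are appended to a query string and the search conditions grow more complex, the query-string form can no longer meet the need.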

GET /book/_search?q=name:java&size=10&from=0&sort=price:desc

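DSL: Domain Specific Language, a language designed for one particular domain.

Query DSL is the search language specific to Elasticsearch. Search conditions are carried in the request body, which makes it far more powerful than the query string.

Query all documents (query-string form: GET /book/_search):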

GET /book/_search
{
  "query": { "match_all": {} }
}

Sorting (query-string form: GET /book/_search?sort=price:desc):

GET /book/_search
{
    "query" : {
        "match" : {
            "name" : "java"
        }
    },
    "sort": [
        { "price": "desc" }
    ]
}

Pagination (query-string form: GET /book/_search?size=10&from=0):

GET /book/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 10
}

Returning only selected fields (query-string form: GET /book/_search?_source=name,studymodel):

GET /book/_search 
{
  "query": { "match_all": {} },
  "_source": ["name", "studymodel"]
}

Combining the query types above makes it possible to build complex searches.

1.2 Query DSL Syntax
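To make the correspondence between the two forms concrete, here is a minimal Python sketch that turns a few query-string parameters into a request-body dictionary. The helper name query_string_to_dsl is hypothetical and not part of any Elasticsearch client, and mapping q=field:value to a match clause is a simplification made for illustration.

```python
from urllib.parse import parse_qs


def query_string_to_dsl(query_string: str) -> dict:
    """Translate a few URI-search parameters into a request-body dict.

    Only q=field:value, sort=field:order, from, size and _source are handled.
    This is a hypothetical helper, not part of any Elasticsearch client.
    """
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    body: dict = {}

    if "q" in params:
        field, _, value = params["q"].partition(":")
        body["query"] = {"match": {field: value}}  # simplification of q=...
    else:
        body["query"] = {"match_all": {}}

    if "sort" in params:
        field, _, order = params["sort"].partition(":")
        body["sort"] = [{field: order or "asc"}]

    for key in ("from", "size"):
        if key in params:
            body[key] = int(params[key])

    if "_source" in params:
        body["_source"] = params["_source"].split(",")

    return body


body = query_string_to_dsl("q=name:java&size=10&from=0&sort=price:desc")
print(body)
```

The resulting dictionary has the same shape as the request bodies shown in this section, so it could be serialized to JSON and sent as the body of GET /book/_search.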

{
    QUERY_NAME: {
        ARGUMENT: VALUE,
        ARGUMENT: VALUE,...
    }
}

{
    QUERY_NAME: {
        FIELD_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
}

GET /test_index/_search 
{
  "query": {
    "match": {
      "test_field": "test"
    }
  }
}

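1.3 Combining Multiple Search Conditions

Search requirement: title must contain elasticsearch, content may or may not contain elasticsearch, and author_id must not be 111.

In SQL terms, this corresponds to WHERE conditions combined with AND, OR, and !=.

Initial data: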

POST /website/_doc/1
{
          "title": "my hadoop article",
          "content": "hadoop is very bad",
          "author_id": 111
}

POST /website/_doc/2
{
          "title": "my elasticsearch  article",
          "content": "es is very bad",
          "author_id": 112
}
POST /website/_doc/3
{
          "title": "my elasticsearch article",
          "content": "es is very goods",
          "author_id": 111
}

Search:

GET /website/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "elasticsearch"
          }
        }
      ],
      "should": [
        {
          "match": {
            "content": "elasticsearch"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "author_id": 111
          }
        }
      ]
    }
  }
}

{
  "took" : 488,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.47000363,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.47000363,
        "_source" : {
          "title" : "my elasticsearch  article",
          "content" : "es is very bad",
          "author_id" : 112
        }
      }
    ]
  }
}
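The way must, should, and must_not interact can be reproduced in a few lines of Python. The sketch below is only a toy model: the matches helper splits text on whitespace instead of using a real analyzer, and the count of matched clauses stands in for the relevance score. Both function names are hypothetical.

```python
DOCS = {
    1: {"title": "my hadoop article", "content": "hadoop is very bad", "author_id": 111},
    2: {"title": "my elasticsearch  article", "content": "es is very bad", "author_id": 112},
    3: {"title": "my elasticsearch article", "content": "es is very goods", "author_id": 111},
}


def matches(doc: dict, field: str, value) -> bool:
    """Toy stand-in for a match clause: whitespace tokens for text, equality otherwise."""
    stored = doc.get(field)
    if isinstance(stored, str):
        return str(value).lower() in stored.lower().split()
    return stored == value


def bool_search(docs: dict, must=(), should=(), must_not=()) -> list:
    """Return (doc_id, matched_clause_count) pairs, best first.

    Each clause is a (field, value) pair. Counting matched clauses is only a
    stand-in for relevance; Elasticsearch computes a real score instead.
    """
    hits = []
    for doc_id, doc in docs.items():
        if not all(matches(doc, f, v) for f, v in must):
            continue  # every must clause is required
        if any(matches(doc, f, v) for f, v in must_not):
            continue  # any must_not match excludes the document
        bonus = sum(matches(doc, f, v) for f, v in should)  # optional, only adds weight
        hits.append((doc_id, len(must) + bonus))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


hits = bool_search(
    DOCS,
    must=[("title", "elasticsearch")],
    should=[("content", "elasticsearch")],
    must_not=[("author_id", 111)],
)
print(hits)
```

Document 1 fails the must clause and document 3 is excluded by must_not, so only document 2 remains, which agrees with the response above. Because a must clause is present, the should clause is optional here and only affects ranking.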

A more complex search requirement:

select * from test_index where name='tom' and (hired = true or (personality = 'good' and rude != true))

GET /test_index/_search
{
    "query": {
            "bool": {
                "must": { "match":{ "name": "tom" }},
                "should": [
                    { "match":{ "hired": true }},
                    { "bool": {
                        "must":{ "match": { "personality": "good" }},
                        "must_not": { "match": { "rude": true }}
                    }}
                ],
                "minimum_should_match": 1
            }
    }
}
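Reading the request body clause by clause, the must clause is required and minimum_should_match: 1 demands at least one of the two should clauses, so the overall shape is name AND (hired OR (good AND NOT rude)). The following Python sketch checks that reading exhaustively over all sixteen input combinations. Both functions are illustrative stand-ins, not Elasticsearch code.

```python
from itertools import product


def dsl_matches(name_is_tom: bool, hired: bool, good: bool, rude: bool) -> bool:
    """Mirror the bool query: one must clause, two should clauses, minimum_should_match = 1."""
    must_ok = name_is_tom
    should_hits = [
        hired,              # first should clause
        good and not rude,  # nested bool: must + must_not
    ]
    minimum_should_match = 1
    return must_ok and sum(should_hits) >= minimum_should_match


def sql_matches(name_is_tom: bool, hired: bool, good: bool, rude: bool) -> bool:
    """name = 'tom' AND (hired = true OR (personality = 'good' AND rude != true))."""
    return name_is_tom and (hired or (good and not rude))


cases = list(product([False, True], repeat=4))
agree = all(dsl_matches(*case) == sql_matches(*case) for case in cases)
print(agree)
```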

2. Full-Text Search

2.1 Full-Text Search

Recreate the book index:

PUT /book/
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name":{
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "description":{
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "studymodel":{
        "type": "keyword"
      },
      "price":{
        "type": "double"
      },
      "timestamp": {
         "type": "date",
         "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "pic":{
        "type":"text",
        "index":false
      }
    }
  }
}

Insert data:

PUT /book/_doc/1
{
"name": "Bootstrap開發",
"description": "Bootstrap是由Twitter推出的一個前臺頁面開發css框架，是一個非常流行的開發框架，此框架集成了多種頁面效果。此開發框架包含了大量的CSS、JS程序代碼，可以幫助開發者（尤其是不擅長css頁面開發的程序人員）輕鬆的實現一個css，不受瀏覽器限制的精美界面css效果。",
"studymodel": "201002",
"price":38.6,
"timestamp":"2019-08-25 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "bootstrap", "dev"]
}

PUT /book/_doc/2
{
"name": "java編程思想",
"description": "java語言是世界第一編程語言，在軟件開發領域使用人數最多。",
"studymodel": "201001",
"price":68.6,
"timestamp":"2019-08-25 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "java", "dev"]
}

PUT /book/_doc/3
{
"name": "spring開發基礎",
"description": "spring 在java領域非常流行，java程序員都在用。",
"studymodel": "201001",
"price":88.6,
"timestamp":"2019-08-24 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "spring", "java"]
}

Search:

GET /book/_search
{
    "query" : {
        "match" : {
            "description" : "java程序員"
        }
    }
}

2.2 A First Look at _score

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.137549,
    "hits" : [
      {
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.137549,
        "_source" : {
          "name" : "spring開發基礎",
          "description" : "spring 在java領域非常流行，java程序員都在用。",
          "studymodel" : "201001",
          "price" : 88.6,
          "timestamp" : "2019-08-24 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [
            "spring",
            "java"
          ]
        }
      },
      {
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.57961315,
        "_source" : {
          "name" : "java編程思想",
          "description" : "java語言是世界第一編程語言，在軟件開發領域使用人數最多。",
          "studymodel" : "201001",
          "price" : 68.6,
          "timestamp" : "2019-08-25 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [
            "java",
            "dev"
          ]
        }
      }
    ]
  }
}

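Analysis of the results

1. At index time, an inverted index of terms is built for the description field:

java: documents 2, 3

程序員: document 3

2. At search time, the query text is analyzed into the terms java and 程序員, and the inverted index returns documents 2 and 3. Document 3 contains the term java twice and 程序員 once, so it scores higher and ranks first. Document 2 contains java only once, so it ranks second.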
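The two steps above can be sketched in Python. The token lists are an assumption: they contain only the two terms discussed here, whereas the real ik_max_word analyzer emits more tokens. Summing term frequencies is a deliberately crude substitute for the actual scoring formula, so only the ordering is meaningful, not the numbers.

```python
from collections import defaultdict

# Assumed analyzer output for the description field (abridged to the two
# terms discussed in the text); real ik_max_word output contains more tokens.
TOKENS = {
    2: ["java"],
    3: ["java", "java", "程序員"],
}


def build_inverted_index(tokens_by_doc: dict) -> dict:
    """Map each term to a {doc_id: term_frequency} posting list."""
    index = defaultdict(dict)
    for doc_id, tokens in tokens_by_doc.items():
        for token in tokens:
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


def search(index: dict, query_terms: list) -> list:
    """Rank documents by summed term frequency (a crude stand-in for real scoring)."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, frequency in index.get(term, {}).items():
            scores[doc_id] += frequency
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = build_inverted_index(TOKENS)
ranking = search(index, ["java", "程序員"])
print(sorted(index["java"]), sorted(index["程序員"]))
print(ranking)
```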

3. DSL Syntax Practice

3.1 match_all

Query all documents.

GET /book/_search
{
    "query": {
        "match_all": {}
    }
}

3.2 match

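The match query is analyzer-aware: it first analyzes the query text with the field's search analyzer and then searches for the resulting terms.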

GET /book/_search
{
    "query": { 
        "match": { 
            "description": "java程序員"
        }
    }
}

3.3 multi_match

multi_match runs the same query text against several fields.

GET /book/_search
{
  "query": {
    "multi_match": {
      "query": "java程序員",
      "fields": ["name", "description"]
    }
  }
}

3.4 range query

Range query.

GET /book/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 80,
        "lte": 90
      }
    }
  }
}
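The bound operators of range are gt, gte, lt, and lte. As a rough illustration, the Python sketch below applies the same bounds to the prices of the three sample books. The in_range helper is hypothetical and only mimics the comparison semantics.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None) -> bool:
    """Check a value against optional bounds, named like the range query operators."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


# Prices of the three sample books, keyed by document id.
PRICES = {1: 38.6, 2: 68.6, 3: 88.6}

matched = [doc_id for doc_id, price in PRICES.items() if in_range(price, gte=80, lte=90)]
print(matched)
```

Only document 3 (price 88.6) falls inside the inclusive bounds used in the query above.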

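3.5 term query

The term query does not analyze the search text. When a field is of type keyword, the value is analyzed neither at index time nor at search time, so term matches the exact stored value. Against an analyzed text field such as description, a term query like the one below is unlikely to match anything, because the field is indexed as separate terms rather than as the whole phrase.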

GET /book/_search
{
  "query": {
    "term": {
      "description": "java程序員"
    }
  }
}
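The difference between term and match can be sketched with a toy index in Python. The indexed terms and the toy_analyze function are assumptions made for illustration. The real ik analyzers produce different and more numerous tokens, but the principle is the same: term looks up the raw string, while match analyzes it first.

```python
# Assumed indexed terms for the description field; the real ik_max_word
# analyzer emits more tokens, but not the whole phrase as a single term.
INDEXED_TERMS = {
    2: {"java", "語言", "編程"},
    3: {"spring", "java", "程序員", "流行"},
}


def toy_analyze(text: str) -> list:
    """Hypothetical analyzer: pick out the known terms contained in the text."""
    known_terms = ["java", "程序員"]
    return [term for term in known_terms if term in text]


def term_query(value: str) -> list:
    """term: look up the raw value as one term, with no analysis."""
    return sorted(doc_id for doc_id, terms in INDEXED_TERMS.items() if value in terms)


def match_query(text: str) -> list:
    """match: analyze the text first, then match any of the resulting terms."""
    query_terms = set(toy_analyze(text))
    return sorted(doc_id for doc_id, terms in INDEXED_TERMS.items() if terms & query_terms)


print(term_query("java程序員"))   # the whole phrase is not an indexed term
print(match_query("java程序員"))  # analyzed into java and 程序員 before lookup
print(term_query("java"))         # a single indexed term does match
```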

3.6 terms query

Finds documents whose field contains any of several exact terms.

GET /book/_search
{
    "query": { "terms": { "tags": [ "search", "full_text", "dev" ] }}
}

3.7 exists query

Finds documents that have a value for the given field.

GET /_search
{
    "query": {
        "exists": {
            "field": "join_date"
        }
    }
}

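3.8 fuzzy query

Returns documents that contain terms similar to the search term, where similarity is measured by Levenshtein edit distance.

An edit is one of the following changes:

Changing a character (box → fox)
Removing a character (sick → sic)
Inserting a character (aple → apple)
Transposing two adjacent characters (act → cat)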

GET /book/_search
{
    "query": {
        "fuzzy": {
            "description": {
                "value": "jave"
            }
        }
    }
}
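To see why jave can find java, it helps to compute the edit distance by hand. The Python sketch below implements the optimal string alignment variant, in which a transposition of two adjacent characters counts as a single edit, matching the four cases listed above. It illustrates the distance measure only and is not the algorithm Elasticsearch uses internally to find candidate terms.

```python
def edit_distance(source: str, target: str) -> int:
    """Optimal string alignment distance.

    Substitutions, deletions, insertions, and transpositions of two adjacent
    characters each cost 1.
    """
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete every character of the source prefix
    for j in range(cols):
        table[0][j] = j  # insert every character of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # delete from source
                table[i][j - 1] + 1,         # insert into source
                table[i - 1][j - 1] + cost,  # substitute (or keep)
            )
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                # swap two adjacent characters
                table[i][j] = min(table[i][j], table[i - 2][j - 2] + 1)
    return table[rows - 1][cols - 1]


for pair in [("box", "fox"), ("sick", "sic"), ("aple", "apple"), ("act", "cat"), ("jave", "java")]:
    print(pair, edit_distance(*pair))
```

Each of these pairs is one edit apart, which is why a fuzzy query for jave can reach the indexed term java.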

3.9 ids

GET /book/_search
{
    "query": {
        "ids" : {
            "values" : ["1", "4", "100"]
        }
    }
}

3.10 prefix query

GET /book/_search
{
    "query": {
        "prefix": {
            "description": {
                "value": "spring"
            }
        }
    }
}

3.11 regexp query

GET /book/_search
{
    "query": {
        "regexp": {
            "description": {
                "value": "j.*a",
                "flags" : "ALL",
                "max_determinized_states": 10000,
                "rewrite": "constant_score"
            }
        }
    }
}
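Both prefix and regexp work on the individual indexed terms of a field, and a regexp pattern has to match a whole term rather than a substring of it. The Python sketch below imitates this with str.startswith and re.fullmatch on a hypothetical term list. Python's re syntax is not identical to the Lucene regular expression syntax that Elasticsearch uses, but for a simple pattern like j.*a the two agree.

```python
import re

# Hypothetical indexed terms, chosen only to illustrate the matching rules.
TERMS = ["java", "javascript", "jvm", "spring"]


def prefix_terms(prefix: str, terms: list) -> list:
    """prefix: keep the terms that start with the given prefix."""
    return [term for term in terms if term.startswith(prefix)]


def regexp_terms(pattern: str, terms: list) -> list:
    """regexp: keep the terms that the pattern matches in full (anchored match)."""
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]


print(prefix_terms("ja", TERMS))
print(regexp_terms("j.*a", TERMS))
```

An unanchored search for j.*a would also hit javascript, because it contains java. The anchored match excludes it, which is the behaviour to expect from the regexp query.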

4. Filter

4.1 filter and query: an Example

Requirement: find documents whose description matches "java程序員" and whose price is between 80 and 90 inclusive.

GET /book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "description": "java程序員"
          }
        },
        {
          "range": {
            "price": {
              "gte": 80,
              "lte": 90
            }
          }
        }
      ]
    }
  }
}

Using filter:

GET /book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "description": "java程序員"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 80,
            "lte": 90
          }
        }
      }
    }
  }
}

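4.2 filter versus query

filter only selects the documents that satisfy the search condition. It computes no relevance score and has no effect on relevance.

query computes how relevant each document is to the search condition and sorts the results by that relevance.

When to use which:

As a rule of thumb, if you are running a search and want the best-matching documents returned first, use query. If you only need to select a subset of the data by some conditions and do not care about the ranking, use filter.

4.3 Performance of filter and query

filter does not compute relevance scores and does not sort by them. In addition, Elasticsearch automatically caches the results of the most frequently used filters.

query, by contrast, has to compute relevance scores and sort by them, and its results are not cached.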
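The division of labour between the two can be sketched in Python: a filter yields a plain set of document ids that can be memoized and reused, while the scored part of the query decides the ranking. The cache dictionary below is a toy stand-in for the built-in filter cache, and the scores are simply copied from the sample response in section 2.2. Neither reflects how Elasticsearch implements caching or scoring internally.

```python
PRICES = {1: 38.6, 2: 68.6, 3: 88.6}
# Relevance scores for the match clause, copied from the sample response in 2.2.
SCORES = {3: 2.137549, 2: 0.57961315}

filter_cache: dict = {}
stats = {"cache_misses": 0}


def price_filter(gte: float, lte: float) -> frozenset:
    """Yes/no filter: return matching doc ids, memoized per (gte, lte) pair."""
    key = (gte, lte)
    if key not in filter_cache:
        stats["cache_misses"] += 1
        filter_cache[key] = frozenset(
            doc_id for doc_id, price in PRICES.items() if gte <= price <= lte
        )
    return filter_cache[key]


def search(scores: dict, allowed: frozenset) -> list:
    """Keep scored hits that pass the filter; the filter never changes a score."""
    hits = [(doc_id, score) for doc_id, score in scores.items() if doc_id in allowed]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


first = search(SCORES, price_filter(80, 90))
second = search(SCORES, price_filter(80, 90))  # the filter result is reused
print(first, stats["cache_misses"])
```

The filter removes document 2 but leaves the score of document 3 untouched, and the second call reuses the cached set of ids instead of evaluating the range again.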

5. Locating Syntax Errors

Validating an invalid query:

GET /book/_validate/query?explain
{
  "query": {
    "mach": {
      "description": "java程序員"
    }
  }
}

{
  "valid" : false,
  "error" : "org.elasticsearch.common.ParsingException: no [query] registered for [mach]"
}

A valid query:

GET /book/_validate/query?explain
{
  "query": {
    "match": {
      "description": "java程序員"
    }
  }
}

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "book",
      "valid" : true,
      "explanation" : "description:java description:程序員"
    }
  ]
}

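This is most useful for very large, complex searches. If you have just written a search that runs to hundreds of lines, you can first run it through the validate API to check that it is legal.

Once the query is valid, explain works much like an execution plan in MySQL: it shows what the search will actually look for.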
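When validation is part of a script, the two response shapes shown above can be condensed into a short message. The following Python sketch does that for response dictionaries that have already been parsed from JSON. The function name summarize_validation is hypothetical, and the sample inputs are copied from the responses in this section.

```python
def summarize_validation(response: dict) -> str:
    """Condense a _validate/query?explain response into one readable string."""
    if not response.get("valid", False):
        return "invalid: " + response.get("error", "no error message returned")
    lines = ["valid"]
    for item in response.get("explanations", []):
        lines.append(f'{item["index"]}: {item["explanation"]}')
    return "\n".join(lines)


INVALID_RESPONSE = {
    "valid": False,
    "error": "org.elasticsearch.common.ParsingException: no [query] registered for [mach]",
}
VALID_RESPONSE = {
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "valid": True,
    "explanations": [
        {"index": "book", "valid": True, "explanation": "description:java description:程序員"}
    ],
}

print(summarize_validation(INVALID_RESPONSE))
print(summarize_validation(VALID_RESPONSE))
```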

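6. Custom Sort Rules

6.1 Default Sort Rule

By default, results are sorted by _score in descending order.

In some cases, however, _score is not used at all, for example with filter: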

GET /book/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "description": "java程序員"
          }
        }
      ]
    }
  }
}

A constant_score query works as well.

6.2 Custom Sort Rule

This is the equivalent of ORDER BY in SQL. Query-string form: ?sort=price:desc

GET /book/_search 
{
  "query": {
    "constant_score": {
      "filter" : {
            "term" : {
                "studymodel" : "201001"
            }
        }
    }
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}
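What the request above asks for can be mimicked in a few lines of Python: select by an exact studymodel value without scoring, then order by price. The data is taken from the three sample books, and the function name filter_and_sort is hypothetical.

```python
BOOKS = [
    {"id": 1, "studymodel": "201002", "price": 38.6},
    {"id": 2, "studymodel": "201001", "price": 68.6},
    {"id": 3, "studymodel": "201001", "price": 88.6},
]


def filter_and_sort(books: list, studymodel: str, order: str = "asc") -> list:
    """Select books by exact studymodel, then sort by price; no relevance involved."""
    selected = [book for book in books if book["studymodel"] == studymodel]
    return sorted(selected, key=lambda book: book["price"], reverse=(order == "desc"))


ascending = [book["id"] for book in filter_and_sort(BOOKS, "201001", "asc")]
descending = [book["id"] for book in filter_and_sort(BOOKS, "201001", "desc")]
print(ascending, descending)
```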

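7. Sorting on Text Fields

Sorting on a text field often gives inaccurate results: after analysis the field holds several terms, and sorting on those terms does not produce the order we want.

The usual solution is to index the text field twice: once analyzed, for searching, and once not analyzed, for sorting.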

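The alternative is to set fielddata: true on the text field. That allows sorting on the analyzed terms, but it loads them into heap memory, so the multi-field mapping below is generally preferred.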

PUT /website
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "content": {
        "type": "text"
      },
      "post_date": {
        "type": "date"
      },
      "author_id": {
        "type": "long"
      }
    }
  }
}
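The effect of the two sub-fields on sorting can be illustrated in Python. The sketch assumes that an ascending sort on an analyzed field compares the smallest term of each document, and it uses whitespace splitting as a stand-in for the analyzer. Both are simplifications. The SORT_BODY dictionary shows how a search body might reference the title.keyword sub-field defined in the mapping above.

```python
TITLES = {
    1: "my hadoop article",
    3: "my elasticsearch article",
}


def analyzed_sort_key(title: str) -> str:
    """Assumed key for an analyzed text field: the smallest of its terms."""
    return min(title.lower().split())


def keyword_sort_key(title: str) -> str:
    """Key for the keyword sub-field: the whole, unanalyzed string."""
    return title


by_terms = {doc_id: analyzed_sort_key(title) for doc_id, title in TITLES.items()}
by_keyword = sorted(TITLES, key=lambda doc_id: keyword_sort_key(TITLES[doc_id]))

# A search body that searches the analyzed field but sorts on the keyword sub-field.
SORT_BODY = {
    "query": {"match": {"title": "article"}},
    "sort": [{"title.keyword": {"order": "asc"}}],
}

print(by_terms)
print(by_keyword)
```

Under these assumptions both titles collapse to the same sort key, article, on the analyzed field, so their order is arbitrary. The keyword sub-field compares the full strings and puts document 3 before document 1.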