Elasticsearch Index API & Aggregations API & Query DSL

In this post I demonstrate and explain a number of Elasticsearch APIs, so that you have something handy to look up when you need them at work.

I. Index API

Create an index

curl -XPUT 'http://127.0.0.1:9200/test_index/' -d '{
    "settings" : {
      "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
      }
    },
    "mappings" : {
      "type_test_01" : {
        "properties" : {
          "field1" : { "type" : "string"},
          "field2" : { "type" : "string"}
        }
      },
      "type_test_02" : {
        "properties" : {
          "field1" : { "type" : "string"},
          "field2" : { "type" : "string"}
        }
      }
    }
}'

Check whether the index exists

curl -XHEAD -i 'http://127.0.0.1:9200/test_index?pretty'

Note: the ?pretty parameter is added to make the output easier to read.

View the index's mapping

curl -XGET -i 'http://127.0.0.1:9200/test_index/_mapping?pretty'

Check whether the type article exists in the index

curl -XHEAD -i 'http://127.0.0.1:9200/test_index/article'

View the mapping of type type_test_01 in the test_index index

curl -XGET -i 'http://127.0.0.1:9200/test_index/_mapping/type_test_01/?pretty'

Test an analyzer

curl -XGET 'http://127.0.0.1:9200/_analyze?pretty' -d '
{
  "analyzer" : "standard",
  "text" : "this is a test"
}'

Show the index's statistics

curl 'http://127.0.0.1:9200/test_index/_stats?pretty'

Show segment information for the index's shards

curl -XGET 'http://127.0.0.1:9200/test_index/_segments?pretty'

Delete an index

curl -XDELETE 'http://127.0.0.1:9200/logstash-nginxacclog-2016.09.20/'

II. Count API

Basic syntax

curl -XGET 'http://elasticsearch_server:port/<index name>/<type (optional, omit if not needed)>/_count'

Examples:

1. Count the documents in the logstash-nginxacclog-2016.10.09 index

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_count'

2. Count the documents in the logstash-nginxacclog-2016.10.09 index whose status is 200

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_count?q=status:200'

DSL form

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_count' -d '
{ "query":
  { "term":{"status":"200"}}
}'
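
The two forms above differ only in where the filter goes: in the URL (?q=status:200) or in a JSON body. As a rough sketch of the DSL form, the following Python builds the URL and body without sending anything. The helper name count_request is made up for illustration and is not part of any Elasticsearch client.

```python
import json
from urllib.parse import quote


def count_request(base_url, index, doc_type=None, field=None, value=None):
    """Return (url, body) for a _count call; body is None when no filter is given.

    Hypothetical helper: it only assembles strings and sends nothing.
    """
    parts = [base_url.rstrip("/"), quote(index, safe="")]
    if doc_type:
        # The type segment is optional, as in the syntax line above.
        parts.append(quote(doc_type, safe=""))
    parts.append("_count")
    url = "/".join(parts)
    body = None
    if field is not None:
        body = json.dumps({"query": {"term": {field: value}}})
    return url, body


url, body = count_request(
    "http://127.0.0.1:9200",
    "logstash-nginxacclog-2016.10.09",
    field="status",
    value="200",
)
print(url)
print(body)
```

The printed URL and body can be passed to curl with -d exactly as in the DSL example.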

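III. Aggregations API (data analysis and statistics)

Note: the metric aggregations used below (avg, max, min, sum, stats, percentiles) only compute over numeric fields, and the date-based aggregations need a date field.

1. Average

Scenario: compute the average response time in the access log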

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"avg_num" : { "avg" : { "field" : "responsetime" } }
},"size":0
}'

# "size":0 suppresses the matching documents, so only the aggregation result is returned.


{
  "took" : 598,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 32523067,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_num" : {
      "value" : 0.0472613558675975
    }
  }
}

# The average response time is 0.0472613558675975 seconds

2. Maximum

Scenario: find the longest response time in the access log

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"max_num" : { "max" : { "field" : "responsetime" } }
},"size":0
}'


{
  "took" : 29813,
  "timed_out" : false,
  "_shards" : {
    "total" : 431,
    "successful" : 431,
    "failed" : 0
  },
  "hits" : {
    "total" : 476952009,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_num" : {
      "value" : 65.576
    }
  }
}

# The longest response time is 65.576 seconds

3. Minimum

Scenario: find the fastest response time in the access log

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"min_num" : { "min" : { "field" : "responsetime" } }
},"size":0
}'


{
  "took" : 2145,
  "timed_out" : false,
  "_shards" : {
    "total" : 431,
    "successful" : 431,
    "failed" : 0
  },
  "hits" : {
    "total" : 477156773,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "min_num" : {
      "value" : 0.0
    }
  }
}

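# The fastest response time turns out to be 0. Checking the logs, I found that the requests with a response time of 0 were ones that nginx had rejected.

4. Sum

Scenario: compute how much traffic was sent in response to requests over one day of access logs.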

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"sim_num" : { "sum" : { "field" : "size" } }
},"size":0
}'

{
  "took" : 1226,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 32523067,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "sim_num" : {
      "value" : 6.9285945505E10
    }
  }
}

# This is a large number: the trailing E10 means 6.9285945505 × 10^10. By my calculation that is about 69.3 GB, so roughly 70 GB of traffic.
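
As a quick check of that conversion, here is a small sketch that turns the aggregated sum into decimal gigabytes and binary gibibytes. It assumes the size field holds a byte count per request, which the post implies but does not state.

```python
def bytes_to_units(n_bytes):
    """Convert a byte count to decimal GB and binary GiB."""
    return {"GB": n_bytes / 10**9, "GiB": n_bytes / 2**30}


# Value of the sum aggregation above (assumed to be bytes).
total = 6.9285945505E10
units = bytes_to_units(total)
print(round(units["GB"], 2), "GB")
print(round(units["GiB"], 2), "GiB")
```

Whether "70 GB" is a fair summary depends on which unit you mean: the decimal figure is just under 70, the binary one is noticeably lower.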

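5. Common summary statistics

These include the maximum, minimum, average, sum and count.

Scenario: compute the maximum, minimum, average, sum and count of responsetime in the access log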

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"like_stats" : { "stats" : { "field" : "responsetime" } }
},"size":0
}'


{
  "took" : 2868,
  "timed_out" : false,
  "_shards" : {
    "total" : 431,
    "successful" : 431,
    "failed" : 0
  },
  "hits" : {
    "total" : 477797577,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "like_stats" : {
      "count" : 469345191,
      "min" : 0.0,
      "max" : 65.576,
      "avg" : 0.06088492952649428,
      "sum" : 2.8576048877634E7
    }
  }
}

This is an enhanced version of the stats aggregation above that returns several additional statistics

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"like_stats" : { "extended_stats" : { "field" : "responsetime" } }
},"size":0
}'


{
  "took" : 2830,
  "timed_out" : false,
  "_shards" : {
    "total" : 431,
    "successful" : 431,
    "failed" : 0
  },
  "hits" : {
    "total" : 478145456,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "like_stats" : {
      "count" : 469687072,
      "min" : 0.0,
      "max" : 65.576,
      "avg" : 0.06087745173159307,
      "sum" : 2.859335205463328E7,
      "sum_of_squares" : 1.3162790273264633E7,
      "variance" : 0.02431853151732958,
      "std_deviation" : 0.1559440012226491,
      "std_deviation_bounds" : {
        "upper" : 0.3727654541768913,
        "lower" : -0.2510105507137051
      }
    }
  }
}

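# The additional fields in the result are:

# sum_of_squares  sum of squares

# variance  variance

# std_deviation  standard deviation

# std_deviation_bounds  the interval avg ± 2 × std_deviation (upper and lower)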
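
The extra fields can be re-derived from count, sum and sum_of_squares, which is a useful sanity check on what they mean. The sketch below assumes a population variance and bounds at two standard deviations. The numbers in the response above are consistent with both assumptions.

```python
import math

# Values copied from the extended_stats response above.
stats = {
    "count": 469687072,
    "sum": 2.859335205463328E7,
    "sum_of_squares": 1.3162790273264633E7,
}


def derive_extended(stats, sigma=2):
    """Recompute avg, variance, std deviation and bounds from the raw sums."""
    mean = stats["sum"] / stats["count"]
    # Population variance: E[x^2] - (E[x])^2
    variance = stats["sum_of_squares"] / stats["count"] - mean ** 2
    std = math.sqrt(variance)
    return {
        "avg": mean,
        "variance": variance,
        "std_deviation": std,
        "upper": mean + sigma * std,
        "lower": mean - sigma * std,
    }


derived = derive_extended(stats)
for name, value in derived.items():
    print(name, value)
```

The negative lower bound is not an error: response times are skewed, so avg minus two standard deviations falls below zero even though no request is that fast.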

6. Percentiles: what share of the data falls at or below a value

Scenario: find the response-time value at each percentile of the access log

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"outlier" : { "percentiles" : { "field" : "responsetime" } }
},"size":0
}'

{
  "took" : 60737,
  "timed_out" : false,
  "_shards" : {
    "total" : 431,
    "successful" : 431,
    "failed" : 0
  },
  "hits" : {
    "total" : 478287997,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "outlier" : {
      "values" : {
        "1.0" : 0.0,
        "5.0" : 0.0,
        "25.0" : 0.02,
        "50.0" : 0.038999979789136247,
        "75.0" : 0.06247223731250421,
        "95.0" : 0.16479760590682113,
        "99.0" : 0.520510492464275
      }
    }
  }
}

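# The keys under values are percentages and the numbers on the right are the corresponding field values. That is:

# 1% of requests have a response time less than or equal to 0

# 5% of requests have a response time less than or equal to 0

# 25% of requests have a response time less than or equal to 0.02

# 50% of requests have a response time less than or equal to 0.038999979789136247

# 75% of requests have a response time less than or equal to 0.06247223731250421

# 95% of requests have a response time less than or equal to 0.16479760590682113

# 99% of requests have a response time less than or equal to 0.520510492464275


# You can also use the percents parameter to request custom percentiles, such as 10%, 30%, 60% and 90%.

# Note: in my testing this aggregation only works on numeric fields. It cannot be applied to string fields.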


curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"outlier" : { "percentiles" : {
                                  "field" : "status",
                                  "percents":[5, 10, 20, 50, 99.9]
                                  }
              }
},"size":0
}'
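
If you generate these requests from a script, it can help to validate the percents list before sending it. The following is only a sketch: percentiles_body is a made-up helper name, and the default aggregation name outlier simply mirrors the examples above.

```python
import json


def percentiles_body(field, percents=None, name="outlier"):
    """Build a percentiles request body; percents must lie in [0, 100]."""
    agg = {"field": field}
    if percents is not None:
        bad = [p for p in percents if not 0 <= p <= 100]
        if bad:
            raise ValueError("percents out of range: %r" % bad)
        agg["percents"] = list(percents)
    return {
        "query": {"match_all": {}},
        "aggs": {name: {"percentiles": agg}},
        "size": 0,  # return only the aggregation, no hits
    }


body = percentiles_body("status", [5, 10, 20, 50, 99.9])
print(json.dumps(body, indent=2))
```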

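7. Percentage of values at or below given thresholds (percentile ranks)

Scenario: find what percentage of requests in the whole log have a response time at or below 0, 0.01, 0.1 and 0.2.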

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"outlier" : { "percentile_ranks" : {
"field" : "responsetime",
"values":[0, 0.01, 0.1, 0.2]
}
}
},"size":0
}'

{
  "took" : 6950,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 32523067,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "outlier" : {
      "values" : {
        "0.0" : 8.79897648675993,
        "0.01" : 17.90331319256336,
        "0.1" : 91.18297638776373,
        "0.2" : 98.22564774611764
      }
    }
  }
}

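# 8.8% of requests have a response time less than or equal to 0

# 17.9% of requests have a response time less than or equal to 0.01

# 91.2% of requests have a response time less than or equal to 0.1

# 98.2% of requests have a response time less than or equal to 0.2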
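
Because percentile ranks are cumulative, subtracting neighbouring values gives the share of requests in each band between thresholds. A minimal sketch using the values from the response above follows. Percentile ranks are approximations in Elasticsearch, so treat the results as estimates.

```python
# Cumulative percentile ranks copied from the response above.
ranks = {
    "0.0": 8.79897648675993,
    "0.01": 17.90331319256336,
    "0.1": 91.18297638776373,
    "0.2": 98.22564774611764,
}


def band_shares(ranks):
    """Turn cumulative ranks into (lower, upper, share) bands; None means unbounded."""
    points = sorted((float(k), v) for k, v in ranks.items())
    shares = []
    lower, prev_rank = None, 0.0
    for threshold, rank in points:
        shares.append((lower, threshold, rank - prev_rank))
        lower, prev_rank = threshold, rank
    # Everything above the last threshold.
    shares.append((lower, None, 100.0 - prev_rank))
    return shares


shares = band_shares(ranks)
for lower, upper, share in shares:
    print(lower, upper, round(share, 2))
```

Read this way, the bulk of the requests sits between 0.01 and 0.1 seconds.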

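8. Count the documents within numeric ranges

Scenario: count how many access-log documents have a response time in each of three ranges: below 0.02, from 0.02 to 0.1, and 0.1 or above.

"ranges":[{"to": 0.02}, {"from":0.02,"to":0.1},{"from":0.1}]

{"to": 0.02} counts documents with a response time below 0.02

{"from":0.02,"to":0.1} counts documents with a response time from 0.02 up to, but not including, 0.1

{"from":0.1} counts documents with a response time of 0.1 or above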

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"range_info" : { "range" : {
"field" : "responsetime",
"ranges":[{"to": 0.02}, {"from":0.02,"to":0.1},{"from":0.1}]
}
}
},"size":0
}'


{
  "took" : 474,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 32523067,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "range_info" : {
      "buckets" : [ {
        "key" : "*-0.02",
        "to" : 0.02,
        "to_as_string" : "0.02",
        "doc_count" : 9093600
      }, {
        "key" : "0.02-0.1",
        "from" : 0.02,
        "from_as_string" : "0.02",
        "to" : 0.1,
        "to_as_string" : "0.1",
        "doc_count" : 20547128
      }, {
        "key" : "0.1-*",
        "from" : 0.1,
        "from_as_string" : "0.1",
        "doc_count" : 2879418
      } ]
    }
  }
}

  "aggregations" : {
    "range_info" : {
      "buckets" : [ {
        "key" : "*-0.02",
        "to" : 0.02,
        "to_as_string" : "0.02",
        "doc_count" : 9093600
      } # 9093600 documents have a response time below 0.02

      , {
        "key" : "0.02-0.1",
        "from" : 0.02,
        "from_as_string" : "0.02",
        "to" : 0.1,
        "to_as_string" : "0.1",
        "doc_count" : 20547128
      } # 20547128 documents have a response time from 0.02 up to 0.1

      , {
        "key" : "0.1-*",
        "from" : 0.1,
        "from_as_string" : "0.1",
        "doc_count" : 2879418
      } # 2879418 documents have a response time of 0.1 or above
       ]
    }
  }
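
The bucket counts are easier to compare as percentages. The sketch below computes each bucket's share from the doc_count values in the response. Note that the buckets add up to slightly less than hits.total. My guess, not verified, is that the remaining documents have no responsetime value.

```python
# doc_count values copied from the range aggregation response above.
buckets = [
    {"key": "*-0.02", "doc_count": 9093600},
    {"key": "0.02-0.1", "doc_count": 20547128},
    {"key": "0.1-*", "doc_count": 2879418},
]
hits_total = 32523067


def bucket_shares(buckets):
    """Return each bucket's percentage of all bucketed documents."""
    total = sum(b["doc_count"] for b in buckets)
    return {b["key"]: 100.0 * b["doc_count"] / total for b in buckets}


shares = bucket_shares(buckets)
bucketed = sum(b["doc_count"] for b in buckets)
for key, pct in shares.items():
    print(key, round(pct, 2))
print("documents outside every bucket:", hits_total - bucketed)
```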

9. Count the documents within time ranges

Scenario: count how many access-log documents fall before 2016-10-09T01:00:00 and how many fall at or after 2016-10-09T02:00:00.

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "match_all" : {} },

"aggs" : {
"range_info" : { "date_range" : {
"field" : "@timestamp",
"ranges":[{"to": "2016-10-09T01:00:00"},{"from":"2016-10-09T02:00:00"}]
}
}
},"size":0
}'


{
  "took" : 432,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 32523067,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "range_info" : {
      "buckets" : [ {
        "key" : "*-2016-10-09T01:00:00.000Z",
        "to" : 1.4759748E12,
        "to_as_string" : "2016-10-09T01:00:00.000Z",
        "doc_count" : 613460
      }, # 613460 documents fall before 2016-10-09T01:00:00

      {
        "key" : "2016-10-09T02:00:00.000Z-*",
        "from" : 1.4759784E12,
        "from_as_string" : "2016-10-09T02:00:00.000Z",
        "doc_count" : 31264881
      } # 31264881 documents fall at or after 2016-10-09T02:00:00
      ]
    }
  }
}
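
The to and from values in the response are epoch milliseconds printed in scientific notation. A short sketch for turning them back into the same UTC strings as to_as_string and from_as_string:

```python
from datetime import datetime, timezone


def millis_to_iso(ms):
    """Format epoch milliseconds as an ISO-8601 UTC string with millisecond precision."""
    dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    # %f yields microseconds; drop the last three digits to keep milliseconds.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


print(millis_to_iso(1.4759748E12))
print(millis_to_iso(1.4759784E12))
```

The trailing Z in the response shows the boundaries are interpreted as UTC, which matters if your logs were written in another time zone.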

10. Aggregating independently of the query result set: "global":{}

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "term" : { "status" : "200" } },

"aggs" :{
"all_articles":{
"global":{},
"aggs":{
"sum_like": {"sum":{"field": "responsetime"}}
}
}
},"size":0
}'


{
  "took" : 1519,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 26686196,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "all_articles" : {
      "doc_count" : 32523067,
      "sum_like" : {
        "value" : 1536946.1929722272
      }
    }
  }
}

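# The hits total shows that the query matched only 26686196 documents, while the aggregation covered 32523067 documents, more than the query matched.
# The aggregated sum is 1536946.1929722272


# Now let's look at the same request without "global":{}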

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :
{ "term" : { "status" : "200" } },
"aggs":{
"sum_like": {"sum":{"field": "responsetime"}}
                                             
},"size":0                                   
}'


{
  "took" : 1326,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 26686196,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "sum_like" : {
      "value" : 1526710.3929916811
    }
  }
}

# This sum is smaller than the one above, which shows that this time the aggregation depends on the documents matched by the query.
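
Subtracting the two responses isolates what the query filtered out, that is, the requests whose status is not 200. A small sketch using only the numbers shown above:

```python
# Numbers copied from the two responses above.
global_docs, global_sum = 32523067, 1536946.1929722272  # with "global":{}
query_docs, query_sum = 26686196, 1526710.3929916811    # status:200 only

# Documents and total response time excluded by the status:200 query.
other_docs = global_docs - query_docs
other_sum = global_sum - query_sum

print("documents with status other than 200:", other_docs)
print("their total response time:", round(other_sum, 2))
```

The non-200 requests are a sizeable part of the documents but contribute very little response time, which fits the earlier observation that rejected requests have a response time of 0.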

11. Bucketing with a histogram

Counts the documents that fall into each bucket of a fixed, user-defined interval on the given field.

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"aggs" :{
"like_histogram":{
"histogram":{"field": "status", "interval": 200,
        "min_doc_count": 1}
}
},"size":0 
}'

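# This operates on the status field with an interval of 200. To avoid empty buckets for intervals that match no documents, the minimum document count is set to 1: "histogram":{"field": "status", "interval": 200, "min_doc_count": 1}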
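
The post does not show the response for this request, so here is a sketch of how values map to buckets: each value is rounded down to the nearest multiple of the interval. The status codes in the sample list are made up for illustration.

```python
import math
from collections import Counter


def histogram_key(value, interval, offset=0):
    """Bucket key for a value: round down to the nearest multiple of interval."""
    return math.floor((value - offset) / interval) * interval + offset


# Hypothetical sample of status codes, not taken from the real index.
sample = [200, 200, 301, 304, 404, 499, 502]
counts = Counter(histogram_key(s, 200) for s in sample)
for key in sorted(counts):
    print(key, counts[key])
```

With an interval of 200, codes 200 to 399 share the bucket with key 200, and codes 400 to 599 share the bucket with key 400, so 3xx and 2xx responses are not separated. An interval of 100 would split them by class.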

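12. Bucketing by time with a date histogram

"date_histogram":{"field": "@timestamp", "interval": "1d","format": "yyyy-MM-dd"}

"field": "@timestamp" names the field that holds the event time

"interval": "1d" buckets by day. Use 1M for monthly, 1H for hourly (the official time-unit documentation spells hours as h) and 1m for per-minute buckets

"format": "yyyy-MM-dd" sets the output format of the bucket key

Count the log entries produced each day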

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-*/_search?pretty' -d '{
"aggs" :{
"date_histogram_info":{
"date_histogram":{"field": "@timestamp", "interval": "1d","format": "yyyy-MM-dd",
"min_doc_count": 1}
}
}
}'


"aggregations" : {
    "date_histogram_info" : {
      "buckets" : [ {
        "key_as_string" : "2016-09-27",
        "key" : 1474934400000,
        "doc_count" : 6895375
      }, {
        "key_as_string" : "2016-09-28",
        "key" : 1475020800000,
        "doc_count" : 1255775
      }, {
        "key_as_string" : "2016-09-29",
        "key" : 1475107200000,
        "doc_count" : 38512862
      }, {
        "key_as_string" : "2016-09-30",
        "key" : 1475193600000,
        "doc_count" : 35314225
      }, {
        "key_as_string" : "2016-10-01",
        "key" : 1475280000000,
        "doc_count" : 45358162
      }, {
        "key_as_string" : "2016-10-02",
        "key" : 1475366400000,
        "doc_count" : 42058056
      }, {
        "key_as_string" : "2016-10-03",
        "key" : 1475452800000,
        "doc_count" : 39945587
      }, {
        "key_as_string" : "2016-10-04",
        "key" : 1475539200000,
        "doc_count" : 39509128
      }, {
        "key_as_string" : "2016-10-05",
        "key" : 1475625600000,
        "doc_count" : 40506342
      }, {
        "key_as_string" : "2016-10-06",
        "key" : 1475712000000,
        "doc_count" : 43303499
      }, {
        "key_as_string" : "2016-10-07",
        "key" : 1475798400000,
        "doc_count" : 44234780
      }, {
        "key_as_string" : "2016-10-08",
        "key" : 1475884800000,
        "doc_count" : 32880600
      }, {
        "key_as_string" : "2016-10-09",
        "key" : 1475971200000,
        "doc_count" : 32523067
      }, {
        "key_as_string" : "2016-10-10",
        "key" : 1476057600000,
        "doc_count" : 31454044
      }, {
        "key_as_string" : "2016-10-11",
        "key" : 1476144000000,
        "doc_count" : 2018401
      } ]
    }
  }
}


# Bucketing by hour

# Count the log entries produced in each hour of the day

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"aggs" :{
"date_histogram_info":{
"date_histogram":{"field": "@timestamp", "interval": "1H","format": "yyyy-MM-dd-H",
"min_doc_count": 1}
}
},"size":0
}'


{
  "took" : 530,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 32523067,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_histogram_info" : {
      "buckets" : [ {
        "key_as_string" : "2016-10-09-0",
        "key" : 1475971200000,
        "doc_count" : 613460
      }, {
        "key_as_string" : "2016-10-09-1",
        "key" : 1475974800000,
        "doc_count" : 644726
      }, {
        "key_as_string" : "2016-10-09-2",
        "key" : 1475978400000,
        "doc_count" : 687196
      }, {
        "key_as_string" : "2016-10-09-3",
        "key" : 1475982000000,
        "doc_count" : 730831
      }, {
        "key_as_string" : "2016-10-09-4",
        "key" : 1475985600000,
        "doc_count" : 1460320
      }, {
        "key_as_string" : "2016-10-09-5",
        "key" : 1475989200000,
        "doc_count" : 1469098
      }, {
        "key_as_string" : "2016-10-09-6",
        "key" : 1475992800000,
        "doc_count" : 1004399
      }, {
        "key_as_string" : "2016-10-09-7",
        "key" : 1475996400000,
        "doc_count" : 962843
      }, {
        "key_as_string" : "2016-10-09-8",
        "key" : 1476000000000,
        "doc_count" : 1232560
      }, {
        "key_as_string" : "2016-10-09-9",
        "key" : 1476003600000,
        "doc_count" : 1809741
      }, {
        "key_as_string" : "2016-10-09-10",
        "key" : 1476007200000,
        "doc_count" : 2802804
      }, {
        "key_as_string" : "2016-10-09-11",
        "key" : 1476010800000,
        "doc_count" : 3941192
      }, {
        "key_as_string" : "2016-10-09-12",
        "key" : 1476014400000,
        "doc_count" : 4631032
      }, {
        "key_as_string" : "2016-10-09-13",
        "key" : 1476018000000,
        "doc_count" : 3651968
      }, {
        "key_as_string" : "2016-10-09-14",
        "key" : 1476021600000,
        "doc_count" : 2079933
      }, {
        "key_as_string" : "2016-10-09-15",
        "key" : 1476025200000,
        "doc_count" : 973578
      }, {
        "key_as_string" : "2016-10-09-16",
        "key" : 1476028800000,
        "doc_count" : 517435
      }, {
        "key_as_string" : "2016-10-09-17",
        "key" : 1476032400000,
        "doc_count" : 388382
      }, {
        "key_as_string" : "2016-10-09-18",
        "key" : 1476036000000,
        "doc_count" : 361296
      }, {
        "key_as_string" : "2016-10-09-19",
        "key" : 1476039600000,
        "doc_count" : 345926
      }, {
        "key_as_string" : "2016-10-09-20",
        "key" : 1476043200000,
        "doc_count" : 342214
      }, {
        "key_as_string" : "2016-10-09-21",
        "key" : 1476046800000,
        "doc_count" : 360897
      }, {
        "key_as_string" : "2016-10-09-22",
        "key" : 1476050400000,
        "doc_count" : 714336
      }, {
        "key_as_string" : "2016-10-09-23",
        "key" : 1476054000000,
        "doc_count" : 796900
      } ]
    }
  }
}

# This shows the number of log entries produced in each hour from 0 to 23 that day. With this data it is easy to identify the peak hours of the business.
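
Picking the peak out of the buckets takes a single sort. The sketch below uses four of the hourly buckets from the response above. In practice you would pass the full buckets list from the parsed JSON.

```python
# A subset of the hourly buckets from the response above.
buckets = [
    {"key_as_string": "2016-10-09-10", "doc_count": 2802804},
    {"key_as_string": "2016-10-09-11", "doc_count": 3941192},
    {"key_as_string": "2016-10-09-12", "doc_count": 4631032},
    {"key_as_string": "2016-10-09-13", "doc_count": 3651968},
]


def peak_buckets(buckets, top=1):
    """Return the `top` buckets with the highest doc_count, busiest first."""
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)[:top]


peak = peak_buckets(buckets)[0]
print(peak["key_as_string"], peak["doc_count"])
```

Logstash stores @timestamp in UTC, so convert the hour to local time before drawing conclusions about the business day.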

IV. Query DSL

curl -XGET 'http://127.0.0.1:9200/search_test/article/_count?pretty' -d '{
  "query" :
  { "term" : { "title" : "article" } }
}'

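Query DSL has two kinds of clauses:

1. Leaf query clauses (simple leaf-level clauses)

2. Compound query clauses (clauses that combine other clauses)

Query context & Filter context

In query context, the question is how well the document matches the query clause. In filter context, the question is only whether the document matches the query clause, and no relevance score is computed.

{"match_all":{}} matches all documents

{"match_all":{"boost":1.2}} sets the returned _score explicitly

Term level queries

Returns documents whose inverted index for the user field contains the exact term "kitty" (exact match)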

{
  "term":{"user":"kitty"}
}

Example:

curl -XGET 'http://169.254.135.217:9200/search_test/article/_count?pretty' -d '{
  "query" :
  { "term" : { "user" : "kitty" } }
}'

Term level Range query (range query)

Example:

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '
{
"query" :
        { "range" :{
                      "status" :{ "gt" : 200, "lte" : 500, "boost" : 2.0 }
                    }

        }
,"size":1
}'

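# "size":1 here means return only one document, similar to LIMIT in SQL. By default, from + size cannot exceed 10000 (the index.max_result_window setting).

# To retrieve more data, add the ?scroll parameter, e.g. /_search?scroll=1m, where 1m keeps the search context alive for 1 minute.

# For details, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#time-units

Term level Exists query (exists query)

Example: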

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '
{
"query":
    { "exists":{ "field":"status" } 
}
}'

Term level Prefix and Wildcard

Prefix query example:

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
  "query" :{
  "prefix" :{"agent": "io" }
  }
}'

Wildcard query example:

curl -XGET 'http://127.0.0.1:9200/logstash-nginxacclog-2016.10.09/_search?pretty' -d '{
"query" :{
"wildcard" :{"agent": "io*" }
}
}'

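Compound query : Bool Query

The three most commonly used Bool Query clauses:

1. must: conditions that a document must match

2. must_not: conditions that exclude a document

3. should: similar to an OR condition. "minimum_should_match" sets how many of the conditions must match for a document to pass.

Suppose I define three conditions under should and set minimum_should_match to 2: a document passes only if at least two of the three conditions match. If minimum_should_match is changed to 3, all three conditions must match.

"should" : [ { "term" : { "body" : "article" } }, { "term" : { "body" : "document" } }, { "term" : { "body" : "tuchao" } } ], "minimum_should_match" : 3,

Example:

I gave should a condition that can never match, body:'tuchao', because no document contains that string, and set minimum_should_match to 2 so that matching two conditions is enough. As expected, the query returned results.

Next I changed minimum_should_match to 3 so that all three conditions had to match. That is clearly impossible, so the query returned nothing.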
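
The behaviour described above can be mimicked in a few lines. This is only a toy model of the counting rule, not how Elasticsearch evaluates a bool query. The document's term set is made up, and the should terms are the ones from the snippet above.

```python
def should_passes(doc_terms, should_terms, minimum_should_match):
    """Toy model: pass if at least minimum_should_match should-terms occur in the doc."""
    matched = sum(1 for term in should_terms if term in doc_terms)
    return matched >= minimum_should_match


# Hypothetical terms indexed for one document's body field.
doc_terms = {"article", "document", "search"}
should_terms = ["article", "document", "tuchao"]

print(should_passes(doc_terms, should_terms, 2))
print(should_passes(doc_terms, should_terms, 3))
```

With a threshold of 2 the document passes because two of the three terms are present. With a threshold of 3 the never-matching term tuchao makes it fail.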

Request body search : Sort
