Elasticsearch Study Notes

Reference

6.4 (latest version, English): https://www.elastic.co/guide/...
Chinese: https://www.elastic.co/guide/...
5.4 (Chinese): http://cwiki.apachecn.org/pag...

Definition

DSL (Domain Specific Language): the query language defined by Elasticsearch

ES field types: https://blog.csdn.net/chengyu...

API

Stats API: retrieves index statistics (http://cwiki.apachecn.org/pag...)

GET es-index_*/_stats
{
  "_shards": {
    "total": 622,
    "successful": 622,
    "failed": 0
  },
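  // The returned statistics are aggregated at the index level, with primaries and total aggregations. primaries covers only the primary shards; total is the cumulative value over primary and replica shards.
  "_all": {
    "primaries": {
      "docs": {  // Number of documents and of deleted documents (not yet merged away). Note: this value is affected by index refreshes.
        "count": 2932357017,
        "deleted": 86610
      },
      "store": { // Size of the index.
        "size_in_bytes": 2573317479532
      },
      "indexing": {}, // Indexing statistics; can be combined with a comma-separated list of types to get document-type-level statistics.
      "get": {}, // Statistics for get API calls
      "search": {} // Statistics for search API calls
    },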
  
    "total": {
    }
  }
}
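To make the primaries figures above easier to read, here is a minimal Python sketch that summarizes a stats response. It works on a plain dict that copies the numbers shown above; the function name `summarize_primaries` and the derived fields are illustrative assumptions, not part of the Stats API.

```python
# A trimmed copy of the stats response shown above, as a plain dict.
stats = {
    "_shards": {"total": 622, "successful": 622, "failed": 0},
    "_all": {
        "primaries": {
            "docs": {"count": 2932357017, "deleted": 86610},
            "store": {"size_in_bytes": 2573317479532},
        }
    },
}


def summarize_primaries(response):
    """Derive a few human-readable figures from the primaries section."""
    primaries = response["_all"]["primaries"]
    doc_count = primaries["docs"]["count"]
    size_bytes = primaries["store"]["size_in_bytes"]
    return {
        "docs": doc_count,
        "size_tib": round(size_bytes / 1024 ** 4, 2),       # bytes -> TiB
        "bytes_per_doc": round(size_bytes / doc_count),     # average document size
        "all_shards_ok": response["_shards"]["failed"] == 0,
    }


summary = summarize_primaries(stats)
print(summary)
```

Only primaries is read here because the total section is left empty in the notes.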

Search API (two forms)

using a simple query string as a parameter

GET es-index_*/_search?q=eventid:OMGH5PageView

using a request body

GET es-index_*/_search
{
  "query": {
    "term": {
      "eventid": {
        "value": "OMGH5PageView"
      }
    }
  }
}
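The two forms can also be built programmatically. The following Python sketch only constructs the request path and body without sending anything; the helper names `uri_search` and `body_search` are hypothetical. Note that the two forms are not strictly equivalent: the `q` parameter is parsed as a query string, while the body below uses a term query.

```python
import json
from urllib.parse import urlencode

INDEX_PATTERN = "es-index_*"  # index pattern used throughout these notes


def uri_search(field, value):
    """Form 1: the query travels in the `q` URL parameter."""
    return f"/{INDEX_PATTERN}/_search?" + urlencode({"q": f"{field}:{value}"})


def body_search(field, value):
    """Form 2: the query travels in a JSON request body."""
    body = {"query": {"term": {field: {"value": value}}}}
    return f"/{INDEX_PATTERN}/_search", body


uri = uri_search("eventid", "OMGH5PageView")
path, body = body_search("eventid", "OMGH5PageView")
print(uri)
print(path, json.dumps(body))
```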

Query DSL

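Leaf query clause: looks for a particular value in a particular field (e.g. match, term, range)
Compound query clause: wraps other leaf or compound clauses and combines them logically (e.g. bool)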

DSL query contexts

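query context
In query context, the question being answered is: How well does this document match this query clause?
Besides deciding whether a document matches the query, Elasticsearch also computes how well it matches relative to other documents and records that in _score.
filter context**
In filter context, the question being answered is: Does this document match this query clause?
Only whether the document matches is decided; no _score is computed.
Typically used to filter structured data,
e.g. whether timestamp falls in the 2017-2018 range, or whether status is published.
Frequently used filters are cached automatically by Elasticsearch, which improves performance.

** When querying, first narrow the data with filter clauses, then use query clauses to match the remaining data.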
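As a sketch of how the two contexts sit side by side in one request, the Python snippet below builds a bool query whose must clause runs in query context (scored) and whose filter clauses run in filter context (not scored). The `title` field and the helper name `published_in_range` are assumptions made for illustration; `status` and `timestamp` come from the example above.

```python
import json


def published_in_range(text, start, end):
    """Score documents on `text`, but only among published ones in [start, end]."""
    return {
        "query": {
            "bool": {
                # query context: contributes to _score
                "must": [{"match": {"title": text}}],
                # filter context: yes/no only, cacheable, no _score
                "filter": [
                    {"term": {"status": "published"}},
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                ],
            }
        }
    }


request_body = published_in_range("elasticsearch", "2017-01-01", "2018-12-31")
print(json.dumps(request_body, indent=2))
```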

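Filtering fields in query results

fields: restricts which fields are returned (from 5.0 on this request-body parameter became stored_fields; _source filtering is the usual alternative)
script_fields: returns values computed from the original data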

"fields": ["eh"],  // return only the eh field
"script_fields": {
   "test": {
      "script": "doc['eh'].value*2"
   }
} // return the eh value multiplied by 2, under the name test

Query filtering: query

bool compound filter

{
   "bool" : {
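      "must" :     {}, // every clause must match; equivalent to AND in SQL
      "must_not" : {}, // no clause may match; equivalent to NOT in SQL
      "should" :   {}, // at least one clause should match; equivalent to OR in SQL
      "filter" :   {}  // every clause must match, but in filter context (no scoring)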
   }
}
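The matching rules of bool can be sketched as a small Python predicate. This is a toy model under stated assumptions: clauses are plain Python callables, scoring is ignored, and `bool_matches` is a hypothetical name. It also reflects the default behaviour that should clauses are only required when the bool query has no must or filter clause.

```python
def bool_matches(doc, must=(), must_not=(), should=(), filters=()):
    """Return True if `doc` satisfies the bool query (matching only, no scoring)."""
    if not all(clause(doc) for clause in (*must, *filters)):
        return False                      # every must/filter clause has to match
    if any(clause(doc) for clause in must_not):
        return False                      # no must_not clause may match
    if should and not (must or filters):
        return any(clause(doc) for clause in should)  # at least one should
    return True                           # otherwise should is optional


def is_published(doc):
    return doc.get("status") == "published"


def is_kitty(doc):
    return doc.get("user") == "kitty"


doc = {"status": "published", "user": "tony"}
print(bool_matches(doc, must=[is_published], must_not=[is_kitty]))
print(bool_matches(doc, should=[is_kitty]))
print(bool_matches(doc, filters=[is_published], should=[is_kitty]))
```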

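filtered query (deprecated in 2.0 and removed in 5.0; a bool query with a filter clause replaces it)

{
    "filtered": {
          "query": {},
          "filter": {} // data is filtered in filter first, then matched in query
    }
}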
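Since the filtered query no longer exists from Elasticsearch 5.0 on, older requests have to be rewritten as bool queries. The Python sketch below shows one way to do that mechanically; `filtered_to_bool` is a hypothetical helper, and it only handles the top-level query and filter keys shown above.

```python
def filtered_to_bool(query):
    """Rewrite {"filtered": {...}} into the equivalent {"bool": {...}}."""
    if "filtered" not in query:
        return query                       # nothing to rewrite
    inner = query["filtered"]
    bool_query = {}
    if inner.get("query"):
        bool_query["must"] = inner["query"]      # scored part
    if inner.get("filter"):
        bool_query["filter"] = inner["filter"]   # non-scored part
    return {"bool": bool_query}


old_style = {
    "filtered": {
        "query": {"match": {"eventid": "OMGH5PageView"}},
        "filter": {"exists": {"field": "user"}},
    }
}
new_style = filtered_to_bool(old_style)
print(new_style)
```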

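match and term

match (full-text match): first checks whether the field is analyzed. If it is, the query text is analyzed (split into tokens) and those tokens are matched; if it is not, the text is matched against the token directly.
term (exact match): matches the token directly, without analyzing the query value.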
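The difference is easiest to see on a toy inverted index. The Python sketch below is a simplified model, not how Elasticsearch is implemented: `analyze` is a stand-in for an analyzer that lowercases and splits on non-alphanumeric characters, and match uses OR between tokens.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a-z or 0-9."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def build_index(docs):
    """Map each token of an analyzed field to the ids of the documents containing it."""
    index = {}
    for doc_id, value in docs.items():
        for tok in analyze(value):
            index.setdefault(tok, set()).add(doc_id)
    return index


def term_query(index, value):
    """term: look the value up as-is, with no analysis."""
    return index.get(value, set())


def match_query(index, text):
    """match: analyze the text first, then look up each resulting token."""
    hits = set()
    for tok in analyze(text):
        hits |= index.get(tok, set())
    return hits


index = build_index({1: "Quick Brown Fox", 2: "quick"})
print(term_query(index, "Quick Brown Fox"))   # no token has this exact form
print(term_query(index, "quick"))
print(match_query(index, "Quick fox"))
```

On the analyzed field, the term query only finds something when the value happens to equal an indexed token, while match finds both documents.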

terms: matches any one of several values

{ "terms" : { "user" : ["tony", "kitty"] } }

range: range filter

Ranges on date fields can be expressed with Date Math

{
     "range" : {
          "born" : {
              "gte": "01/01/2012",
              "lte": "2013",
              "format": "dd/MM/yyyy||yyyy" 
           }
       }
 }


{
     "range" : {
          "timestamp" : {
              "gte": "now-6d/d", // Date Math
              "lte": "now/d", // Date Math
              "time_zone": "+08:00"  // time zone
           }
       }
 }
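To see which interval the Date Math expressions above select, the Python sketch below resolves them by hand for a fixed "now". It assumes the documented rounding behaviour (with gte, /d rounds down to the start of the day; with lte, /d rounds up to the last millisecond of the day, both in the given time_zone); the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # corresponds to "time_zone": "+08:00"


def start_of_day(moment):
    """Round a datetime down to midnight of the same day."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def resolve_last_seven_days(now):
    """Approximate gte "now-6d/d" and lte "now/d" for the given instant."""
    local_now = now.astimezone(TZ)
    gte = start_of_day(local_now - timedelta(days=6))              # rounds down
    lte = (start_of_day(local_now) + timedelta(days=1)
           - timedelta(milliseconds=1))                            # rounds up
    return gte, lte


now = datetime(2018, 9, 20, 15, 30, tzinfo=TZ)
gte, lte = resolve_last_seven_days(now)
print(gte.isoformat())
print(lte.isoformat())
```

The resulting window covers seven whole calendar days: today plus the six days before it.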

exists: whether the document contains a given field

{
     "exists" : { "field" : "user" }
}

wildcard: wildcard query (the pattern is matched against the indexed terms)

Note that this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow wildcard queries, a wildcard term should not start with one of the wildcards * or ?
In practice: wildcard queries perform poorly, so avoid patterns that begin with * or ?.

prefix: prefix query
regexp: regular expression query

Tips

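Special handling for values that contain -

If a value contains -, it is split into separate tokens by default, which makes search results inaccurate. One fix is to append .raw to the field name (this relies on the mapping defining a non-analyzed .raw sub-field).

"term": { "status": "pre-active" }  =>  "term": { "status.raw": "pre-active" }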
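The .raw trick only works if the mapping actually defines such a sub-field. The Python sketch below shows what that mapping fragment and the matching query could look like, assuming Elasticsearch 5.x or later (text field with a keyword sub-field); the sub-field name raw is a convention, not something Elasticsearch creates under that name by default.

```python
import json

# Assumed mapping fragment: `status` is analyzed text, `status.raw` keeps the
# original value as a single un-analyzed keyword.
status_mapping = {
    "properties": {
        "status": {
            "type": "text",
            "fields": {"raw": {"type": "keyword"}},
        }
    }
}

# Exact match against the un-analyzed sub-field, so "pre-active" stays whole.
exact_query = {"query": {"term": {"status.raw": "pre-active"}}}

print(json.dumps(status_mapping, indent=2))
print(json.dumps(exact_query))
```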

sort

GET es-index_*/_search
{
  "fields" : ["eventid", "logtime"],
  "query": {
    "term": {
      "eventid": {
        "value": "OMGH5PageView"
      }
    }
  },
  "sort": [
    {
      "logtime": {
        "order": "asc"
      }
    }
  ]
}

Aggregations

date_histogram

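Like histogram, date_histogram by default returns only buckets whose document count is non-zero.
To also return buckets that contain no documents, set two extra parameters:

"min_doc_count" : 0,  // forces empty buckets to be returned
"extended_bounds" : {  // forces the whole year to be returned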
    "min" : "2014-01-01",
    "max" : "2014-12-31"
}
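The effect of the two parameters can be illustrated with a toy monthly histogram in Python. This is a simplified model that buckets ISO date strings by calendar month; `monthly_histogram` is a hypothetical function, not an Elasticsearch API.

```python
from collections import Counter


def month_of(date_str):
    """'YYYY-MM-DD' -> (year, month)."""
    return int(date_str[:4]), int(date_str[5:7])


def monthly_histogram(dates, min_doc_count=1, extended_bounds=None):
    """Return [(month_key, doc_count), ...] in chronological order."""
    counts = Counter(month_of(d) for d in dates)
    edges = list(counts)
    if extended_bounds:
        # extended_bounds widens the bucket range beyond the data itself
        edges += [month_of(extended_bounds["min"]), month_of(extended_bounds["max"])]
    if not edges:
        return []
    (year, month), last = min(edges), max(edges)
    buckets = []
    while (year, month) <= last:
        count = counts[(year, month)]
        if count >= min_doc_count:         # min_doc_count=0 keeps empty buckets
            buckets.append((f"{year:04d}-{month:02d}", count))
        year, month = (year, month + 1) if month < 12 else (year + 1, 1)
    return buckets


dates = ["2014-03-05", "2014-03-19", "2014-07-01"]
default_buckets = monthly_histogram(dates)
full_year = monthly_histogram(
    dates,
    min_doc_count=0,
    extended_bounds={"min": "2014-01-01", "max": "2014-12-31"},
)
print(default_buckets)
print(full_year)
```

With the defaults only March and July appear; with both parameters set, all twelve months of 2014 are returned, most of them with a count of 0.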

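Fields in the search response

took: time the query took to return (in milliseconds)
timed_out: whether the query timed out
_shards: information about the shards queried, including how many shards were searched and how many succeeded or failed
hits: the search results
  total: number of documents that satisfy the query
  max_score: the highest _score among the matching documents
  hits: the matching documents
  _score: how well a document matches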
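As a sketch of how these fields are typically read, the Python snippet below summarizes a response held in a plain dict. The sample response is hypothetical and uses the 6.x layout, where hits.total is a plain integer; `summarize` is an illustrative helper name.

```python
# Hypothetical response, shaped like the fields listed above (6.x layout).
response = {
    "took": 12,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 1.3,
        "hits": [
            {"_id": "b", "_score": 0.7, "_source": {"eventid": "OMGH5PageView"}},
            {"_id": "a", "_score": 1.3, "_source": {"eventid": "OMGH5PageView"}},
        ],
    },
}


def summarize(resp):
    """Pull out the fields most often checked after a search."""
    shards = resp["_shards"]
    ranked = sorted(resp["hits"]["hits"], key=lambda hit: hit["_score"], reverse=True)
    return {
        "took_ms": resp["took"],
        # a result is complete only if nothing timed out and no shard failed
        "complete": not resp["timed_out"] and shards["failed"] == 0,
        "total": resp["hits"]["total"],
        "ids_by_score": [hit["_id"] for hit in ranked],
    }


result = summarize(response)
print(result)
```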
