Elasticsearch Document Representation and REST API Operations

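Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java and was released as open source under the Apache License, and it is a popular enterprise search engine. It is widely used in cloud environments, where it offers near-real-time search and is stable, reliable, fast, and easy to install and use.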

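1. Document representation in Elasticsearch

Elasticsearch is document oriented, which means it can store entire objects or documents. It does more than store them, though: it also indexes the content of every document so that it can be searched quickly. In Elasticsearch you index, search, sort, filter, aggregate, and analyze documents rather than rows and columns of data.

Elasticsearch uses JSON as the serialization format for documents. JSON is supported by most programming languages and has become a standard format in the NoSQL world.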

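A document in Elasticsearch contains not only the document data but also metadata, that is, information about the document. The three main metadata elements are:

_index: the index, similar to a "database" in a relational database. It is where related data is stored and indexed.

_type: the type, similar to a table in a relational database. The name can be uppercase or lowercase, but it cannot begin with an underscore and cannot contain commas.

_id: combined with _index and _type, it uniquely identifies a document in Elasticsearch (similar to a primary key). When you create a document you can supply your own _id or let Elasticsearch generate one.

The metadata also includes the following:

_uid: the unique identifier of the document (_type#_id)

_source: the original document data

_all: a single string made by concatenating the values of all fields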
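How these metadata fields identify a document can be sketched in Python. This is only an illustration, not Elasticsearch code: the `DocumentRef` class and its `uid` and `path` properties are hypothetical names, and the sample values are borrowed from the examples later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRef:
    """Hypothetical holder for the three identifying metadata fields."""
    index: str
    doc_type: str
    doc_id: str

    @property
    def uid(self) -> str:
        # _uid is described above as _type#_id
        return f"{self.doc_type}#{self.doc_id}"

    @property
    def path(self) -> str:
        # URL path used to address the document: index/type/id
        return f"{self.index}/{self.doc_type}/{self.doc_id}"


ref = DocumentRef("user_index", "user_type", "1")
print(ref.uid)   # user_type#1
print(ref.path)  # user_index/user_type/1
```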

2. Service URLs in Elasticsearch

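The table below lists the URLs of commonly used Elasticsearch services:

| Function | URL | Method | Description |
|---|---|---|---|
| Cluster | /_cat/health?v | GET | View cluster health |
| | /_cat/nodes?v | GET | View node status |
| | /_cat/indices?v | GET | List all indices in the cluster |
| | /_nodes | GET | Get all nodes in the cluster and their information (/_cluster/nodes in very old versions) |
| | /_cluster/health | GET | View cluster health |
| | /_cluster/state | GET | Get the full cluster state (cluster, node, and mapping information, etc.) |
| Nodes | /_nodes/process | GET | View process information (file descriptor related) |
| | /_nodes/stats/process | GET | Resource statistics for each node (memory, CPU, etc.) |
| | /_nodes/jvm | GET | JVM configuration and information for each node |
| | /_nodes/stats/jvm | GET | More detailed JVM statistics |
| | /_nodes/http | GET | HTTP information for each node (such as IP address) |
| | /_nodes/stats/http | GET | Statistics on the HTTP requests handled by each node |
| | /_nodes/thread_pool | GET | The thread pools of each type |
| | /_nodes/stats/thread_pool | GET | Statistics for each type of thread pool |
| Indices | /index/_search | GET,POST | Search an index |
| | /index | PUT,DELETE | Create or delete an index |
| | /_aliases | GET,POST | Get or modify index aliases |
| | /index/_settings | PUT | Update index settings (number_of_shards cannot be changed) |
| | /index/_mapping | PUT | Create or update a mapping |
| | /index/_open | POST | Open a closed index |
| | /index/_close | POST | Close an index |
| | /index/_refresh | POST | Refresh an index (make newly added content visible to search) |
| | /index/_flush | POST | Flush an index: commit changes to the Lucene index files and clear the Elasticsearch transaction log |
| | /index/_forcemerge | POST | Optimize segments, mainly by merging the segments of the index (called _optimize before version 2.1) |
| | /index/_stats | GET | Get status and statistics for an index (_status in versions before 2.0) |
| | /index/_segments | GET | Get the status of the segments of an index |
| | /index/type/id | GET,PUT,POST,DELETE | Operate on a specific document (create, read, update, delete) |
| | /index/type/id/_create | PUT | Create a document; fails if the document already exists |
| | /index/type/id/_update | POST | Update a document; fails if the document does not exist |
| | /index/type/_bulk | POST,PUT | Submit data changes in bulk |
| | /index/type/_mget | GET | Fetch multiple documents by their _id |
| | /index/type/id/_explain | GET | Return an explanation of how a document scores against a query, without running a real search |
| | /index/_analyze | GET | Run text analysis on the supplied parameters, without running a real search |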

3. Elasticsearch URL operations

3.1 Viewing cluster information

3.1.1 Check cluster health

GET _cat/health?v

Response:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1565253576 08:39:36  my-es.cluster green           1         1      2   2    0    0        0             0                  -                100.0%
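The `_cat` endpoints return whitespace-aligned text rather than JSON, so a script that wants individual values has to split the columns itself. The following is a minimal sketch of that idea; the function name `parse_cat_table` is made up for this example, and it assumes that no cell contains spaces, which holds for the sample above but not necessarily for every `_cat` endpoint.

```python
def parse_cat_table(text: str) -> list[dict[str, str]]:
    """Turn whitespace-aligned _cat output (requested with ?v) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    headers = lines[0].split()
    # Assumption: no cell contains spaces, so a plain split() is enough.
    return [dict(zip(headers, line.split())) for line in lines[1:]]


sample = """\
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1565253576 08:39:36  my-es.cluster green           1         1      2   2    0    0        0             0                  -                100.0%
"""

rows = parse_cat_table(sample)
print(rows[0]["status"])                 # green
print(rows[0]["active_shards_percent"])  # 100.0%
```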

3.1.2 Check node status

GET _cat/nodes?v

Response:

ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.1.199           36          63   1    0.00    0.14     0.12 mdi       *      node-1

3.1.3 List all indices in the cluster

GET _cat/indices?v

Response:

health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_task_manager aP_xUt7lQD2RdQDuT5ynbw   1   0          2            0     12.5kb         12.5kb
green  open   .kibana_1            -axbsiTwRPmlIVniX-0hOA   1   0          4            1     19.8kb         19.8kb

3.1.4 Check cluster health via the cluster API

GET _cluster/health

Response:

{
  "cluster_name" : "my-es.cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 2,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

3.2 Viewing node information

3.2.1 View process information (file descriptor related)

GET _nodes/process

Response:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "my-es.cluster",
  "nodes" : {
    "SQYgJvIZR7yqA3TzkURejA" : {
      "name" : "node-1",
      "transport_address" : "192.168.1.199:9300",
      "host" : "192.168.1.199",
      "ip" : "192.168.1.199",
      "version" : "6.8.2",
      "build_flavor" : "default",
      "build_type" : "tar",
      "build_hash" : "b506955",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.machine_memory" : "3954188288",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 9496,
        "mlockall" : false
      }
    }
  }
}

3.2.2 Get the HTTP information of each node

GET _nodes/http

Response:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "my-es.cluster",
  "nodes" : {
    "SQYgJvIZR7yqA3TzkURejA" : {
      "name" : "node-1",
      "transport_address" : "192.168.1.199:9300",
      "host" : "192.168.1.199",
      "ip" : "192.168.1.199",
      "version" : "6.8.2",
      "build_flavor" : "default",
      "build_type" : "tar",
      "build_hash" : "b506955",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.machine_memory" : "3954188288",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "http" : {
        "bound_address" : [
          "192.168.1.199:9200"
        ],
        "publish_address" : "192.168.1.199:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}

3.3 Index operations

3.3.1 Create an index and set the number of shards and replicas

PUT user_index
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 1
  }
}

Response:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "user_index"
}

3.3.2 Change the number of replicas of an index (the number of shards cannot be changed)

PUT user_index/_settings
{
  "settings": {
    "number_of_replicas": 2
  }
}

Response:

{
  "acknowledged" : true
}
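Because the shard count is fixed once the index exists, a client can reject such a change before sending the request. The sketch below shows one way to do that when building the request body; `build_settings_update` and `STATIC_SETTINGS` are hypothetical names for this example, and only `number_of_shards` is listed because it is the only fixed setting this article mentions.

```python
import json

# Settings treated as fixed after index creation in this sketch.
STATIC_SETTINGS = {"number_of_shards"}


def build_settings_update(**settings) -> str:
    """Return the JSON body for PUT <index>/_settings, rejecting fixed settings."""
    blocked = STATIC_SETTINGS.intersection(settings)
    if blocked:
        raise ValueError(f"cannot change after creation: {sorted(blocked)}")
    return json.dumps({"settings": settings})


body = build_settings_update(number_of_replicas=2)
print(body)  # {"settings": {"number_of_replicas": 2}}
```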

3.3.3 Delete an index

DELETE user_index

Response:

{
  "acknowledged" : true
}

3.3.4 Add an alias to an index

POST _aliases
{
  "actions":[{
      "add":{"index":"user_index","alias":"user_alias"}
  }]
}

It can also be written as:

PUT user_index/_alias/user_alias

3.3.5 Remove an alias from an index

POST _aliases
{
  "actions":[{
      "remove":{"index":"user_index","alias":"user_alias"}
  }]
}

It can also be written as:

DELETE user_index/_alias/user_alias
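Since the `actions` array of `POST _aliases` accepts several entries, and Elasticsearch applies them atomically, a remove and an add can be combined to move an alias from one index to another in a single request. The helper below is a small sketch that only builds the request body; the function name `alias_swap_body` and the index name `user_index_v2` are assumptions made for this example.

```python
import json


def alias_swap_body(alias: str, old_index: str, new_index: str) -> str:
    """Build a POST _aliases body that moves an alias in one request."""
    actions = [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]
    return json.dumps({"actions": actions})


# "user_index_v2" is a made-up target index for illustration.
body = alias_swap_body("user_alias", "user_index", "user_index_v2")
print(body)
```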

3.3.6 View index alias information

GET _aliases

Response:

{
  ".kibana_1" : {
    "aliases" : {
      ".kibana" : { }
    }
  },
  ".kibana_task_manager" : {
    "aliases" : { }
  },
  "user_index" : {
    "aliases" : {
      "user_alias" : { }
    }
  }
}

3.3.7 Create an index mapping

PUT user_index/_mapping/user_type
{
  "dynamic":false,
  "properties": {
    "name":{
      "type": "text",
      "analyzer": "standard"
    },
    "age": {
      "type": "integer"
    },
    "join_date":{
      "type": "date"
    },
    "phone":{
      "type": "keyword"
    },
    "country":{
      "type": "keyword"
    },
    "province":{
      "type": "keyword"
    },
    "city":{
      "type": "keyword"
    },
    "remark":{
      "type": "text",
      "analyzer": "whitespace"
    }
  }
}
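With "dynamic": false, a field that is not declared in the mapping is still kept in _source but is not indexed, so it cannot be searched. The sketch below mimics that split for a sample document. It is an illustration only: `split_fields` is a hypothetical helper and `nickname` is a made-up field that does not appear in the mapping.

```python
# Field names declared in the mapping above.
MAPPED_FIELDS = {
    "name", "age", "join_date", "phone",
    "country", "province", "city", "remark",
}


def split_fields(doc: dict) -> tuple[dict, dict]:
    """Split a document into (indexed, stored_only) parts under dynamic=false."""
    indexed = {k: v for k, v in doc.items() if k in MAPPED_FIELDS}
    stored_only = {k: v for k, v in doc.items() if k not in MAPPED_FIELDS}
    return indexed, stored_only


# "nickname" is a made-up field that the mapping does not declare.
doc = {"name": "chen zhuangyuan", "age": 27, "nickname": "zy"}
indexed, stored_only = split_fields(doc)
print(sorted(indexed))      # ['age', 'name']
print(sorted(stored_only))  # ['nickname']
```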

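3.3.8 Add a document with an explicit _id

If no _id is specified (send the request with POST and leave the id out of the URL), Elasticsearch generates an _id automatically.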

PUT user_index/user_type/1
{
  "name":"chen zhuangyuan",
  "age":27,
  "join_date":"2018-01-01",
  "phone":"18823450001",
  "country":"CN",
  "province":"guangdong",
  "city":"guangzhou",
  "remark":"I'm zhuangyuan,I like elasticsearch"
}

Response:

{
  "_index" : "user_index",
  "_type" : "user_type",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 20,
  "_primary_term" : 3
}

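3.3.9 Update a document by replacing it entirely

This is the same request as adding a document: if the document exists it is replaced in full, and if it does not exist it is created.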

PUT user_index/user_type/1
{
  "name":"chen zhuangyuan",
  "age":28
}

After the update, the document with _id=1 looks like this. The other fields are gone because the whole document was replaced.

GET user_index/user_type/1

Response:

{
  "_index" : "user_index",
  "_type" : "user_type",
  "_id" : "1",
  "_version" : 8,
  "_seq_no" : 23,
  "_primary_term" : 3,
  "found" : true,
  "_source" : {
    "name" : "chen zhuangyuan",
    "age" : 28
  }
}

3.3.10 Create a document only if it does not already exist; return an error if it does

PUT user_index/user_type/3/_create
{
  "name":"zhang fulai",
  "age":28,
  "join_date":"2018-03-01",
  "phone":"18823450003",
  "country":"CN",
  "province":"guangdong",
  "city":"shenzhen",
  "remark":"I'm liaiguo,I like hadoop"
}

Response:

{
  "_index" : "user_index",
  "_type" : "user_type",
  "_id" : "3",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 25,
  "_primary_term" : 3
}

Running the same request a second time returns an error and the create fails.

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[user_type][3]: version conflict, document already exists (current version [1])",
        "index_uuid": "FT3HUBPESD6Yih2o_EddLw",
        "shard": "2",
        "index": "user_index"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[user_type][3]: version conflict, document already exists (current version [1])",
    "index_uuid": "FT3HUBPESD6Yih2o_EddLw",
    "shard": "2",
    "index": "user_index"
  },
  "status": 409
}
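A client that uses _create usually wants to tell "already exists" apart from other failures. Based only on the error response shown above, a check could look like the following sketch; `is_create_conflict` is a hypothetical helper name, and the dictionaries are trimmed copies of the responses in this section.

```python
def is_create_conflict(response: dict) -> bool:
    """True if the response is a 409 'document already exists' error."""
    error = response.get("error") or {}
    return (
        response.get("status") == 409
        and error.get("type") == "version_conflict_engine_exception"
    )


conflict = {
    "error": {
        "type": "version_conflict_engine_exception",
        "reason": "[user_type][3]: version conflict, document already exists (current version [1])",
    },
    "status": 409,
}
created = {"_id": "3", "result": "created"}

print(is_create_conflict(conflict))  # True
print(is_create_conflict(created))   # False
```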

3.3.11 Update specific fields of a document

For example, change the age of the user with _id=3 to 29.

POST user_index/user_type/3/_update
{
  "doc": {
     "age":29
  }
}

After the update, the document with _id=3 looks like this:

GET user_index/user_type/3

Response:

{
  "_index" : "user_index",
  "_type" : "user_type",
  "_id" : "3",
  "_version" : 2,
  "_seq_no" : 26,
  "_primary_term" : 3,
  "found" : true,
  "_source" : {
    "name" : "zhang fulai",
    "age" : 29,
    "join_date" : "2018-03-01",
    "phone" : "18823450003",
    "country" : "CN",
    "province" : "guangdong",
    "city" : "shenzhen",
    "remark" : "I'm liaiguo,I like hadoop"
  }
}
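The difference between a full replacement (PUT index/type/id) and a partial update (POST index/type/id/_update with "doc") can be modelled on plain dictionaries. This is a sketch of the observable effect on _source, not of how Elasticsearch implements it; the function names `full_replace` and `partial_update` are hypothetical, and the sample document is shortened.

```python
import copy


def full_replace(existing: dict, new_source: dict) -> dict:
    """PUT index/type/id: the new body replaces the old _source entirely."""
    return copy.deepcopy(new_source)


def partial_update(existing: dict, doc: dict) -> dict:
    """POST index/type/id/_update with "doc": merge doc into the old _source."""
    merged = copy.deepcopy(existing)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)
    return merged


source = {"name": "zhang fulai", "age": 28, "city": "shenzhen"}
print(full_replace(source, {"name": "zhang fulai", "age": 29}))
# {'name': 'zhang fulai', 'age': 29}
print(partial_update(source, {"age": 29}))
# {'name': 'zhang fulai', 'age': 29, 'city': 'shenzhen'}
```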

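3.3.12 Bulk requests with _bulk

A single _bulk request can carry index, delete, and update operations for many documents. Sending them together reduces the number of network round trips to the server and improves throughput.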

PUT user_index/user_type/_bulk
{"index":{"_id":"4"}}
{"name":"guo daming","age":26,"phone":"18823450004","country":"CN","province":"beijing","city":"beijingshi","remark":"I.m from beijing,I like java"}
{"index":{"_id":"5"}}
{"name":"zhao mingming","age":26,"phone":"18823450005","country":"CN","province":"shanghai","city":"shanghaishi","remark":"I.m from shanghai,I like spark"}
{"delete":{"_id":"1"}}
{"update":{"_id":"2"}}
{"doc":{"age":"25"}}

Response:

{
  "took" : 19,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "4",
        "_version" : 7,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 33,
        "_primary_term" : 3,
        "status" : 200
      }
    },
    {
      "index" : {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "5",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 34,
        "_primary_term" : 3,
        "status" : 201
      }
    },
    {
      "delete" : {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_version" : 10,
        "result" : "deleted",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 27,
        "_primary_term" : 3,
        "status" : 200
      }
    },
    {
      "update" : {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "2",
        "_version" : 8,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 35,
        "_primary_term" : 3,
        "status" : 200
      }
    }
  ]
}
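A _bulk response reports a status for each item, so a caller should inspect the items rather than rely on the HTTP status of the whole request. The following sketch collects the items that did not succeed; `failed_bulk_items` is a hypothetical helper, and the sample response is a trimmed, made-up one in which the second item fails.

```python
def failed_bulk_items(response: dict) -> list[dict]:
    """Collect items whose status is not 2xx from a _bulk response."""
    failures = []
    for item in response.get("items", []):
        # Each item has a single key naming the action: index, create, delete, update.
        for action, result in item.items():
            if not 200 <= result.get("status", 0) < 300:
                failures.append(
                    {"action": action, "_id": result.get("_id"), "status": result.get("status")}
                )
    return failures


# Made-up response for illustration: the "create" item hits a conflict.
response = {
    "errors": True,
    "items": [
        {"index": {"_id": "4", "status": 200}},
        {"create": {"_id": "5", "status": 409}},
    ],
}
print(failed_bulk_items(response))
# [{'action': 'create', '_id': '5', 'status': 409}]
```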

In addition, if the operations in one _bulk request do not all target the same index, specify the index and type in the metadata line of each operation.

PUT _bulk
{"create":{"_index":"user_index","_type":"user_type","_id":"4"}}
{"name":"guo daming","age":26,"phone":"18823450004","country":"CN","province":"beijing","city":"beijingshi","remark":"I.m from beijing,I like java"}
{"create":{"_index":"user_index","_type":"user_type","_id":"5"}}
{"name":"zhao mingming","age":26,"phone":"18823450005","country":"CN","province":"shanghai","city":"shanghaishi","remark":"I.m from shanghai,I like spark"}
{"delete":{"_index":"user_index","_type":"user_type","_id":"1"}}
{"update":{"_index":"user_index","_type":"user_type","_id":"2"}}
{"doc":{"age":"25"}}
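The _bulk body is newline-delimited JSON: one metadata line per operation, followed by a source line for every action except delete, with a newline after the last line. A program can assemble it roughly as follows. This is a sketch with a hypothetical helper name, `build_bulk_body`, and shortened sample documents.

```python
import json


def build_bulk_body(operations) -> str:
    """Serialize (action, metadata, source) tuples as newline-delimited JSON."""
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if action != "delete":  # a delete action has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


ops = [
    ("index", {"_id": "4"}, {"name": "guo daming", "age": 26}),
    ("delete", {"_id": "1"}, None),
    ("update", {"_id": "2"}, {"doc": {"age": 25}}),
]
body = build_bulk_body(ops)
print(body, end="")
```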

3.3.13 Querying documents in Elasticsearch

3.3.13.1 Get a document by _id

URL format: index/type/_id

GET user_index/user_type/1

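3.3.13.2 Multi-get with _mget

URL format: index/type/_mget. The request body takes a docs array in which each element gives the _index, _type, and _id metadata of one document. To retrieve only one or a few specific fields, you can also specify _source.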

GET _mget
{
  "docs":[{
   "_index":"user_index",
   "_type":"user_type",
   "_id":"1",
   "_source":["name","phone"]
  },
  {
  "_index":"user_index",
  "_type":"user_type",
  "_id":"1",
  "_source":["name","phone"]
  }]
}

Response:

{
  "docs" : [
    {
      "_index" : "user_index",
      "_type" : "user_type",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 29,
      "_primary_term" : 3,
      "found" : true,
      "_source" : {
        "phone" : "18823450001",
        "name" : "chen zhuangyuan"
      }
    },
    {
      "_index" : "user_index",
      "_type" : "user_type",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 29,
      "_primary_term" : 3,
      "found" : true,
      "_source" : {
        "phone" : "18823450001",
        "name" : "chen zhuangyuan"
      }
    }
  ]
}

Alternatively, when all documents are in the same index and type, a simpler form passes the document _id values as an array.

GET user_index/user_type/_mget
{
  "ids":["1","2","3","4"]
}
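The two _mget body shapes shown above can be generated from a list of ids. The sketch below builds both; `mget_docs_body` and `mget_ids_body` are hypothetical helper names, and the code only produces the JSON text without sending a request.

```python
import json


def mget_docs_body(index: str, doc_type: str, ids, source_fields=None) -> str:
    """Body for GET _mget: one docs entry per id, optionally limiting _source."""
    docs = []
    for doc_id in ids:
        entry = {"_index": index, "_type": doc_type, "_id": doc_id}
        if source_fields:
            entry["_source"] = list(source_fields)
        docs.append(entry)
    return json.dumps({"docs": docs})


def mget_ids_body(ids) -> str:
    """Body for GET index/type/_mget: only the ids are needed."""
    return json.dumps({"ids": list(ids)})


print(mget_docs_body("user_index", "user_type", ["1", "2"], ["name", "phone"]))
print(mget_ids_body(["1", "2", "3", "4"]))  # {"ids": ["1", "2", "3", "4"]}
```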

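3.3.13.3 Empty query (match all)

If no query is specified, the search returns all documents in the targeted index and type (GET _search covers every index).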

GET user_index/user_type/_search
GET user_index/_search
GET _search
GET user_index/user_type/_search
{
  "query": {
    "match_all": {}
  }
}

3.3.13.4 Query-string search

For example, search for all documents in the index that contain elasticsearch.

GET user_index/user_type/_search?q=elasticsearch

Because remark is the only field in the user documents that contains elasticsearch, this query returns the same results as:

GET user_index/user_type/_search?q=remark:elasticsearch

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "chen zhuangyuan",
          "age" : 27,
          "join_date" : "2018-01-01",
          "phone" : "18823450001",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm zhuangyuan,I like elasticsearch"
        }
      }
    ]
  }
}
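When a query-string search is issued from code rather than from the Kibana console, the value of q has to be URL-encoded, because characters such as ":" and spaces are not safe in a URL. A small sketch using only the Python standard library follows; `uri_search_path` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def uri_search_path(index: str, doc_type: str, query: str, **params) -> str:
    """Build the path and query string for a URI search."""
    query_string = urlencode({"q": query, **params})
    return f"{index}/{doc_type}/_search?{query_string}"


print(uri_search_path("user_index", "user_type", "remark:elasticsearch"))
# user_index/user_type/_search?q=remark%3Aelasticsearch
print(uri_search_path("user_index", "user_type", "like spark", size=5))
# user_index/user_type/_search?q=like+spark&size=5
```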

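3.3.13.5 Request-body search

For example, search for all documents in the index that contain elasticsearch. The first request below uses term to match a single value, and the second uses terms to match any of several values.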

GET user_index/user_type/_search
{
  "query": {
    "term": {
      "remark": "elasticsearch"
    }
  }
}

GET user_index/user_type/_search
{
 "query": {
   "terms": {
     "remark": [
       "hadoop",
       "spark"
     ]
   }
 }
}

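3.3.13.6 Pagination with from/size

The from and size parameters implement paging. from is the offset of the first hit to return, and size is the maximum number of hits to return. from defaults to 0 and size defaults to 10.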

GET user_index/user_type/_search
{
 "query": {
   "match": {
     "remark":"spark"
   }
 },
 "from": 0,
 "size": 1
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.49917623,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "2",
        "_score" : 0.49917623,
        "_source" : {
          "name" : "li aiguo",
          "age" : "25",
          "join_date" : "2018-02-01",
          "phone" : "18823450002",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "shenzhen",
          "remark" : "I'm liaiguo,I like spark"
        }
      }
    ]
  }
}
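User interfaces usually think in page numbers, while the API expects an offset. The conversion is a single multiplication, sketched below with a hypothetical helper, `page_to_from_size`, that assumes pages are numbered from 1.

```python
def page_to_from_size(page: int, page_size: int = 10) -> dict:
    """Convert a 1-based page number into from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


print(page_to_from_size(1, 1))  # {'from': 0, 'size': 1}
print(page_to_from_size(3))     # {'from': 20, 'size': 10}
```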

3.3.13.7 Sorting

Results can be sorted by one or more fields. By default, search results are sorted by _score.

GET user_index/user_type/_search
{
 "query": {
   "match": {
     "remark":"spark"
   }
 },
 "sort": [
   {
     "age": {
       "order": "asc"
     },
     "province": {
       "order": "asc"
     }
   }
 ]
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "li aiguo",
          "age" : "25",
          "join_date" : "2018-02-01",
          "phone" : "18823450002",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "shenzhen",
          "remark" : "I'm liaiguo,I like spark"
        },
        "sort" : [
          25,
          "guangdong"
        ]
      },
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "name" : "zhao mingming",
          "age" : 26,
          "phone" : "18823450005",
          "country" : "CN",
          "province" : "shanghai",
          "city" : "shanghaishi",
          "remark" : "I.m from shanghai,I like spark"
        },
        "sort" : [
          26,
          "shanghai"
        ]
      }
    ]
  }
}
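Sorting on several fields compares the first field and uses the next one only to break ties, which is the same as sorting on a tuple. The sketch below reproduces the order of the response above with plain Python; the `hits` list is a shortened copy of the two documents and no Elasticsearch call is involved.

```python
# Shortened copies of the two hits, deliberately listed out of order.
hits = [
    {"_id": "5", "age": 26, "province": "shanghai"},
    {"_id": "2", "age": 25, "province": "guangdong"},
]

# Ascending by age, then ascending by province to break ties.
ordered = sorted(hits, key=lambda hit: (hit["age"], hit["province"]))

for hit in ordered:
    # The "sort" array in the response holds exactly these key values.
    print(hit["_id"], [hit["age"], hit["province"]])
# 2 [25, 'guangdong']
# 5 [26, 'shanghai']
```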

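3.3.13.8 Range query

For example, find all users in the user index whose age is greater than or equal to 27 and less than or equal to 30, and sort the results by age in ascending order.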

GET user_index/user_type/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 27,
        "lte": 30
      }
    }
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "chen zhuangyuan",
          "age" : 27,
          "join_date" : "2018-01-01",
          "phone" : "18823450001",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm zhuangyuan,I like elasticsearch"
        },
        "sort" : [
          27
        ]
      },
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "zhang fulai",
          "age" : 29,
          "join_date" : "2018-03-01",
          "phone" : "18823450003",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "shenzhen",
          "remark" : "I'm liaiguo,I like hadoop"
        },
        "sort" : [
          29
        ]
      }
    ]
  }
}

3.3.13.9 Count the documents in an index

GET user_index/user_type/_count

Response:

{
  "count" : 5,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  }
}

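3.3.13.10 Combining multiple conditions

In real projects, most queries combine several conditions. Elasticsearch provides the bool query for this. Its main parameters are:

must: a document must match these clauses to be included.

must_not: a document must not match these clauses to be included.

should: a document that matches any of these clauses gets a higher _score; otherwise they have no effect when the bool query also has a must or filter clause. They are mainly used to refine the relevance score of each document.

filter: a document must match, but the clause runs in non-scoring filter mode. These clauses do not contribute to the score; they only include or exclude documents according to the filter criteria.

For example: find users whose remark contains elasticsearch and does not contain spark, and score users whose age is 27 higher.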

GET user_index/user_type/_search
{
  "query": {
    "bool": {
      "must":  {
        "match": {
          "remark": "elasticsearch"
        }
      },
      "must_not": {
         "match":{
           "remark":"spark"
         }
      },
      "should": {
        "match":{
          "age":27
        }
      }
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.6931472,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_score" : 1.6931472,
        "_source" : {
          "name" : "chen zhuangyuan",
          "age" : 27,
          "join_date" : "2018-01-01",
          "phone" : "18823450001",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm zhuangyuan,I like elasticsearch"
        }
      }
    ]
  }
}
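When conditions come from user input, the bool body is usually assembled in code and empty clauses are left out. A minimal sketch of such a builder follows; `bool_query` is a hypothetical helper and its parameter is called `filter_` only to avoid shadowing the Python builtin.

```python
import json


def bool_query(must=None, must_not=None, should=None, filter_=None) -> dict:
    """Build a search body with a bool query, omitting clauses that are empty."""
    clauses = {
        "must": must,
        "must_not": must_not,
        "should": should,
        "filter": filter_,
    }
    return {"query": {"bool": {name: c for name, c in clauses.items() if c}}}


body = bool_query(
    must={"match": {"remark": "elasticsearch"}},
    must_not={"match": {"remark": "spark"}},
    should={"match": {"age": 27}},
)
print(json.dumps(body, indent=2))
```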

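3.3.13.11 Score analysis with explain

Elasticsearch returns search results sorted by _score from highest to lowest. Setting explain to true in the search request shows how the _score of each document was calculated.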

GET user_index/user_type/_search
{
  "query": {
    "match": {
      "remark": "elasticsearch"
    }
  },
  "explain": true
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0594962,
    "hits" : [
      {
        "_shard" : "[user_index][0]",
        "_node" : "SQYgJvIZR7yqA3TzkURejA",
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "6",
        "_score" : 1.0594962,
        "_source" : {
          "name" : "liu haoqiang",
          "age" : 27,
          "join_date" : "2018-06-01",
          "phone" : "18823450006",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm from guangzhou,I like spark and elasticsearch"
        },
        "_explanation" : {
          "value" : 1.059496,
          "description" : "weight(remark:elasticsearch in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 1.059496,
              "description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
              "details" : [
                {
                  "value" : 1.2039728,
                  "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "docFreq",
                      "details" : [ ]
                    },
                    {
                      "value" : 4.0,
                      "description" : "docCount",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.88,
                  "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "termFreq=1.0",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "parameter k1",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "parameter b",
                      "details" : [ ]
                    },
                    {
                      "value" : 5.25,
                      "description" : "avgFieldLength",
                      "details" : [ ]
                    },
                    {
                      "value" : 7.0,
                      "description" : "fieldLength",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[user_index][2]",
        "_node" : "SQYgJvIZR7yqA3TzkURejA",
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "chen zhuangyuan",
          "age" : 27,
          "join_date" : "2018-01-01",
          "phone" : "18823450001",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm zhuangyuan,I like elasticsearch"
        },
        "_explanation" : {
          "value" : 0.6931472,
          "description" : "weight(remark:elasticsearch in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.6931472,
              "description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
              "details" : [
                {
                  "value" : 0.6931472,
                  "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "docFreq",
                      "details" : [ ]
                    },
                    {
                      "value" : 2.0,
                      "description" : "docCount",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 1.0,
                  "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "termFreq=1.0",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "parameter k1",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "parameter b",
                      "details" : [ ]
                    },
                    {
                      "value" : 4.0,
                      "description" : "avgFieldLength",
                      "details" : [ ]
                    },
                    {
                      "value" : 4.0,
                      "description" : "fieldLength",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
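The two formulas quoted in the _explanation descriptions can be evaluated by hand to confirm the reported scores. The sketch below plugs in the docFreq, docCount, fieldLength, and avgFieldLength values from the response above; the function names are hypothetical, the defaults k1=1.2 and b=0.75 are the parameter values shown in the response, and the results agree with the response only up to floating-point rounding.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    """idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))"""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75) -> float:
    """tfNorm = freq*(k1+1) / (freq + k1*(1 - b + b*fieldLength/avgFieldLength))"""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


# Document _id=6: docFreq=1, docCount=4, fieldLength=7, avgFieldLength=5.25
score_6 = bm25_idf(1, 4) * bm25_tf_norm(1, 7, 5.25)
# Document _id=1: docFreq=1, docCount=2, fieldLength=4, avgFieldLength=4
score_1 = bm25_idf(1, 2) * bm25_tf_norm(1, 4, 4)

print(round(score_6, 6))  # 1.059496
print(round(score_1, 6))  # 0.693147
```

The two documents get different idf values because they sit on different shards, and each shard computes docCount from its own documents.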

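3.3.14 Tokenization analysis with _analyze

_analyze is a very useful Elasticsearch API. It shows how a given field, analyzer, or tokenizer analyzes and indexes a piece of text. The fields in the result mean:

token: the actual term that is stored in the index

position: the ordinal position of the term in the original text

start_offset, end_offset: the character offsets the term occupies in the original text.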

GET user_index/_analyze
{
  "analyzer": "standard",
  "text": "I'm from shenzhen,I like elasticsearch,spark and hbase"
}

Response:

{
  "tokens" : [
    {
      "token" : "i'm",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "from",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "shenzhen",
      "start_offset" : 9,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "i",
      "start_offset" : 18,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "like",
      "start_offset" : 20,
      "end_offset" : 24,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "elasticsearch",
      "start_offset" : 25,
      "end_offset" : 38,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "spark",
      "start_offset" : 39,
      "end_offset" : 44,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "and",
      "start_offset" : 45,
      "end_offset" : 48,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "hbase",
      "start_offset" : 49,
      "end_offset" : 54,
      "type" : "<ALPHANUM>",
      "position" : 8
    }
  ]
}
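To make the meaning of token, position, and the offsets concrete, the following sketch reproduces the response above with a regular expression. It is only a rough approximation for this ASCII sample: the real standard analyzer follows Unicode text segmentation rules, and both the pattern `TOKEN_RE` and the function `rough_standard_tokens` are assumptions made for this example.

```python
import re

# Assumption: runs of word characters, optionally joined by apostrophes,
# are close enough to the standard analyzer for this simple English text.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*")


def rough_standard_tokens(text: str) -> list[dict]:
    """Approximate _analyze output: lowercased tokens with offsets and positions."""
    tokens = []
    for position, match in enumerate(TOKEN_RE.finditer(text)):
        tokens.append(
            {
                "token": match.group().lower(),
                "start_offset": match.start(),
                "end_offset": match.end(),
                "position": position,
            }
        )
    return tokens


text = "I'm from shenzhen,I like elasticsearch,spark and hbase"
for token in rough_standard_tokens(text):
    print(token)
```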

 
