Elasticsearch Essentials: Index Aliases, Analyzers, Document Management, Routing, and Search

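Original: https://www.enmotech.com/web/detail/1/839/1.html

Elasticsearch Essentials: Index Aliases, Analyzers, Document Management, Routing, and Search (Part 1)

https://www.enmotech.com/web/detail/1/840/1.html

Elasticsearch Essentials: Index Aliases, Analyzers, Document Management, Routing, and Search (Part 2)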

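Introduction: we previously shared a complete Elasticsearch tutorial covering getting started, index management, and mappings. This article covers index aliases, analyzers, document management, routing, and search in detail.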

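I. Index Aliases

1. What aliases are for

Use an alias when you want a single query to search several indices.
Use an alias when you want to operate on an index through a view, much like a view in a relational database.
The alias mechanism lets you work with the indices in a cluster through such views. A view can cover several indices, a single index, or part of one index.

2. Defining aliases when creating an index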

PUT /logs_20162801 { "mappings" : { "type" : { "properties" : { "year" : {"type" : "integer"} } } },  "aliases" : { "current_day" : {}, "2016" : { "filter" : { "term" : {"year" : 2016 } } } } }

3. Creating an alias with /_aliases

Create the alias alias1 for the index test1:

POST /_aliases
    {
        "actions" : [
            { "add" : { "index" : "test1", "alias" : "alias1" } }
        ]
    }

4. Removing an alias

POST /_aliases
    {
        "actions" : [
            { "remove" : { "index" : "test1", "alias" : "alias1" } }
        ]
    }

This can also be written as:

DELETE /{index}/_alias/{name}

5. Changing aliases in bulk

Remove alias1 from test1 and add it to test2 in the same request:

    POST /_aliases
    {
        "actions" : [
            { "remove" : { "index" : "test1", "alias" : "alias1" } },
            { "add" : { "index" : "test2", "alias" : "alias1" } }
        ]
    }
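
Because the remove and add actions travel in one request, the alias moves from one index to the other in a single step. As a minimal sketch, the Python helper below only builds that request body. `swap_alias_body` is a hypothetical name introduced for illustration, and nothing is sent to a cluster.

```python
import json


def swap_alias_body(alias, old_index, new_index):
    """Build the /_aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = swap_alias_body("alias1", "test1", "test2")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of POST /_aliases.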

6. Giving several indices the same alias

Option 1:

POST /_aliases
    {
        "actions" : [
            { "add" : { "index" : "test1", "alias" : "alias1" } },
            { "add" : { "index" : "test2", "alias" : "alias1" } }
        ]
    }

Option 2:

    POST /_aliases
    {
        "actions" : [
            { "add" : { "indices" : ["test1", "test2"], "alias" : "alias1" } }
        ]
    }

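Note: an alias that points to multiple indices can be used only for searching. You cannot index documents or get a document by id through it.

Option 3: use the wildcard * to select the indices to alias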

    POST /_aliases
    {
        "actions" : [
            { "add" : { "index" : "test*", "alias" : "all_test_indices" } }
        ]
    }

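Note: in this case the alias is a point-in-time alias. It is applied to all indices that match the pattern at that moment, and it is not updated automatically when indices matching the pattern are added or removed later.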
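
The point-in-time behaviour can be pictured with a small simulation. This is a sketch in plain Python, not Elasticsearch code: `resolve_alias_targets` is a hypothetical helper, and a Python set stands in for the list of indices in the cluster.

```python
from fnmatch import fnmatchcase


def resolve_alias_targets(pattern, existing_indices):
    """Return the indices matching `pattern` at the moment the alias is added."""
    return sorted(name for name in existing_indices if fnmatchcase(name, pattern))


indices = {"test1", "test2", "logs"}

# The pattern is resolved once, when the alias action runs.
alias_targets = resolve_alias_targets("test*", indices)

# An index created later matches the pattern but is not picked up.
indices.add("test3")

print(alias_targets)
print("test3" in alias_targets)
```

To include test3, the alias action has to be issued again.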

7. Filtered aliases

The field used by the filter must exist in the index mapping:

PUT /test1
    {
      "mappings": {
        "type1": {
          "properties": {
            "user" : {
              "type": "keyword"
            }
          }
        }
      }
    }

The filter is defined with the Query DSL and applies to all Search, Count, Delete By Query, and More Like This operations made through the alias.

    POST /_aliases
    {
        "actions" : [
            {
                "add" : {
                     "index" : "test1",
                     "alias" : "alias2",
                     "filter" : { "term" : { "user" : "kimchy" } }
                }
            }
        ]
    }

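8. Aliases with routing

An alias definition can carry a routing value, optionally together with a filter. This restricts operations to specific shards and avoids touching shards that are not needed.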

POST /_aliases
    {
        "actions" : [
            {
                "add" : {
                     "index" : "test",
                     "alias" : "alias1",
                     "routing" : "1"
                }
            }
        ]
    }

Different routing values can be set for searching and indexing:

POST /_aliases
    {
        "actions" : [
            {
                "add" : {
                     "index" : "test",
                     "alias" : "alias2",
                     "search_routing" : "1,2",
                     "index_routing" : "2"
                }
            }
        ]
    }

9. Defining an alias with PUT

PUT /{index}/_alias/{name}
PUT /logs_201305/_alias/2013

With a filter and routing:

PUT /users
    {
        "mappings" : {
            "user" : {
                "properties" : {
                    "user_id" : {"type" : "integer"}
                }
            }
        }
    }
    PUT /users/_alias/user_12
    {
        "routing" : "12",
        "filter" : {
            "term" : {
                "user_id" : 12
            }
        }
    }

10. Viewing alias definitions

    GET /{index}/_alias/{alias}
    GET /logs_20162801/_alias/*
    GET /_alias/2016
    GET /_alias/20*

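II. Analyzers

1. Understanding analyzers

1.1 Analyzer
In ES an analyzer is built from three kinds of components:
character filter: preprocesses the raw text, for example by stripping HTML tags, and then hands the result to the tokenizer. An analyzer may contain zero or more character filters, applied in the configured order.
tokenizer: splits the text into tokens. An analyzer must contain exactly one tokenizer.
token filter: post-processes the tokens produced by the tokenizer, for example lowercasing, stop-word removal, and synonym handling. An analyzer may contain zero or more token filters, applied in the configured order.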
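
The three stages can be pictured as a small pipeline. The sketch below is a toy model in plain Python, not how Elasticsearch implements analysis. All function names are hypothetical, and the filters are deliberately simplistic (the real html_strip filter, for instance, handles far more cases).

```python
import html
import re


def html_strip(text):
    """Toy character filter: drop tags, then decode HTML entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def whitespace_tokenizer(text):
    """Toy tokenizer: split on runs of whitespace."""
    return text.split()


def lowercase(tokens):
    """Toy token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def stop(tokens, stop_words=("the", "a", "is")):
    """Toy token filter: drop stop words."""
    return [t for t in tokens if t not in stop_words]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run 0..n character filters, exactly one tokenizer, then 0..n token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<p>The QUICK brown fox</p>",
    [html_strip],
    whitespace_tokenizer,
    [lowercase, stop],
)
print(tokens)
```

The order matters: swapping lowercase and stop in the list would let "The" through, because the stop list holds lowercase words only.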

1.2 How to test an analyzer

POST _analyze
    {
      "analyzer": "whitespace",
      "text":     "The quick brown fox."
    }

    POST _analyze
    {
      "tokenizer": "standard",
      "filter": [ "lowercase", "asciifolding" ],
      "text":      "Is this déja vu?"
    }

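position: the ordinal position of the token in the token stream

offset: the start and end character offsets of the token in the original text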
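
As an illustration of these two fields, the sketch below computes them for a whitespace split of the first test sentence. It is plain Python that imitates the shape of the _analyze output. The keys token, start_offset, end_offset, and position follow the response fields, while `whitespace_tokens` is a hypothetical helper.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace and report position and character offsets per token."""
    return [
        {
            "token": match.group(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        }
        for position, match in enumerate(re.finditer(r"\S+", text))
    ]


for tok in whitespace_tokens("The quick brown fox."):
    print(tok)
```

Offsets point back into the original text, which is what makes features such as highlighting possible. Positions count tokens, not characters.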

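2. Built-in character filters

HTML Strip Character Filter
    html_strip: strips HTML tags and decodes HTML entities such as &amp;.
Mapping Character Filter
    mapping: replaces occurrences of the specified strings with the specified replacements.
Pattern Replace Character Filter
    pattern_replace: performs regular-expression replacement.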

2.1 HTML Strip Character Filter

POST _analyze
    {
      "tokenizer": "keyword",
      "char_filter": [ "html_strip" ],
      "text": "<p>I&apos;m so <b>happy</b>!</p>"
    }

Configuring it in an index:

    PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "keyword",
              "char_filter": ["my_char_filter"]
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "html_strip",
              "escaped_tags": ["b"]
            }
          }
        }
      }
    }

escaped_tags lists the tags that should be left in place. If you have no such exceptions, no custom definition is needed: use html_strip directly in my_analyzer above.

Test:

POST my_index/_analyze
    {
      "analyzer": "my_analyzer",
      "text": "<p>I&apos;m so <b>happy</b>!</p>"
    }

2.2 Mapping character filter

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html

PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "keyword",
              "char_filter": [
                "my_char_filter"
              ]
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "mapping",
              "mappings": [
                "٠ => 0",
                "١ => 1",
                "٢ => 2",
                "٣ => 3",
                "٤ => 4",
                "٥ => 5",
                "٦ => 6",
                "٧ => 7",
                "٨ => 8",
                "٩ => 9"
              ]
            }
          }
        }
      }
    }

Test:

POST my_index/_analyze
    {
      "analyzer": "my_analyzer",
      "text": "My license plate is ٢٥٠١٥"
    }
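
What the mapping filter does to this text can be reproduced outside Elasticsearch. The sketch below uses Python's str.translate as a stand-in. It assumes the digits in the mappings list are the Arabic-Indic digits U+0660 to U+0669, and `map_digits` is a hypothetical helper.

```python
# Map ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669) to ASCII digits,
# mirroring the "mappings" list of my_char_filter.
DIGIT_MAP = {0x0660 + d: str(d) for d in range(10)}


def map_digits(text):
    """Replace each mapped character, leaving everything else untouched."""
    return text.translate(DIGIT_MAP)


# Build the plate from code points so the example does not depend on font support.
plate = "".join(chr(0x0660 + int(d)) for d in "25015")
print(map_digits("My license plate is " + plate))
```

Because my_analyzer uses the keyword tokenizer, the whole mapped string then becomes a single token.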

2.3 Pattern Replace Character Filter

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html

PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "standard",
              "char_filter": [
                "my_char_filter"
              ]
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "pattern_replace",
              "pattern": "(\\d+)-(?=\\d)",
              "replacement": "$1_"
            }
          }
        }
      }
    }

Test:

   POST my_index/_analyze
    {
      "analyzer": "my_analyzer",
      "text": "My credit card is 123-456-789"
    }
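
The effect of this pattern can be checked with any regex engine. The sketch below applies the same expression with Python's re module. Note that Java's "$1_" replacement is written "\g<1>_" in Python, and `replace_dashes` is a hypothetical helper.

```python
import re


def replace_dashes(text):
    """Replace a dash that sits between digits with an underscore."""
    # (\d+)- matches digits followed by a dash; the lookahead (?=\d)
    # requires another digit after the dash without consuming it.
    return re.sub(r"(\d+)-(?=\d)", r"\g<1>_", text)


print(replace_dashes("My credit card is 123-456-789"))
```

A dash that is not followed by a digit, as in "123-abc", is left alone because the lookahead fails.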

3. Built-in tokenizers

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html

Standard Tokenizer
    Letter Tokenizer
    Lowercase Tokenizer
    Whitespace Tokenizer
    UAX URL Email Tokenizer
    Classic Tokenizer
    Thai Tokenizer
    NGram Tokenizer
    Edge NGram Tokenizer
    Keyword Tokenizer
    Pattern Tokenizer
    Simple Pattern Tokenizer
    Simple Pattern Split Tokenizer
    Path Hierarchy Tokenizer

The IK Analyzer Chinese plugin integrated earlier provides two tokenizers: ik_smart and ik_max_word.

Testing a tokenizer:

POST _analyze
    {
      "tokenizer":      "standard",
      "text": "張三說的確實在理"
    }

    POST _analyze
    {
      "tokenizer":      "ik_smart",
      "text": "張三說的確實在理"
    }

4. Built-in token filters

ES ships with many token filters. For details see: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html

    Lowercase Token Filter: lowercase, converts tokens to lowercase
    Stop Token Filter: stop, removes stop words
    Synonym Token Filter: synonym, handles synonyms

Note: the IK Analyzer Chinese plugin has stop-word filtering built in.

4.1 Synonym Token Filter

PUT /test_index
    {
      "settings": {
        "index": {
          "analysis": {
            "analyzer": {
              "my_ik_synonym": {
                "tokenizer": "ik_smart",
                "filter": ["synonym"]
              }
            },
            "filter": {
              "synonym": {
                "type": "synonym",
                "synonyms_path": "analysis/synonym.txt"
              }
            }
          }
        }
      }
    }

Synonym definition format

ES supports two synonym formats: Solr and WordNet.

Define the following synonyms in analysis/synonym.txt using the Solr format:

    張三,李四
    電飯煲,電飯鍋 => 電飯煲
    電腦 => 計算機,computer

Notes:

The file must be UTF-8 encoded.

Each line defines one group of synonyms; => means "normalize to".
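
How the two rule shapes behave can be modelled in a few lines. This is a simplified sketch, not the Elasticsearch implementation. It assumes single-token synonyms and that comma-only lines expand every term to the whole group. It also ignores token positions, and both function names are hypothetical.

```python
def parse_solr_synonyms(lines):
    """Return {token: [replacement tokens]} for single-token Solr-style rules."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: every term on the left becomes the terms on the right.
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",")]
            targets = [t.strip() for t in right.split(",")]
        else:
            # Equivalent synonyms: every term expands to the whole group.
            sources = targets = [t.strip() for t in line.split(",")]
        for source in sources:
            table[source] = targets
    return table


def apply_synonyms(tokens, table):
    """Replace each token by its synonym list, or keep it if no rule applies."""
    out = []
    for token in tokens:
        out.extend(table.get(token, [token]))
    return out


rules = ["張三,李四", "電飯煲,電飯鍋 => 電飯煲", "電腦 => 計算機,computer"]
table = parse_solr_synonyms(rules)
print(apply_synonyms(["買", "電飯鍋", "電腦"], table))
print(apply_synonyms(["張三"], table))
```

The input token lists here are hand-written. In a real index they would come from the ik_smart tokenizer.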

Test: the results of these examples show how synonyms are handled

   POST test_index/_analyze
    {
      "analyzer": "my_ik_synonym",
      "text": "張三說的確實在理"
    }

    POST test_index/_analyze
    {
      "analyzer": "my_ik_synonym",
      "text": "我想買個電飯鍋和一個電腦"
    }

5. Built-in analyzers

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html

Standard Analyzer

    Simple Analyzer
    Whitespace Analyzer
    Stop Analyzer
    Keyword Analyzer
    Pattern Analyzer
    Language Analyzers
    Fingerprint Analyzer

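The integrated IK Analyzer Chinese plugin provides two analyzers: ik_smart and ik_max_word.

Built-in and integrated analyzers can be used directly. If none of them fits, you can combine character filters, a tokenizer, and token filters into a custom analyzer.

5.1 Custom analyzers

Configuration: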

PUT my_index8
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_ik_analyzer": {
              "type": "custom",
              "tokenizer": "ik_smart",
              "char_filter": [
                "html_strip"
              ],
              "filter": [
                 "synonym"
              ]
            }
          },
          "filter": {
            "synonym": {
              "type": "synonym",
              "synonyms_path": "analysis/synonym.txt"
            }
          }
        }
      }
    }

5.2 Specifying an analyzer for a field

PUT my_index8/_mapping/_doc
    {
      "properties": {
        "title": {
            "type": "text",
            "analyzer": "my_ik_analyzer"
        }
      }
    }

If queries on the field should use a different analyzer:

PUT my_index8/_mapping/_doc
    {
      "properties": {
        "title": {
            "type": "text",
            "analyzer": "my_ik_analyzer",
            "search_analyzer": "other_analyzer"
        }
      }
    }

Test:

PUT my_index8/_doc/1
    {
      "title": "張三說的確實在理"
    }

    GET /my_index8/_search
    {
      "query": {
        "term": {
          "title": "張三"
        }
      }
    }

5.3 Defining a default analyzer for an index

    PUT /my_index10
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "default": {
              "tokenizer": "ik_smart",
              "filter": [
                "synonym"
              ]
            }
          },
          "filter": {
            "synonym": {
              "type": "synonym",
              "synonyms_path": "analysis/synonym.txt"
            }
          }
        }
      },
    "mappings": {
        "_doc": {
          "properties": {
            "title": {
              "type": "text"
            }
          }
        }
      }
    }

Test:

PUT my_index10/_doc/1
    {
      "title": "張三說的確實在理"
    }

    GET /my_index10/_search
    {
      "query": {
        "term": {
          "title": "張三"
        }
      }
    }

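6. Analyzer precedence

An analyzer can be specified per query, per field, and per index.

At index time ES picks the analyzer in this order:

The analyzer specified in the field mapping.
If the field mapping specifies none, the analyzer named default in the index settings.
If the index settings define no default analyzer, the standard analyzer.

At search time ES picks the analyzer in this order: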
The analyzer defined in a full-text query.
The search_analyzer defined in the field mapping.
The analyzer defined in the field mapping.
An analyzer named default_search in the index settings.
An analyzer named default in the index settings.
The standard analyzer.
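
The search-time order is a first-match-wins chain, which the sketch below spells out. It is an illustration only. The function name and the plain dictionaries standing in for the field mapping and index settings are hypothetical, not an Elasticsearch API.

```python
def pick_search_analyzer(query_analyzer=None, field_mapping=None, index_analyzers=None):
    """Return the first analyzer found, following the search-time order above."""
    field_mapping = field_mapping or {}
    index_analyzers = index_analyzers or {}
    candidates = [
        query_analyzer,                         # analyzer set in the query itself
        field_mapping.get("search_analyzer"),   # search_analyzer in the field mapping
        field_mapping.get("analyzer"),          # analyzer in the field mapping
        "default_search" if "default_search" in index_analyzers else None,
        "default" if "default" in index_analyzers else None,
        "standard",                             # final fallback
    ]
    return next(name for name in candidates if name)


title = {"type": "text", "analyzer": "my_ik_analyzer", "search_analyzer": "other_analyzer"}
print(pick_search_analyzer(field_mapping=title))
print(pick_search_analyzer(index_analyzers={"default": {}}))
print(pick_search_analyzer())
```

The index-time chain is the same idea with only three links: the field analyzer, the index default, then standard.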

III. Document Management

1. Creating documents

Create or replace a document with a given id:

PUT twitter/_doc/1
    {
        "id": 1,
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }

Create a document with an auto-generated id:

POST twitter/_doc/
    {
        "id": 1,
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }

2. Getting a single document

HEAD twitter/_doc/11
GET twitter/_doc/1

Without the document source:

GET twitter/_doc/1?_source=false

Getting the document source:

GET twitter/_doc/1/_source
    {
      "_index": "twitter",
      "_type": "_doc",
      "_id": "1",
      "_version": 2,
      "found": true,
      "_source": {
        "id": 1,
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "trying out Elasticsearch"
      }}

Retrieving stored fields:

PUT twitter11
    {
       "mappings": {
          "_doc": {
             "properties": {
                "counter": {
                   "type": "integer",
                   "store": false
                },
                "tags": {
                   "type": "keyword",
                   "store": true
                } }   } }}
    PUT twitter11/_doc/1
    {
        "counter" : 1,
        "tags" : ["red"]
    }
    GET twitter11/_doc/1?stored_fields=tags,counter

3. Getting multiple documents with _mget

Option 1:

GET /_mget
    {
        "docs" : [
            {
                "_index" : "twitter",
                "_type" : "_doc",
                "_id" : "1"
            },
            {
                "_index" : "twitter",
                "_type" : "_doc",
                "_id" : "2",
                "stored_fields" : ["field3", "field4"]
            }
        ]
    }

Option 2:

GET /twitter/_mget
    {
        "docs" : [
            {
                "_type" : "_doc",
                "_id" : "1"
            },
            {
                "_type" : "_doc",
                "_id" : "2"
            }
        ]
    }

Option 3:

GET /twitter/_doc/_mget
    {
        "docs" : [
            {
                "_id" : "1"
            },
            {
                "_id" : "2"
            }
        ]
    }

Option 4:

GET /twitter/_doc/_mget
    {
        "ids" : ["1", "2"]
    }

4. Deleting documents

Delete a document by id:
DELETE twitter/_doc/1

Delete with version control:
DELETE twitter/_doc/1?version=1

Response:

{
        "_shards" : {
            "total" : 2,
            "failed" : 0,
            "successful" : 2
        },
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 2,
        "_primary_term": 1,
        "_seq_no": 5,
        "result": "deleted"
    }

Delete by query:

POST twitter/_delete_by_query
    {
      "query": {
        "match": {
          "message": "some message"
        }
      }
    }

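When some documents hit a version conflict, do not abort the delete: record the conflicting documents and keep deleting the other documents that match the query.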

    POST twitter/_doc/_delete_by_query?conflicts=proceed
    {
      "query": {
        "match_all": {}
      }
    }

Use the task API to list delete-by-query tasks:
GET _tasks?detailed=true&actions=*/delete/byquery

Check the status of a specific task:
GET /_tasks/taskId:1

Cancel a task:
POST _tasks/task_id:1/_cancel

5. Updating documents

Replace a document by id:

    PUT twitter/_doc/1
    {
        "id": 1,
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }

Optimistic concurrency control for updates:

   PUT twitter/_doc/1?version=1
    {
        "id": 1,
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }

Response:

   {
      "_index": "twitter",
      "_type": "_doc",
      "_id": "1",
      "_version": 3,
      "result": "updated",
      "_shards": {
        "total": 3,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 2,
      "_primary_term": 3
    }
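
The idea behind the version parameter is a compare-then-write check: the write succeeds only if the caller's version matches the stored one. The sketch below models that with an in-memory dictionary. `ToyIndex` and `VersionConflict` are hypothetical names, and this is not how Elasticsearch stores documents.

```python
class VersionConflict(Exception):
    """Raised when the provided version does not match the stored version."""


class ToyIndex:
    """In-memory stand-in for an index with internal versioning."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def put(self, doc_id, source, version=None):
        current_version = self.docs.get(doc_id, (0, None))[0]
        # With no version given the write always goes through.
        if version is not None and version != current_version:
            raise VersionConflict(
                f"current version [{current_version}] differs from provided [{version}]"
            )
        new_version = current_version + 1
        self.docs[doc_id] = (new_version, source)
        return new_version


index = ToyIndex()
index.put("1", {"message": "v1"})             # creates the document at version 1
index.put("1", {"message": "v2"}, version=1)  # matches, document moves to version 2
try:
    index.put("1", {"message": "stale"}, version=1)  # stale: document is at version 2
except VersionConflict as err:
    print("conflict:", err)
```

A client that gets the conflict is expected to re-read the document and retry with the fresh version.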

6. Scripted updates

6.1 Prepare a document

PUT uptest/_doc/1
    {
        "counter" : 1,
        "tags" : ["red"]
    }

6.2 Add 4 to the counter of document 1

   POST uptest/_doc/1/_update
    {
        "script" : {
            "source": "ctx._source.counter += params.count",
            "lang": "painless",
            "params" : {
                "count" : 4
            }
        }
    }

6.3 Append an element to an array

    POST uptest/_doc/1/_update
    {
        "script" : {
            "source": "ctx._source.tags.add(params.tag)",
            "lang": "painless",
            "params" : {
                "tag" : "blue"
            }
        }
    }

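About the script: painless is a scripting language built into ES. ctx is the execution context object (it also gives access to _index, _type, _id, _version, _routing, and _now, the current timestamp). params is the map of parameters.

Note: scripted updates require the _source field to be enabled on the index. The update runs as follows:

a. Fetch the original document.
b. Run the script against the original data in _source.
c. Delete the originally indexed document.
d. Index the modified document.

Compared with doing the get and the index yourself, this only saves some network round trips and reduces the chance of a version conflict between the get and the index.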

6.4 Add a field

   POST uptest/_doc/1/_update
    {
        "script" : "ctx._source.new_field = 'value_of_new_field'"
    }

6.5 Remove a field

POST uptest/_doc/1/_update
    {
        "script" : "ctx._source.remove('new_field')"
    }

6.6 Conditionally delete the document or do nothing

POST uptest/_doc/1/_update
    {
        "script" : {
            "source": "if (ctx._source.tags.contains(params.tag)) { ctx.op = 'delete' } else { ctx.op = 'none' }",
            "lang": "painless",
            "params" : {
                "tag" : "green"
            }
        }
    }

6.7 Update by merging in a partial document

    POST uptest/_doc/1/_update
    {
        "doc" : {
            "name" : "new_name"
        }
    }

6.8 Run 6.7 again: the content is unchanged, so nothing is done

{
      "_index": "uptest",
      "_type": "_doc",
      "_id": "1",
      "_version": 4,
      "result": "noop",
      "_shards": {
        "total": 0,
        "successful": 0,
        "failed": 0
      }
    }

6.9 Disable noop detection

   POST uptest/_doc/1/_update
    {
        "doc" : {
            "name" : "new_name"
        },
        "detect_noop": false
    }

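What is noop detection?

With detect_noop enabled (the default), ES merges the supplied doc into the existing _source and, when the merge changes nothing, skips the write and returns "result": "noop".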
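
The check amounts to "merge, compare, and write only on a difference". The sketch below models it with a shallow dictionary merge. `update_with_doc` is a hypothetical helper, and the real merge also descends into nested objects.

```python
def update_with_doc(source, partial_doc, detect_noop=True):
    """Merge partial_doc into source and report 'noop' when nothing changes."""
    merged = {**source, **partial_doc}  # shallow merge, for illustration only
    if detect_noop and merged == source:
        return source, "noop"
    return merged, "updated"


source = {"counter": 5, "tags": ["red", "blue"]}

source, first = update_with_doc(source, {"name": "new_name"})
source, second = update_with_doc(source, {"name": "new_name"})
source, third = update_with_doc(source, {"name": "new_name"}, detect_noop=False)

print(first, second, third)
```

With detect_noop disabled the third call is treated as an update even though the content is identical, which in ES also means the version is incremented.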

6.10 Upsert: if the document to update exists, the script runs against it; if it does not, the content of upsert is indexed as a new document.

   POST uptest/_doc/1/_update
    {
        "script" : {
            "source": "ctx._source.counter += params.count",
            "lang": "painless",
            "params" : {
                "count" : 4
            }
        },
        "upsert" : {
            "counter" : 1
        }
    }

7. Updating documents by query

Only documents that match the query are updated:

POST twitter/_update_by_query
    {
      "script": {
        "source": "ctx._source.likes++",
        "lang": "painless"
      },
      "query": {
        "term": {
          "user": "kimchy"
        }
      }
    }

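8. Bulk operations

The bulk API /_bulk performs multiple index, create, update, and delete operations in a single call, which can greatly speed up indexing. The request body must be newline-delimited JSON in the following structure:

Syntax: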

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

Notes:

action_and_meta_data: the action is one of index, create, delete, or update; the metadata is _index, _type, and _id. The endpoint can be /_bulk, /{index}/_bulk, or /{index}/{type}/_bulk.

Example:

POST _bulk
    { "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
    { "field1" : "value1" }
    { "delete" : { "_index" : "test", "_type" : "_doc", "_id" : "2" } }
    { "create" : { "_index" : "test", "_type" : "_doc", "_id" : "3" } }
    { "field1" : "value3" }
    { "update" : {"_id" : "1", "_type" : "_doc", "_index" : "test"} }
    { "doc" : {"field2" : "value2"} }
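
When the body is generated by a program, the two details that are easiest to get wrong are the one-JSON-object-per-line layout and the final newline. As a minimal sketch, the Python below serializes the same four operations. `bulk_body` is a hypothetical helper and nothing is sent to a cluster.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, source) triples as newline-delimited JSON."""
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}))
        if source is not None:  # delete carries no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


ops = [
    ("index", {"_index": "test", "_type": "_doc", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_type": "_doc", "_id": "2"}, None),
    ("create", {"_index": "test", "_type": "_doc", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_type": "_doc", "_id": "1"}, {"doc": {"field2": "value2"}}),
]
body = bulk_body(ops)
print(body, end="")
```

json.dumps never emits raw newlines inside an object by default, so each operation stays on its own line.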

8.1 Bulk-indexing documents from a JSON file with curl

Note: accounts.json must be in the directory where you run curl. Most of the later examples use this bank data set.
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

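9. Reindex

The reindex API /_reindex copies the documents of one index into another index. It requires _source to be enabled on the source index. The settings and mappings of the destination index are independent of the source index and are not copied.

When do you need to reindex?

Whenever you need to copy data from one index into another.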

POST _reindex
    {
      "source": {
        "index": "twitter"
      },
      "dest": {
        "index": "new_twitter"
      }
    }

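One question to consider when reindexing: if the destination index already contains documents from the source index, how are their versions handled?

1. If version_type is not set, or is set to internal, ES uses the versions in the destination index. Reindexing then simply creates or overwrites documents, whatever their version.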

POST _reindex
    {
      "source": {
        "index": "twitter"
      },
      "dest": {
        "index": "new_twitter",
        "version_type": "internal"
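      }
    }

2. To use the versions from the source index for version control, set version_type to external. Reindexing then creates documents that are missing from the destination and updates documents whose version in the destination is older than in the source.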

POST _reindex
    {
      "source": {
        "index": "twitter"
      },
      "dest": {
        "index": "new_twitter",
        "version_type": "external"
      }
    }

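To copy only the documents that do not yet exist in the destination index, set op_type to create. Documents that already exist then cause a version conflict, which aborts the operation by default. Set "conflicts": "proceed" to skip them and continue.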

POST _reindex
    {
      "conflicts": "proceed",
      "source": {
        "index": "twitter"
      },
      "dest": {
        "index": "new_twitter",
        "op_type": "create"
      }
    }
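
The combination of op_type create and conflicts proceed can be pictured as "copy if absent, count the rest". The sketch below models it with two dictionaries keyed by document id. `reindex_create_only` is a hypothetical helper, and the returned keys imitate the created and version_conflicts counters of the reindex response.

```python
def reindex_create_only(source_docs, dest_docs, conflicts="abort"):
    """Copy documents whose id is missing from dest_docs and count id clashes."""
    created, version_conflicts = 0, 0
    for doc_id, doc in source_docs.items():
        if doc_id in dest_docs:
            version_conflicts += 1
            if conflicts != "proceed":
                raise RuntimeError(f"version conflict on document {doc_id}")
            continue  # conflicts=proceed: skip this document and carry on
        dest_docs[doc_id] = doc
        created += 1
    return {"created": created, "version_conflicts": version_conflicts}


twitter = {"1": {"user": "kimchy"}, "2": {"user": "alice"}, "3": {"user": "bob"}}
new_twitter = {"2": {"user": "already here"}}

result = reindex_create_only(twitter, new_twitter, conflicts="proceed")
print(result)
print(new_twitter["2"])
```

Document 2 keeps its existing content in the destination; only documents 1 and 3 are copied.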

You can also reindex only part of the source index, selected by type or by a query:

POST _reindex
    {
      "source": {
        "index": "twitter",
        "type": "_doc",
        "query": {
          "term": {
            "user": "kimchy"
          }
        }
      },
      "dest": {
        "index": "new_twitter"
      }
    }

Data can be pulled from multiple sources:

POST _reindex
    {
      "source": {
        "index": ["twitter", "blog"],
        "type": ["_doc", "post"]
      },
      "dest": {
        "index": "all_together"
      }
    }

The number of documents can be limited:

    POST _reindex
    {
      "size": 10000,
      "source": {
        "index": "twitter",
        "sort": { "date": "desc" }
      },
      "dest": {
        "index": "new_twitter"
      }
    }

You can choose which source fields to copy:

POST _reindex
    {
      "source": {
        "index": "twitter",
        "_source": ["user", "_doc"]
      },
      "dest": {
        "index": "new_twitter"
      }
    }

Documents can be modified with a script:

   POST _reindex
    {
      "source": {
        "index": "twitter"
      },
      "dest": {
        "index": "new_twitter",
        "version_type": "external"
      },
      "script": {
        "source": "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
        "lang": "painless"
      }
    }

A routing value can be set to control which shard the documents go to:

POST _reindex
    {
      "source": {
        "index": "source",
        "query": {
          "match": {
            "company": "cat"
          }
        }
      },
      "dest": {
        "index": "dest",
        "routing": "=cat"
      }
    }

Copying from a remote cluster:

POST _reindex
    {
      "source": {
        "remote": {
          "host": "http://otherhost:9200",
          "username": "user",
          "password": "pass"
        },
        "index": "source",
        "query": {
          "match": {
            "test": "data"
          }
        }
      },
      "dest": {
        "index": "dest"
      }
    }

Use the task API to check the status of a reindex:
GET _tasks?detailed=true&actions=*reindex

10. refresh

To make the result of an index, update, or delete operation visible to search immediately, pass the refresh parameter:

    PUT /test/_doc/1?refresh
    {"test": "test"}
    PUT /test/_doc/2?refresh=true
    {"test": "test"}

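Values of refresh

Empty or true: the affected shards are refreshed immediately after the operation.
false: the same as omitting the parameter; changes become visible at the next scheduled refresh.
wait_for: the request is registered as a refresh listener and returns only after a refresh has made its changes visible. It does not force a refresh itself, except when the number of waiting requests reaches index.max_refresh_listeners (defaults to 1000), which triggers one.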

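IV. Routing in Detail

1. How a cluster forms

The first node starts

Note: the first node to start becomes the master node. The master node holds the cluster metadata.

Node2 starts

Notes:

Before Node2 starts, it is configured with the cluster name (cluster.name: ess), the addresses of the master-eligible nodes (discovery.zen.ping.unicast.hosts: ["10.0.1.11", "10.0.1.12"]), and its own address (network.host: 10.0.1.12).

While starting, Node2 contacts the master node Node1 and asks to join the cluster. Node1 checks whether Node2 meets the conditions for joining. If it does, Node1 adds Node2's address to the cluster metadata, announces the new node to the other nodes in the cluster, and sends them the updated metadata.

Node3..NodeN join

Note: every node holds the same metadata as the master node, because whenever a node joins, the master tells the other nodes to sync the metadata.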

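2. Creating an index in a cluster

3. A cluster with an index

4. When a node fails, for example the master node, a new master is elected

5. Indexing a document in a cluster

Steps for indexing a document:
1. node2 computes the routing value of the document to find its target shard (assume routing selects shard 0).
2. node2 forwards the document to node1, which holds the primary of shard 0 (P0).
3. node1 indexes the document and replicates it to node3, which holds the replica (R0) and indexes it too.
4. node1 reports the result to node2.
5. node2 responds to the client.

6. How documents are routed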

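Which shard should a document be stored on?
Document routing is the process of deciding this. ES computes the shard for each document as follows:
shard = hash(routing) % number_of_primary_shards

Parameters:

routing: the value that is hashed. It defaults to the document id, and a different value can be supplied with the routing parameter when indexing.

number_of_primary_shards: the number of primary shards set when the index was created.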

POST twitter/_doc?routing=kimchy
    {
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }
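
The routing formula is easy to try out. The sketch below uses CRC32 from Python's standard library as a stand-in hash, which is an assumption made for illustration. Elasticsearch uses its own hash function, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, with CRC32 as the hash."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same routing value always lands on the same shard.
print(shard_for("kimchy", 5) == shard_for("kimchy", 5))

# With a different shard count most values map elsewhere, so documents
# indexed earlier could no longer be found by recomputing the formula.
ids = [str(i) for i in range(1000)]
moved = sum(shard_for(i, 5) != shard_for(i, 6) for i in ids)
print(moved, "of", len(ids), "ids map to a different shard with 6 primaries")
```

This is why the formula depends on the primary shard count chosen at index creation, and why documents that share a routing value end up on the same shard.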

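The routing parameter (which can take multiple values) can be used with index, delete, update, and search requests to select the shards to operate on.

Requiring a routing value for every document in an index: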

PUT my_index2
    {
      "mappings": {
        "_doc": {
          "_routing": {
            "required": true
          }
        }
      }
    }

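7. Searching in a cluster

Steps for a search, for example on index s0:
1. node2 parses the query.
2. node2 sends the query to the nodes holding one copy, primary or replica, of each shard of s0 (here R1, R2, R0).
3. Each node executes the query and sends its results to node2.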
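
The steps above stop once the per-shard results reach node2. One plausible way to picture what happens next is a merge of the sorted per-shard hit lists into a global top-N. The sketch below assumes exactly that. `merge_shard_hits` and the (score, id) tuples are hypothetical, not Elasticsearch internals.

```python
import heapq
from itertools import islice


def merge_shard_hits(shard_results, size):
    """Merge per-shard hit lists, each sorted by descending score, into a top-N list."""
    # heapq.merge expects ascending input, so negate the score in the key.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, size))


shard_results = [
    [(0.9, "doc-3"), (0.4, "doc-7")],  # hits from the copy of shard 0
    [(0.8, "doc-1")],                  # hits from the copy of shard 1
    [(0.7, "doc-5"), (0.2, "doc-2")],  # hits from the copy of shard 2
]
top = merge_shard_hits(shard_results, size=3)
print(top)
```

Each shard only needs to return its own best hits, because a document outside a shard's local top N cannot be in the global top N.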
