How to turn Elasticsearch cluster health from red to green

Current state

Check the cluster health:

curl -XGET 'http://localhost:9200/_cluster/health?pretty'

{
  "cluster_name" : "go",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 114,
  "active_shards" : 114,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 397,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 22.309197651663403
}
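
For scripting, the two fields that matter most here are status and unassigned_shards. The following is a minimal sketch that pulls them out of the health response with awk; the function name health_summary is made up for illustration, and it assumes the pretty-printed layout with one field per line as returned by ?pretty.

```shell
# Read a pretty-printed _cluster/health response on stdin and print
# "<status> <unassigned_shards>", for example "red 397".
# Assumption: one JSON field per line, as produced by ?pretty.
health_summary() {
  awk -F'"' '
    $2 == "status"            { status = $4 }
    $2 == "unassigned_shards" { gsub(/[^0-9]/, "", $3); unassigned = $3 }
    END { print status, unassigned }
  '
}
```

Possible usage: curl -s 'http://localhost:9200/_cluster/health?pretty' | health_summary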

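Interpreting the cluster status

1) Green: the healthiest state. All primary and replica shards are available.

2) Yellow: all primary shards are available, but some replica shards are not.

3) Red: some primary shards are unavailable. (Queries can still return part of the data, but the problem should be fixed promptly.)

If the cluster status is red, the Head plugin shows "cluster health: red", which means at least one primary shard failed to be allocated.

As a result, some data and parts of some indices are no longer available.

Even so, Elasticsearch still lets us run queries.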

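What is an unassigned shard?

In one sentence: a shard that has not been allocated to any node.
If you keep refreshing the Head plugin while ES starts, you will see the shards go from purple to grey and finally to green.

If a shard cannot be allocated, for example because you configured more replica shards than the nodes in the cluster can hold, it stays in the UNASSIGNED state.
When an allocation attempt itself fails, the reported reason is ALLOCATION_FAILED.

The following command shows the state of every shard of every index, together with the reason it is unassigned.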

 curl -XGET 'http://172.17.161.205:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason'
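
With hundreds of unassigned shards, a per-reason count is easier to read than the raw listing. The sketch below assumes the five-column output of the command above (index, shard, prirep, state, unassigned.reason); the function name unassigned_by_reason is hypothetical.

```shell
# Count UNASSIGNED shards per unassigned.reason, most frequent first.
# Expects the output of:
#   _cat/shards?h=index,shard,prirep,state,unassigned.reason
unassigned_by_reason() {
  awk '$4 == "UNASSIGNED" { count[$5]++ }
       END { for (r in count) print count[r], r }' | sort -rn
}
```

Possible usage: curl -s 'http://172.17.161.205:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | unassigned_by_reason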

What are the symptoms of unassigned shards?
In the Head plugin, one or more shards stay grey long after Elasticsearch has started.

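Possible reasons for unassigned shards

1) INDEX_CREATED: unassigned as a result of an index being created through the API.
2) CLUSTER_RECOVERED: unassigned as a result of a full cluster recovery.
3) INDEX_REOPENED: unassigned as a result of opening a closed index.
4) DANGLING_INDEX_IMPORTED: unassigned as a result of importing a dangling index.
5) NEW_INDEX_RESTORED: unassigned as a result of restoring into a new index.
6) EXISTING_INDEX_RESTORED: unassigned as a result of restoring into a closed index.
7) REPLICA_ADDED: unassigned as a result of explicitly adding a replica shard.
8) ALLOCATION_FAILED: unassigned because allocating the shard failed.
9) NODE_LEFT: unassigned because the node hosting the shard left the cluster.
10) REINITIALIZED: unassigned when a shard moves from started back to initializing (for example, with shadow replicas).
11) REROUTE_CANCELLED: unassigned as a result of an explicit cancel reroute command.
12) REALLOCATED_REPLICA: a better replica location was identified, so the existing replica allocation was cancelled and the shard became unassigned.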

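How to troubleshoot a red cluster?

Symptom: the cluster health is red.
Likely cause: some primary shards in the cluster are unassigned.
The fixes below all aim at getting the unassigned primary shards allocated again.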

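How to fix unassigned shards?

Option 1 (extreme case): the shard's data is no longer usable, so delete it.
ES has no API for deleting a single shard. You either delete the whole index the shard belongs to, as below, or remove the node if none of its data is needed any more.

curl -XDELETE 'localhost:9200/index_name/'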

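Option 2: make sure that the number of nodes in the cluster >= the largest replica count of any index in the cluster + 1.

Based on our cluster state: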

{
  "cluster_name" : "go",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 114,
  "active_shards" : 114,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 397,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 22.309197651663403
}

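Number of nodes: N = 1
Largest replica count of any index: R = 5

Key point: when nodes join or leave the cluster, the master node automatically reallocates shards so that multiple copies of a shard never end up on the same node. In other words, the master will not place a primary shard on the same node as its replica, and it will not place two replicas of the same shard on the same node.
If there are not enough nodes to place the shards this way, some shards may stay unassigned.
For example, N = 4 with R = 1 is a healthy configuration.

Since my cluster has only one node (N = 1), R must be 0 to satisfy the formula.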
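
The rule N >= R + 1 can be checked with plain shell arithmetic. This is only a sketch of the formula above; the function names max_replicas and replicas_fit are made up for illustration.

```shell
# Largest replica count that a cluster of $1 data nodes can fully allocate (N - 1).
max_replicas() {
  echo $(( $1 - 1 ))
}

# Exit status 0 when $1 nodes can hold the primary plus $2 replicas of each
# shard, that is when N >= R + 1; non-zero otherwise.
replicas_fit() {
  [ "$1" -ge $(( $2 + 1 )) ]
}
```

With the numbers from this cluster, replicas_fit 1 5 fails and max_replicas 1 prints 0, which is why the replica count has to drop to 0 unless nodes are added.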

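So the problem comes down to one of two things:
1) add nodes, i.e. increase N;
2) remove the replica shards, i.e. set R to 0.
R can be set to 0 with the following command: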

curl -XPUT 'http://localhost:9200/index_name/_settings' -H 'Content-Type: application/json' -d'
{
  "number_of_replicas": 0
}'

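Check the health again.

When there are many indices, set the replica count for all of them at once:

curl -XPUT 'http://localhost:9200/_settings' -H 'Content-Type: application/json' -d'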
{
  "number_of_replicas": 0
}'

With many indices, this may take a while to run.

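After the command completes, the status is still: [screenshot missing]
At this point, however: [screenshot missing]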

The next step is to consider reallocating the shards.
List all unassigned shards:

curl -XGET 'http://localhost:9200/_cat/shards' | grep UNASSIGNED

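In the output, the second column is the shard number, and p in the third column marks a primary shard. This shows which index the unassigned primary belongs to.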
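
Since only unassigned primaries turn the cluster red, it can help to filter on both columns rather than grepping for UNASSIGNED alone. A minimal sketch, assuming the default _cat/shards column order (index, shard, prirep, state, ...); the function name unassigned_primaries is hypothetical.

```shell
# Print "<index> <shard>" for every primary shard that is UNASSIGNED.
# Expects the default _cat/shards output on stdin.
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}
```

Possible usage: curl -s 'http://localhost:9200/_cat/shards' | unassigned_primaries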

Ask the cluster why some shards are still unassigned:

curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'

{
  "index" : "dev_index",
  "shard" : 4,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2019-08-29T07:42:19.838Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed to create shard, failure FileSystemException[/home/elasticsearch-5.4.1/data/nodes/0/indices/LOJ1KI0xQouDy6iwLeHtkw: No space left on device]",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "E0_rcL8US_OIyPBJVmKHYA",
      "node_name" : "dev205",
      "transport_address" : "12.17.61.205:9300",
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-08-29T07:42:19.838Z], failed_attempts[5], delayed=false, details[failed to create shard, failure FileSystemException[/home/elasticsearch-5.4.1/data/nodes/0/indices/LOJ1KI0xQouDy6iwLeHtkw: No space left on device]], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
}

It turns out the disk has run out of space.
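
The explain output above names the follow-up step itself: the shard hit the max_retry limit, so once disk space has been freed the allocation has to be retried through /_cluster/reroute?retry_failed=true. The sketch below wraps that call and a disk usage check via _cat/allocation. The helper names, the ES_URL variable, and the DRY_RUN switch are assumptions made for illustration, not part of Elasticsearch.

```shell
# Base URL of the cluster; override by exporting ES_URL (assumed variable name).
ES_URL="${ES_URL:-http://localhost:9200}"

# Run a curl command, or only print it when DRY_RUN=1.
es_call() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl -s $*"
  else
    curl -s "$@"
  fi
}

# Show shard counts and disk usage per node as Elasticsearch sees them.
show_disk_usage() {
  es_call -XGET "$ES_URL/_cat/allocation?v"
}

# Ask the cluster to retry shards that exceeded the allocation retry limit.
retry_failed_shards() {
  es_call -XPOST "$ES_URL/_cluster/reroute?retry_failed=true"
}
```

A possible sequence is show_disk_usage, then freeing space on the data path, then retry_failed_shards. Setting DRY_RUN=1 first prints the curl commands without sending them.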

Re-enable shard allocation:

curl -XPUT '12.17.61.205:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{ "transient": { "cluster.routing.allocation.enable": "all" } }'

After the restart: [screenshot missing]

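For indices with unassigned shards, the best fix is a reroute. If a reroute is not possible, consider rebuilding the shards by changing number_of_replicas. If neither approach recovers them, consider a reindex.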

