Elasticsearch Operations Guide

1. Common Monitoring Tools

1.1 cerebro

URL: https://github.com/lmenezes/cerebro

1.2 Kibana Stack Monitoring

URL: https://www.elastic.co/guide/en/kibana/current/xpack-monitoring.html

2. Monitoring Key Metrics

2.1 Cluster Health: Shards and Nodes

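When you monitor the cluster with GET _cluster/health, the response reports the cluster status, the number of nodes, and the number of active shards. It also reports the counts of relocating, initializing, and unassigned shards.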

{
  "cluster_name" : "my_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 43,
  "active_shards" : 77,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 2,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 97.46835443037975
}

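Key metrics for a running cluster:

Status: the state of the cluster. Red: some primary shards are unassigned. Yellow: some replica shards are unassigned. Green: all shards are assigned.

Nodes: the total number of nodes in the cluster.

Shards: the number of active shards in the cluster.

Relocating Shards: the number of shards being moved between nodes, for example after a node is lost.

Initializing Shards: the number of shards being initialized, for example after an index is added.

Unassigned Shards: the number of shards that have not yet been created or allocated, such as replicas with no node to host them.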
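
The counters in the sample response can be cross-checked against each other. The sketch below assumes that active_shards_percent_as_number is the active shard count divided by the sum of active, initializing, and unassigned shards; that assumption reproduces the sample value, but it is not taken from the Elasticsearch source. The function name is illustrative.

```python
# Sketch: recompute the active-shard percentage from a _cluster/health response.
# Assumption: percent = active / (active + initializing + unassigned).
def active_shards_percent(health: dict) -> float:
    """Return active shards as a percentage of all shards in the response."""
    active = health["active_shards"]
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    return 100.0 if total == 0 else 100.0 * active / total


# Counters copied from the sample response above.
sample = {
    "status": "yellow",
    "active_shards": 77,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 2,
}
print(f"{active_shards_percent(sample):.2f}% of shards are active")
```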

Unassigned reasons and what they mean:

（1）INDEX_CREATED
 Unassigned as a result of an API creation of an index.
（2）CLUSTER_RECOVERED
 Unassigned as a result of a full cluster recovery.
（3）INDEX_REOPENED
 Unassigned as a result of opening a closed index.
（4）DANGLING_INDEX_IMPORTED
 Unassigned as a result of importing a dangling index.
（5）NEW_INDEX_RESTORED
 Unassigned as a result of restoring into a new index.
（6）EXISTING_INDEX_RESTORED
 Unassigned as a result of restoring into a closed index.
（7）REPLICA_ADDED
 Unassigned as a result of explicit addition of a replica.
（8）ALLOCATION_FAILED
 Unassigned as a result of a failed allocation of the shard.
（9）NODE_LEFT
 Unassigned as a result of the node hosting it leaving the cluster.
（10）REROUTE_CANCELLED
 Unassigned as a result of explicit cancel reroute command.
（11）REINITIALIZED
 When a shard moves from started back to initializing, for example, with shadow replicas.
（12）REALLOCATED_REPLICA
 A better replica location is identified and causes the existing replica allocation to be cancelled.
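
To see which of these reasons dominates in a cluster, the text output of GET _cat/shards?h=index,shard,prirep,state,unassigned.reason&v can be tallied. This is a minimal sketch: the function name and the sample rows are invented for illustration, and it assumes whitespace-separated columns in exactly that order.

```python
from collections import Counter


def unassigned_reasons(cat_shards_text: str) -> Counter:
    """Count UNASSIGNED shards per reason from `_cat/shards` text output.

    Expects the columns index, shard, prirep, state, unassigned.reason
    and a header row (the `v` flag).
    """
    counts = Counter()
    for line in cat_shards_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Assigned shards have an empty reason column, so they yield 4 fields.
        if len(fields) >= 5 and fields[3] == "UNASSIGNED":
            counts[fields[4]] += 1
    return counts


# Hypothetical output, invented for illustration.
sample = """index shard prirep state      unassigned.reason
logs  0     p      STARTED
logs  0     r      UNASSIGNED NODE_LEFT
blogs 1     r      UNASSIGNED NODE_LEFT
blogs 2     r      UNASSIGNED ALLOCATION_FAILED
"""
for reason, count in unassigned_reasons(sample).most_common():
    print(reason, count)
```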

2.2 Search Performance: Request Rate and Latency

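The rate at which the system handles requests and the time each request takes show how effectively the cluster is working.

When the cluster receives a request, it may need to access data in multiple shards across multiple nodes. The rate at which requests are processed and returned, the number of requests currently in progress, and the duration of requests are core indicators of cluster health.

A search request runs in two phases:

First, the query phase: the cluster distributes the request to a copy (primary or replica) of every shard in the index.

Second, the fetch phase: the query results are collected, processed, and returned to the user.

GET blogs_analyzed/_stats returns the statistics of the target index. The search section of the response looks like this: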

 "search" : {
        "open_contexts" : 0,
        "query_total" : 0,
        "query_time_in_millis" : 0,
        "query_current" : 0,
        "fetch_total" : 0,
        "fetch_time_in_millis" : 0,
        "fetch_current" : 0,
        "scroll_total" : 0,
        "scroll_time_in_millis" : 0,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      }

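Key metrics for search performance:

query_current: the number of queries currently in progress.
fetch_current: the number of fetches currently in progress.
query_total: the total number of queries processed.
query_time_in_millis: the total time spent on all queries, in milliseconds.
fetch_total: the total number of fetches processed.
fetch_time_in_millis: the total time spent on all fetches, in milliseconds.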
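
Because the totals are cumulative counters, latency is usually derived from the difference between two samples. The sketch below shows one way to do that; the function name and the two samples are hypothetical, and only the field names come from the response above.

```python
def avg_latency_ms(before: dict, after: dict, phase: str = "query") -> float:
    """Average time per operation (ms) between two `search` stats samples."""
    ops = after[f"{phase}_total"] - before[f"{phase}_total"]
    millis = after[f"{phase}_time_in_millis"] - before[f"{phase}_time_in_millis"]
    # No operations in the interval means there is no latency to report.
    return millis / ops if ops > 0 else 0.0


# Hypothetical samples taken some seconds apart.
t0 = {"query_total": 1000, "query_time_in_millis": 5000,
      "fetch_total": 400, "fetch_time_in_millis": 800}
t1 = {"query_total": 1200, "query_time_in_millis": 6500,
      "fetch_total": 500, "fetch_time_in_millis": 950}

print("query ms/op:", avg_latency_ms(t0, t1, "query"))
print("fetch ms/op:", avg_latency_ms(t0, t1, "fetch"))
```

Dividing the same deltas by the sampling interval gives the request rate.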

2.3 Indexing Performance: Refresh and Merge Time

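Monitoring the document indexing rate and the merge time helps you spot anomalies and related problems before they start to affect cluster performance. Considered alongside the health of each node, these metrics give important clues about potential problems in the system and a useful reference for performance tuning.

Indexing performance metrics are available from GET /_nodes/stats and can be aggregated at the node, index, or shard level.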

        "merges" : {
          "current" : 0,
          "current_docs" : 0,
          "current_size_in_bytes" : 0,
          "total" : 184,
          "total_time_in_millis" : 23110,
          "total_docs" : 1017919,
          "total_size_in_bytes" : 342535815,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 0,
          "total_auto_throttle_in_bytes" : 524288000
        },
        "refresh" : {
          "total" : 1842,
          "total_time_in_millis" : 20327,
          "external_total" : 1583,
          "external_total_time_in_millis" : 19195,
          "listeners" : 0
        },
        "flush" : {
          "total" : 21,
          "periodic" : 0,
          "total_time_in_millis" : 131
        },

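Key metrics for indexing performance:

refresh.total: the total number of refreshes.
refresh.total_time_in_millis: the total time spent on refreshes, in milliseconds.
merges.current_docs: the number of documents in merges currently in progress.
merges.total_docs: the total number of documents merged.
merges.total_time_in_millis: the total time spent merging segments, in milliseconds.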
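
A simple derived figure is the average time per operation, which is the total time divided by the operation count. The sketch below applies it to the counters from the sample response; the helper name is illustrative.

```python
def avg_ms(section: dict) -> float:
    """Average milliseconds per operation for a merges/refresh/flush section."""
    total = section["total"]
    return section["total_time_in_millis"] / total if total else 0.0


# Counters copied from the sample response above.
stats = {
    "merges": {"total": 184, "total_time_in_millis": 23110},
    "refresh": {"total": 1842, "total_time_in_millis": 20327},
    "flush": {"total": 21, "total_time_in_millis": 131},
}
for name, section in stats.items():
    print(f"{name}: {avg_ms(section):.2f} ms per operation")
```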

2.4 Node Health: Memory, Disk, and CPU Metrics

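Elasticsearch relies heavily on memory for performance, so keeping a close eye on memory usage is central to the health and performance of each node. Configuration changes made to improve one metric can also have a negative effect on memory allocation and usage, so it is important to look at system health as a whole.

Monitoring a node's CPU usage and looking for spikes helps identify inefficient processes or potential problems on the node. CPU performance is closely tied to garbage collection in the Java Virtual Machine (JVM).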

GET /_cat/nodes?v&h=id,disk.total,disk.used,disk.avail,disk.used_percent,ram.current,ram.percent,ram.max,cpu
id   disk.total disk.used disk.avail disk.used_percent ram.current ram.percent ram.max cpu
jg1X     19.9gb    16.2gb      3.7gb             81.15       1.8gb          96   1.9gb   0
ZnpM     19.9gb    16.4gb      3.4gb             82.52       1.7gb          91   1.9gb   0
Hyt2     19.9gb    16.2gb      3.7gb             81.16       1.8gb          96   1.9gb   0

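Key metrics for a running node:

disk.total: the total disk capacity on the node's host.
disk.used: the total disk space in use on the node's host.
disk.avail: the total disk space available.
disk.used_percent: the percentage of disk space in use.
ram.current: the amount of memory currently in use.
ram.percent: the percentage of memory in use.
ram.max: the total amount of memory on the node's host.
cpu: the percentage of CPU in use.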
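
The tabular _cat/nodes output is easy to turn into a threshold check. The sketch below parses the sample output shown above and lists the nodes over a limit. The helper names are illustrative, and the 85% disk and 95% RAM limits are assumptions chosen for the example, not recommendations.

```python
def parse_cat(text: str) -> list:
    """Parse whitespace-separated `_cat` output (with header) into dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def nodes_over(rows: list, column: str, threshold: float) -> list:
    """Return the ids of nodes whose numeric column exceeds the threshold."""
    return [row["id"] for row in rows if float(row[column]) > threshold]


# Sample output copied from above.
sample = """id   disk.total disk.used disk.avail disk.used_percent ram.current ram.percent ram.max cpu
jg1X     19.9gb    16.2gb      3.7gb             81.15       1.8gb          96   1.9gb   0
ZnpM     19.9gb    16.4gb      3.4gb             82.52       1.7gb          91   1.9gb   0
Hyt2     19.9gb    16.2gb      3.7gb             81.16       1.8gb          96   1.9gb   0
"""
rows = parse_cat(sample)
print("disk over 85%:", nodes_over(rows, "disk.used_percent", 85))
print("ram over 95%:", nodes_over(rows, "ram.percent", 95))
```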

2.4.1 Checking I/O Pressure

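iostat -d -k 1 10     # show TPS and throughput (disk read/write rates in KB)
iostat -d -m 2        # show TPS and throughput (disk read/write rates in MB)
iostat -d -x -k 1 10  # show device utilization (%util) and response time (await)
iostat -c 1 10        # show CPU statistics

Judge the I/O load from values such as %iowait and %util together. If %iowait stays close to 100% for a long time, the I/O subsystem has most likely become a bottleneck. In that case, use the iotop command to find out which process is consuming the I/O resources.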

2.5 JVM Health: Heap, GC, and Pool Size

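As a Java-based application, Elasticsearch runs in the Java Virtual Machine (JVM). The JVM manages its memory in an allocated "heap" and reclaims it through garbage collection.

If the application's demand exceeds the capacity of the heap, the application is forced to start using swap space on attached storage. Although this can keep the system from crashing, it can wreak havoc on cluster performance. Monitoring available heap space to make sure the system has enough capacity is essential to cluster health.

JVM memory is divided into different memory pools. Watch each of these pools closely to make sure they are well used and not at risk of being overrun.

The garbage collector (GC) is much like a physical garbage collection service: we want it to run regularly without being overloaded. Ideally, a view of GC activity shows regular runs of roughly even size, like an even wave. Spikes and anomalies can be indicators of deeper problems.

JVM metrics can be retrieved with GET /_nodes/stats.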

      "jvm" : {
        "timestamp" : 1623964232904,
        "uptime_in_millis" : 5786635,
        "mem" : {
          "heap_used_in_bytes" : 294418688,
          "heap_used_percent" : 28,
          "heap_committed_in_bytes" : 1038876672,
          "heap_max_in_bytes" : 1038876672,
          "non_heap_used_in_bytes" : 167878480,
          "non_heap_committed_in_bytes" : 181440512,
          "pools" : {
            "young" : {
              "used_in_bytes" : 153960136,
              "max_in_bytes" : 279183360,
              "peak_used_in_bytes" : 279183360,
              "peak_max_in_bytes" : 279183360
            },
            "survivor" : {
              "used_in_bytes" : 12341536,
              "max_in_bytes" : 34865152,
              "peak_used_in_bytes" : 34865136,
              "peak_max_in_bytes" : 34865152
            },
            "old" : {
              "used_in_bytes" : 128117016,
              "max_in_bytes" : 724828160,
              "peak_used_in_bytes" : 128117016,
              "peak_max_in_bytes" : 724828160
            }
          }
        },
        "threads" : {
          "count" : 108,
          "peak_count" : 114
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 256,
              "collection_time_in_millis" : 2136
            },
            "old" : {
              "collection_count" : 2,
              "collection_time_in_millis" : 151
            }
          }
        },

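Key JVM metrics:

mem: memory usage. Usage statistics for the heap, non-heap memory, and the memory pools.
threads: the current number of threads in use and the peak count.
gc: garbage collection. The number of collections and the total time spent on them.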
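
Two figures worth deriving from this section are the heap usage percentage and the average pause per collection. The sketch below computes both from a trimmed copy of the sample response; the function and key names in the summary are illustrative.

```python
def jvm_summary(jvm: dict) -> dict:
    """Derive heap usage (%) and average GC time (ms) per collector."""
    mem = jvm["mem"]
    summary = {
        "heap_used_percent": 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
    }
    for name, gc in jvm["gc"]["collectors"].items():
        count = gc["collection_count"]
        summary[f"{name}_gc_avg_ms"] = (
            gc["collection_time_in_millis"] / count if count else 0.0
        )
    return summary


# Values copied from the sample response above (other fields omitted).
sample = {
    "mem": {"heap_used_in_bytes": 294418688, "heap_max_in_bytes": 1038876672},
    "gc": {"collectors": {
        "young": {"collection_count": 256, "collection_time_in_millis": 2136},
        "old": {"collection_count": 2, "collection_time_in_millis": 151},
    }},
}
for key, value in jvm_summary(sample).items():
    print(f"{key}: {value:.2f}")
```

Since the counters are cumulative, comparing two samples over time shows whether old-generation collections are becoming more frequent or slower.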

3. Common Commands

3.1 Checking Cluster Status

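# Check cluster health
GET _cluster/health
# Find the affected indices; health can be green, yellow, or red
GET /_cat/indices?v&health=red
# Explain in detail why a shard is unassigned
GET _cluster/allocation/explain
# List each index and shard with its state and unassigned reason
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason&v
# Show how data is distributed across nodes
GET _cat/allocation?v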

3.2 Moving Shards Between Nodes

Manually relocate a shard by moving a started shard from one node to another.

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "indexname",
        "shard": 1,
        "from_node": "nodename",
        "to_node": "nodename"
      }
    }
  ]
}

# move: relocate shard 0 of index test from node1 to node2
# allocate_replica: allocate a replica of shard 1 of index test to node3
# cancel: cancel the allocation of shard 0 of index test on node2
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "test",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    },
    {
      "allocate_replica": {
        "index": "test",
        "shard": 1,
        "node": "node3"
      }
    },
    {
      "cancel": {
        "index": "test",
        "shard": 0,
        "node": "node2"
      }
    }
  ]
}

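allocate_stale_primary: allocates a primary shard to a node that holds a stale copy of it.
allocate_empty_primary: allocates an empty primary shard.
Both commands can lose data and require accept_data_loss to be set to true.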

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "dcvs_aps",
        "shard": 3,
        "node": "MYSQL2",
        "accept_data_loss": true
      }
    }
  ]
}

index.allocation.max_retries sets the maximum number of failed allocation attempts (default 5). Once the system has used up its retries, you can retry the allocation manually:

POST _cluster/reroute?retry_failed

3.3 Gracefully Decommissioning a Node

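Take a node offline gracefully while keeping the cluster green. Excluding the node's IP moves its shards to other nodes; once it holds no shards, it can be shut down.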

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "192.168.248.1"
  }
}
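
Before shutting the node down, it helps to confirm that no shards are left on it. The sketch below checks the text output of GET _cat/shards?v&h=index,shard,prirep,state,ip for rows that still carry the excluded IP. The function name and the sample rows are invented for illustration, and the parsing assumes one whitespace-separated IP per assigned shard.

```python
def shards_on_ip(cat_shards_text: str, ip: str) -> list:
    """Return (index, shard) pairs still hosted on the node with this IP.

    Expects `_cat/shards?v&h=index,shard,prirep,state,ip` text output.
    """
    remaining = []
    for line in cat_shards_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 5 and fields[4] == ip:
            remaining.append((fields[0], fields[1]))
    return remaining


# Hypothetical output, invented for illustration.
sample = """index shard prirep state   ip
test  0     p      STARTED 192.168.248.2
test  0     r      STARTED 192.168.248.1
test  1     p      STARTED 192.168.248.3
"""
left = shards_on_ip(sample, "192.168.248.1")
print("safe to stop" if not left else f"still draining: {left}")
```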

3.4 Forcing a Flush

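A flush makes sure that all data currently stored only in the transaction log is also stored permanently in the Lucene index.

Note: synced flush is deprecated from version 7.6 on; use _flush instead.

 # Version 7.6 and later
 POST /_flush

 # Versions before 7.6: synced flush
 POST /_flush/synced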

3.5 Changing the Number of Concurrent Shard Rebalances

Controls how many concurrent shard rebalances are allowed cluster-wide. The default is 2.

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2
  }
}

3.6 Enabling and Disabling Automatic Shard Rebalancing

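Rebalancing shards during peak business hours can cause network latency or I/O problems, so you may prefer to enable automatic rebalancing only during off-peak hours.

Disable automatic rebalancing: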
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.rebalance.enable":"none"
  }
}
Enable automatic rebalancing:
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.rebalance.enable":"all"
  }
}

Notes:

transient: a temporary setting; it is lost after a full cluster restart.
persistent: a permanent setting; it survives a full cluster restart.

3.7 Changing the Number of Concurrent Shard Recoveries per Node

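If a node is disconnected from the cluster, all of its shards become unassigned. After a delay, the shards are allocated elsewhere. This setting determines how many shards each node recovers concurrently.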

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 6
  }
}

3.8 Tuning Recovery Speed

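To avoid overloading the cluster, Elasticsearch limits the bandwidth allotted to recovery. You can raise this setting carefully to make recovery faster.

If the value is set too high, ongoing recoveries may consume too much bandwidth and other resources, which can destabilize the cluster.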

PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "80mb"
  }
}
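
To get a feel for what the limit means, the best-case transfer time for a shard is its size divided by the rate. The arithmetic below is a rough lower bound only; the 50 GB shard size and the 40mb comparison value are assumptions for illustration, and real recoveries are also bounded by disk, network, and concurrency settings.

```python
MB = 1024 * 1024  # Elasticsearch byte-size units are binary
GB = 1024 * MB


def recovery_seconds(shard_bytes: int, max_bytes_per_sec: int) -> float:
    """Best-case time to copy one shard at the configured rate limit."""
    return shard_bytes / max_bytes_per_sec


shard = 50 * GB  # hypothetical shard size
for limit_mb in (40, 80):
    seconds = recovery_seconds(shard, limit_mb * MB)
    print(f"{limit_mb}mb limit: about {seconds / 60:.1f} minutes")
```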

3.9 Clearing Caches on a Node

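If a node reaches high JVM heap usage, you can call this API to have Elasticsearch clear its caches.

Note: this hurts performance, but it can get you out of an OOM (out of memory) situation.

 # Clear all caches for all indices
 POST /_cache/clear

 # Clear a specific cache for a specific index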
 POST /my-index-000001/_cache/clear?fielddata=true
 POST /my-index-000001/_cache/clear?query=true
 POST /my-index-000001/_cache/clear?request=true

3.10 Tuning Circuit Breakers

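A circuit breaker estimates the memory a query needs by inspecting it internally (field type, cardinality, size, and so on). It then checks whether loading the requested fielddata would push the total fielddata size over the configured percentage of the heap.

If the estimated size exceeds the limit, the circuit breaker trips and the query is aborted with an exception. This happens before the data is loaded, so it does not cause an OutOfMemoryError.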

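#1. View circuit breakers
GET /_nodes/stats/breaker?pretty
GET /_cluster/settings?include_defaults&flat_settings

#2. Parent circuit breaker: the parent of all other breakers; it keeps total heap usage from exceeding its limit. Defaults to 95% of the heap (70% when indices.breaker.total.use_real_memory is false).
indices.breaker.total.limit

#3. Fielddata circuit breaker: estimates how much memory is needed to load all the data of a field into memory.
 # Defaults to 40% of the heap (60% before 7.0)
 indices.breaker.fielddata.limit
 # Estimates are multiplied by this overhead factor to leave some buffer. Defaults to 1.03.
 indices.breaker.fielddata.overhead
indices.breaker.fielddata.limit must be greater than indices.fielddata.cache.size; otherwise only the fielddata circuit breaker trips and old fielddata is never evicted.

#4. Request circuit breaker
The request circuit breaker prevents OOM caused by per-request data structures (for example, an aggregation may use JVM memory for its intermediate calculations).
indices.breaker.request.limit # defaults to 60%

#5. In-flight requests circuit breaker
The in-flight requests circuit breaker limits the total memory used by all incoming requests at the transport or HTTP level. The memory counted is the content length of the requests themselves.
network.breaker.inflight_requests.limit # defaults to 100%

To change a limit: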
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "40%"
  }
}
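
Before changing a limit, it is worth checking how close each breaker already is to it. The sketch below summarizes the breakers object of one node from GET /_nodes/stats/breaker, using the limit_size_in_bytes, estimated_size_in_bytes, and tripped fields. The function name and the sample numbers are hypothetical.

```python
def breaker_usage(breakers: dict) -> dict:
    """Map breaker name -> (percent of limit in use, times tripped)."""
    usage = {}
    for name, breaker in breakers.items():
        limit = breaker["limit_size_in_bytes"]
        used = breaker["estimated_size_in_bytes"]
        percent = 100.0 * used / limit if limit > 0 else 0.0
        usage[name] = (percent, breaker["tripped"])
    return usage


# Hypothetical per-node "breakers" object, invented for illustration.
sample = {
    "parent": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 800, "tripped": 3},
    "fielddata": {"limit_size_in_bytes": 400, "estimated_size_in_bytes": 100, "tripped": 0},
}
for name, (percent, tripped) in breaker_usage(sample).items():
    print(f"{name}: {percent:.0f}% of limit, tripped {tripped} times")
```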

3.11 Tips for Running Elasticsearch on Slow Storage

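index.merge.scheduler.max_thread_count is the maximum number of threads on a single shard that may merge at once. It controls the number of Lucene background merge threads, and the default is suited only to SSDs. On slower disks, multiple merge threads can create excessive I/O pressure.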

PUT movies/_settings
{
  "index.merge.scheduler.max_thread_count":1
}

3.12 Limiting Shards per Node

Set index.routing.allocation.total_shards_per_node at the index level to keep multiple shards of the same index from being allocated to the same node.

# Index level
index.routing.allocation.total_shards_per_node

# Node level, across all indices
cluster.routing.allocation.total_shards_per_node

3.13 Viewing Tasks

Returns information about the tasks currently running on one or more nodes in the cluster.

 GET _tasks
 GET _tasks?nodes=nodeId1,nodeId2
 GET _tasks?nodes=nodeId1,nodeId2&actions=cluster:*


 # Task details
 GET /_tasks/<task_id>
 GET _tasks/oTUltX4IQMOUUVeiohTt8A:124
 GET _tasks?actions=*search&detailed

3.14 Cancelling Tasks

 POST _tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel
 POST _tasks/_cancel?nodes=nodeId1,nodeId2&actions=*reindex

3.15 Viewing Pending Tasks

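Returns the list of cluster-level changes that have not yet been executed, such as creating indices, updating mappings, and allocating shards.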

 GET /_cluster/pending_tasks
 GET /_cat/pending_tasks

3.16 Cluster-Level Shard Allocation

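# Enable or disable allocation for specific kinds of shards
cluster.routing.allocation.enable: all # default: all
all - (default) allows shard allocation for all kinds of shards.
primaries - allows shard allocation only for primary shards.
new_primaries - allows shard allocation only for primary shards of new indices.
none - no shard allocation of any kind is allowed for any index.
# How many concurrent incoming shard recoveries are allowed on a node
cluster.routing.allocation.node_concurrent_incoming_recoveries # default: 2
# How many concurrent outgoing shard recoveries are allowed on a node
cluster.routing.allocation.node_concurrent_outgoing_recoveries # default: 2
# Sets both of the values above at once
cluster.routing.allocation.node_concurrent_recoveries
# How many primary shards a node may initialize in parallel
cluster.routing.allocation.node_initial_primaries_recoveries # default: 4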

3.17 Shard Rebalancing Settings

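The balance of the cluster depends only on the number of shards on each node and the indices those shards belong to.

 # Enable or disable rebalancing for specific kinds of shards
 cluster.routing.rebalance.enable
 all - (default) allows shard balancing for all kinds of shards.
 primaries - allows shard balancing only for primary shards.
 replicas - allows shard balancing only for replica shards.
 none - no shard balancing of any kind is allowed for any index.

 # Controls how many concurrent shard rebalances are allowed cluster-wide. Defaults to 2
 cluster.routing.allocation.cluster_concurrent_rebalance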

3.18 Disk-Based Shard Allocation

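cluster.routing.allocation.disk.threshold_enabled # default: true; enables the disk space threshold checks. Set to false to disable the disk allocation decider
cluster.routing.allocation.disk.watermark.low # default: 85%; above it, no shards are allocated to the node (primary shards of newly created indices are not affected)
cluster.routing.allocation.disk.watermark.high # default: 90%; above it, Elasticsearch relocates shards away from the node; affects all shards
cluster.routing.allocation.disk.watermark.flood_stage # default: 95%; above it, index.blocks.read_only_allow_delete is set on the affected indices. From 7.4 on, the block is released automatically once disk usage falls below the high watermark
cluster.info.update.interval # default: 30s; how often disk usage is checked
# Thresholds can also be given as absolute sizes, which refer to the free space remaining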
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
    "cluster.info.update.interval": "1m"
  }
}

 # Or use percentages
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.info.update.interval": "1m"
  }
}
 # When disk usage exceeds the flood_stage threshold, indices are set to read-only. The following command manually restores write access
 PUT /my-index-000001/_settings
 {
 "index.blocks.read_only_allow_delete": null
 }
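
The three percentage thresholds can be read as a simple classification of a node's disk usage. The sketch below assumes the default 85% / 90% / 95% values listed above and treats a threshold as reached when usage is at or above it. The function name and the stage labels are illustrative, not Elasticsearch terms.

```python
def watermark_stage(used_percent: float,
                    low: float = 85.0,
                    high: float = 90.0,
                    flood: float = 95.0) -> str:
    """Classify disk usage against the low/high/flood_stage watermarks."""
    if used_percent >= flood:
        return "flood_stage"  # affected indices become read-only
    if used_percent >= high:
        return "high"         # shards are relocated away from the node
    if used_percent >= low:
        return "low"          # no new shards are allocated to the node
    return "ok"


# 81.15 is the disk.used_percent of node jg1X in section 2.4; the rest are made up.
for percent in (81.15, 87.0, 92.5, 96.0):
    print(percent, watermark_stage(percent))
```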

3.19 Viewing and Merging Segments

GET /twitter/_segments
POST /twitter/_forcemerge?max_num_segments=1

3.20 Disabling Wildcard (_all / *) Deletes to Prevent Accidental Deletion

PUT _cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": true
  }
}