During a routine inspection, I found an Elasticsearch cluster in the red state:
$ curl --user xxx:xxx -X GET 'localhost:9200/_cat/health?v&pretty'
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1649831279 06:27:59 cluster-sk red 6 6 3155 3154 0 0 2 0 - 99.9%
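A quick Google search turned up the following:
- red means at least one primary shard is not allocated to any node;
- yellow means at least one replica shard is unassigned.
The health output above does show unassign 2, so the first thing to try is reallocating the unassigned shards: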
$ curl --user xxxx:xxxx -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
Then check the cluster status again:
$ curl --user xxxx:xxxx -X GET 'localhost:9200/_cat/health?v&pretty'
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1649834617 07:23:37 cluster-sk green 6 6 3157 3156 0 0 0 0 - 100.0%
Green! Problem solved.
Elasticsearch commands for reference
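The manual check above (read the status, look at the unassign count, then decide whether to retry) can be scripted. This is a minimal sketch, assuming the `_cat/health` column order shown in the output above (status in column 4, unassign in column 11). `health_field` and `should_retry` are hypothetical helper names, not part of Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helpers for reading one data row of `_cat/health` output.
# Assumption: column order as shown above (status = field 4, unassign = field 11).

# Print field number $2 of the health row passed as $1.
health_field() {
    printf '%s\n' "$1" | awk -v n="$2" '{ print $n }'
}

# Succeed (exit 0) when the row reports a non-green status and unassigned shards.
should_retry() {
    status=$(health_field "$1" 4)
    unassigned=$(health_field "$1" 11)
    [ "$status" != "green" ] && [ "$unassigned" -gt 0 ]
}

# Sample row copied from the red cluster output above.
row='1649831279 06:27:59 cluster-sk red 6 6 3155 3154 0 0 2 0 - 99.9%'
if should_retry "$row"; then
    echo "status=$(health_field "$row" 4) unassigned=$(health_field "$row" 11): retry allocation"
fi
```

In practice the row would come from the `_cat/health` call without the `v` parameter, so that no header line is printed, and a successful `should_retry` would be followed by the reroute command shown earlier.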
- View cluster status
# Show column headers and results
$ curl --user xxxx:xxxx -X GET 'localhost:9200/_cat/health?v&pretty'
# Show results only (JSON output)
$ curl --user xxxx:xxxx -X GET "localhost:9200/_cluster/health?pretty"
- View cluster details
$ curl --user xxxx:xxxx -X GET 'localhost:9200/_cluster/stats?human&pretty'
- List shards
$ curl --user xxxx:xxxx -X GET "localhost:9200/_cat/shards"
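With thousands of shards, the full listing is hard to scan by eye. The sketch below filters it down to unassigned shards, assuming the listing is requested with `h=index,shard,prirep,state,unassigned.reason` so that the state lands in the fourth column. `only_unassigned` is a hypothetical helper name, and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Keep only rows whose 4th column (state) is UNASSIGNED.
# Assumption: input was produced with
#   _cat/shards?h=index,shard,prirep,state,unassigned.reason
only_unassigned() {
    awk '$4 == "UNASSIGNED"'
}

# Made-up sample rows (not taken from the cluster above).
sample='logs-a 0 p STARTED
logs-b 1 p UNASSIGNED ALLOCATION_FAILED
logs-b 1 r UNASSIGNED ALLOCATION_FAILED'

printf '%s\n' "$sample" | only_unassigned

# Against a live cluster this might look like:
#   curl -s --user xxxx:xxxx -X GET \
#     "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | only_unassigned
```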
- View node status
$ curl --user xxxx:xxxx -X GET "localhost:9200/_cat/nodes?v"
- View node details (process information)
$ curl --user xxxx:xxxx -X GET "localhost:9200/_nodes/process?pretty"
- View allocation details: shows the detailed reason why a shard is unassigned (the output is easier to analyze with a JSON parser)
$ curl --user xxxx:xxxx -X GET "localhost:9200/_cluster/allocation/explain"
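Called without a body, the allocation explain API reports on the first unassigned shard it finds. To ask about one specific shard, the request can carry a body naming it. The sketch below only builds that body. `explain_body` is a hypothetical helper name and the index name in the example is made up.

```shell
#!/bin/sh
# Build the JSON body for _cluster/allocation/explain.
# $1 = index name, $2 = shard number, $3 = true for the primary, false for a replica.
explain_body() {
    printf '{"index": "%s", "shard": %s, "primary": %s}\n' "$1" "$2" "$3"
}

# Example with a made-up index name.
explain_body "logs-b" 1 true

# The body could then be sent with, for example:
#   curl --user xxxx:xxxx -H 'Content-Type: application/json' \
#     -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
#     -d "$(explain_body logs-b 1 true)"
```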
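- Retry allocation of unassigned shards (retry_failed=true retries shards that were blocked after too many failed allocation attempts)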
$ curl --user xxxx:xxxx -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
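- Delete index data by date (this removes the matching documents, not the indices themselves): https://juejin.cn/post/6844903472878321671
# Range query on the @timestamp field: lt is less than (<), lte is less than or equal (<=), gt is greater than (>), gte is greater than or equal (>=); now-7d is the current time minus 7 days
$ curl -u xxxx:xxxx -H 'Content-Type: application/json' -XPOST "http://127.0.0.1:9200/*-*/_delete_by_query?pretty" -d '
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-7d",
        "format": "epoch_millis"
      }
    }
  }
}'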
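Since the retention window is the only part of that request likely to change, it can be made a parameter. The sketch below assumes the same `@timestamp` range query as above. `delete_body` is a hypothetical helper name.

```shell
#!/bin/sh
# Print a delete-by-query body matching documents whose @timestamp
# is older than $1 days (same range query as the command above).
delete_body() {
    cat <<EOF
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-${1}d",
        "format": "epoch_millis"
      }
    }
  }
}
EOF
}

# Print the body for a 7-day retention window.
delete_body 7

# Possible use against a live cluster:
#   curl -u xxxx:xxxx -H 'Content-Type: application/json' \
#     -XPOST "http://127.0.0.1:9200/*-*/_delete_by_query?pretty" -d "$(delete_body 7)"
```

Before deleting anything, the same body can be sent to the `_count` endpoint of the same index pattern to see how many documents would match.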
- Change ES settings at runtime, without a restart: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/cluster-update-settings.html
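As an illustration of a runtime change through the cluster update settings API, the sketch below builds a request body for `PUT _cluster/settings`. The setting used, `cluster.routing.allocation.enable`, is only an example choice and was not part of the incident above. `settings_body` is a hypothetical helper name.

```shell
#!/bin/sh
# Build a body for PUT _cluster/settings.
# $1 = scope ("persistent" or "transient"), $2 = setting name, $3 = value (sent as a string).
settings_body() {
    printf '{"%s": {"%s": "%s"}}\n' "$1" "$2" "$3"
}

# Example: re-enable shard allocation for all shard types.
settings_body persistent cluster.routing.allocation.enable all

# Possible use against a live cluster:
#   curl --user xxxx:xxxx -H 'Content-Type: application/json' \
#     -X PUT "localhost:9200/_cluster/settings?pretty" \
#     -d "$(settings_body persistent cluster.routing.allocation.enable all)"
```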