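Before you begin
Before following the steps below, make sure that Kubernetes (k8s), Prometheus, and Prometheus Operator are already deployed on your servers. Installing these components is well documented elsewhere, so this document does not cover it.
The overall architecture of the Prometheus monitoring setup is roughly as shown in the diagram below. Interested readers can explore it on their own; it is not explained in detail here.
1. Deploying the Deployment (workload) and Service
The YAML configuration can be modeled on the following: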
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: elasticsearch-exporter
  name: elasticsearch-exporter
  namespace: prometheus-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch-exporter
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9114'
        prometheus.io/path: '/metrics'
      labels:
        app: elasticsearch-exporter
    spec:
      containers:
        - command:
            - '/bin/elasticsearch_exporter'
            # Credential format: --es.uri=http://username:password@localhost:9200
            # Replace <password> and <elasticsearch-host> with your own values.
            - '--es.uri=http://admin:<password>@<elasticsearch-host>:9200'
          image: prometheuscommunity/elasticsearch-exporter:v1.5.0
          imagePullPolicy: IfNotPresent
          name: elasticsearch-exporter
          ports:
            - containerPort: 9114
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch-exporter
  name: elasticsearch-exporter-svc
  namespace: prometheus-exporter
spec:
  ports:
    - name: http
      port: 9114
      protocol: TCP
      targetPort: 9114
  selector:
    app: elasticsearch-exporter
  type: ClusterIP
Notes:
1> For the settings used in this YAML, see the exporter's official repository: https://github.com/prometheus-community/elasticsearch_exporter
2> Choose the elasticsearch exporter image version that fits your needs. The official image repository is: https://hub.docker.com/r/prometheuscommunity/elasticsearch-exporter/tags
3> Screenshots of the successful deployment:
(1) Deployment (workload)
(2) Service
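The Deployment above embeds the Elasticsearch password directly in the --es.uri argument. As a hedged alternative, the sketch below keeps the credentials in a Kubernetes Secret and passes them to the container through the ES_USERNAME and ES_PASSWORD environment variables, which the exporter's README describes as overriding credentials in the URI; check that the exporter version you run supports them. The Secret name elasticsearch-exporter-credentials, the password value, and the <elasticsearch-host> placeholder are assumptions made for illustration.

```yaml
---
# Hypothetical Secret holding the Elasticsearch credentials.
apiVersion: v1
kind: Secret
metadata:
  name: elasticsearch-exporter-credentials   # assumed name
  namespace: prometheus-exporter
type: Opaque
stringData:
  username: admin
  password: change-me                        # placeholder, replace with the real password
---
# Same Deployment as above, with the credentials moved out of --es.uri.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: elasticsearch-exporter
  name: elasticsearch-exporter
  namespace: prometheus-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch-exporter
  template:
    metadata:
      labels:
        app: elasticsearch-exporter
    spec:
      containers:
        - name: elasticsearch-exporter
          image: prometheuscommunity/elasticsearch-exporter:v1.5.0
          imagePullPolicy: IfNotPresent
          command:
            - '/bin/elasticsearch_exporter'
            # No credentials in the URI; they come from the environment.
            - '--es.uri=http://<elasticsearch-host>:9200'
          env:
            - name: ES_USERNAME
              valueFrom:
                secretKeyRef:
                  name: elasticsearch-exporter-credentials
                  key: username
            - name: ES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: elasticsearch-exporter-credentials
                  key: password
          ports:
            - containerPort: 9114
```

With this layout the password no longer appears in the Deployment manifest or in the container's command line, and it can be rotated by updating the Secret and restarting the pod.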
2. Creating the ServiceMonitor configuration file
The YAML configuration file is as follows:
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: elasticsearch-exporter
  # A resource name must end with an alphanumeric character.
  name: elasticsearch-exporter
  namespace: prometheus-exporter
spec:
  endpoints:
    - honorLabels: true
      interval: 1m
      path: /metrics
      port: http
      scheme: http
      params:
        target:
          - 'es-cluster.monitorsoftware:9200'
      relabelings:
        - sourceLabels: [__param_target]
          targetLabel: instance
  namespaceSelector:
    matchNames:
      - prometheus-exporter
  selector:
    matchLabels:
      app: elasticsearch-exporter
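Notes:
1> Prometheus Operator discovers monitoring targets through ServiceMonitor resources and scrapes them. A ServiceMonitor describes how metrics are collected from a Service.
- Prometheus Operator uses a ServiceMonitor to automatically select Services that carry specific labels and to scrape metrics from them.
- ServiceMonitor resources are themselves discovered automatically by Prometheus Operator.
2> The Prometheus monitoring flow is as follows:
3> Screenshots of the successful deployment:
(1) ServiceMonitor deployment
(2) Prometheus deployment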
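Whether Prometheus Operator actually picks up the ServiceMonitor depends on the selectors in the Prometheus custom resource. The sketch below shows one way those selectors could be set so that the ServiceMonitor in the prometheus-exporter namespace is matched. The resource name k8s and the namespace k8s-monitor-system are assumptions (the namespace is borrowed from the PrometheusRule used in this guide); adapt them to your existing Prometheus resource rather than creating a new one.

```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                        # assumed name of the existing Prometheus resource
  namespace: k8s-monitor-system    # assumed namespace
spec:
  # An empty selector matches ServiceMonitors in all namespaces.
  # If this field is omitted, only the Prometheus resource's own namespace is searched.
  serviceMonitorNamespaceSelector: {}
  # Select ServiceMonitors by label; this matches the label set in the
  # ServiceMonitor's metadata above.
  serviceMonitorSelector:
    matchLabels:
      app: elasticsearch-exporter
```

If the target still does not appear in Prometheus, it is also worth checking that the Prometheus service account has RBAC permission to list Services, Endpoints, and Pods in the prometheus-exporter namespace.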
3. Configuring Prometheus alerting rules
PrometheusRule configuration:
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: elasticsearch-exporter-rules
  namespace: k8s-monitor-system
spec:
  groups:
    - name: elasticsearch-exporter
      rules:
        - alert: es-ElasticsearchHealthyNodes
          expr: elasticsearch_cluster_health_number_of_nodes < 3
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Elasticsearch Healthy Nodes (instance {{ $labels.instance }})
            description: "Missing node in Elasticsearch cluster\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: es-ElasticsearchClusterRed
          expr: elasticsearch_cluster_health_status{color="red"} == 1
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Elasticsearch Cluster Red (instance {{ $labels.instance }})
            description: "Elastic Cluster Red status\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: es-ElasticsearchClusterYellow
          expr: elasticsearch_cluster_health_status{color="yellow"} == 1
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: Elasticsearch Cluster Yellow (instance {{ $labels.instance }})
            description: "Elastic Cluster Yellow status\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: es-ElasticsearchDiskOutOfSpace
          expr: elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 < 10
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Elasticsearch disk out of space (instance {{ $labels.instance }})
            description: "The disk usage is over 90%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: es-ElasticsearchHeapUsageTooHigh
          expr: (elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100 > 90
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: Elasticsearch Heap Usage Too High (instance {{ $labels.instance }})
            description: "The heap usage is over 90%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: es-ElasticsearchHealthyDataNodes
          expr: elasticsearch_cluster_health_number_of_data_nodes < 3
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Elasticsearch Healthy Data Nodes (instance {{ $labels.instance }})
            description: "Missing data node in Elasticsearch cluster\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
Notes:
1> The PrometheusRule configuration can be based on the template rules published at: https://awesome-prometheus-alerts.grep.to/rules#elasticsearch
2> Screenshot of the successful deployment:
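The labels prometheus: k8s and role: alert-rules on the PrometheusRule only take effect if the Prometheus custom resource selects rules by those labels. A minimal sketch of the matching selector is shown here; the Prometheus resource name k8s is an assumption, and your installation may use different labels, in which case the PrometheusRule labels should be changed to match instead.

```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                        # assumed name of the existing Prometheus resource
  namespace: k8s-monitor-system
spec:
  # An empty selector searches all namespaces for PrometheusRule objects.
  # If this field is omitted, only this namespace is searched.
  ruleNamespaceSelector: {}
  # Load only PrometheusRule objects that carry both of these labels.
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
```

Once the rule is loaded, the elasticsearch-exporter group should be listed on the Rules page of the Prometheus web UI.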
4. Grafana dashboard
4.1 Grafana dashboards are available at: https://grafana.com/grafana/dashboards
The officially recommended template ID is 14191.
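An imported dashboard needs a Prometheus data source in Grafana to query. If one is not configured yet, it could be added through Grafana's data source provisioning file, as in the sketch below. The Service name prometheus-k8s and port 9090 in the URL are assumptions for illustration; use the address of your own Prometheus Service.

```yaml
# Grafana data source provisioning file, for example placed under
# /etc/grafana/provisioning/datasources/ in the Grafana container.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Assumed in-cluster address of the Prometheus Service.
    url: http://prometheus-k8s.k8s-monitor-system.svc:9090
    isDefault: true
```

After the data source exists, the dashboard can be imported in the Grafana UI by entering the ID 14191 and selecting this data source.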
4.2 The resulting dashboard looks like this: