Monitoring Metrics and Prometheus Rules (work in progress)

(1) node exporter standard performance metrics

1) Metrics
CPU usage: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Memory usage: 100 - (((node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100)
Disk usage: (1 - (node_filesystem_free_bytes{fstype=~"ext3|ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext3|ext4|xfs"})) * 100
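The arithmetic behind the three expressions can be checked by hand. The sketch below is only an illustration in Python with made-up sample values; the function names are hypothetical and are not part of node exporter or Prometheus.

```python
def cpu_usage_percent(idle_fractions):
    """idle_fractions: per-core idle share (0..1), the kind of value
    irate(node_cpu_seconds_total{mode="idle"}[5m]) yields for each core."""
    avg_idle = sum(idle_fractions) / len(idle_fractions)
    return 100 - avg_idle * 100


def memory_usage_percent(free, buffers, cached, total):
    """Treats free + buffers + cached as available memory, as the expression does."""
    return 100 - ((free + buffers + cached) / total) * 100


def disk_usage_percent(free, size):
    """Share of the filesystem that is not free."""
    return (1 - free / size) * 100


# Illustrative inputs (assumed values, not real measurements)
cpu = cpu_usage_percent([0.75, 0.25])      # two cores, average idle share 0.5
mem = memory_usage_percent(2, 1, 1, 16)    # GiB: 4 of 16 available
disk = disk_usage_percent(25, 100)         # GiB: 25 of 100 free
print(cpu, mem, disk)
```

With these inputs the memory and disk examples would both cross the 80% threshold only if less than 20% of the resource remained available, which is the condition the rules in the next part test.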

2) Prometheus rules

groups:
- name: alert-rule
  rules:
  - alert: NodeFilesystemUsage-high
    expr: (1 - (node_filesystem_free_bytes{fstype=~"ext3|ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext3|ext4|xfs"})) * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: High Node Filesystem usage detected"
      description: "{{$labels.instance}}: Node Filesystem usage is above 80% (current value is: {{ $value }})"
  - alert: NodeMemoryUsage
    expr: (100 - (((node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100)) > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: High Node Memory usage detected"
      description: "{{$labels.instance}}: Node Memory usage is above 80% (current value is: {{ $value }})"
  - alert: NodeCPUUsage
    expr: (100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: Node High CPU usage detected"
      description: "{{$labels.instance}}: Node CPU usage is above 80% (current value is: {{ $value }})"
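Every rule above carries "for: 2m", so an alert does not fire on the first evaluation that crosses the threshold: it stays pending until the expression has been true continuously for the whole duration. The Python sketch below imitates that state machine under simplifying assumptions (a fixed evaluation interval, a single series); alert_states is a hypothetical helper, not Prometheus code.

```python
def alert_states(values, threshold, for_seconds, interval_seconds):
    """Return the alert state at each evaluation.

    values: one sample of the rule expression per evaluation.
    The alert turns 'pending' when the value first exceeds the threshold and
    'firing' once it has stayed above it for at least for_seconds.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, value in enumerate(values):
        now = i * interval_seconds
        if value > threshold:
            if active_since is None:
                active_since = now
            held = now - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # condition cleared, so the timer resets
            states.append("inactive")
    return states


# CPU usage sampled every 60 s against the rule "> 80" with "for: 2m"
states = alert_states([70, 85, 90, 95, 60], threshold=80,
                      for_seconds=120, interval_seconds=60)
print(states)
```

A short spike that drops back below 80 before two minutes have passed never leaves the pending state, which is the reason for adding a "for" clause to noisy metrics such as CPU usage.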

(2) MySQL performance metrics

1) MySQL metrics

MySQL availability: mysql_up (0 means the exporter cannot reach MySQL)

Queries per second: rate(mysql_global_status_questions[5m])

Connections: mysql_global_status_threads_connected > 200
    or remaining connections: mysql_global_variables_max_connections - mysql_global_status_threads_connected < 200

Slow queries: rate(mysql_global_status_slow_queries[5m])

Replication SQL thread: mysql_slave_status_slave_sql_running
Replication lag: mysql_slave_status_seconds_behind_master
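The metrics in this list are of two kinds: questions and slow_queries are counters that only ever grow, while threads_connected and seconds_behind_master are gauges that report a current value. rate() is meaningful for counters only. The Python sketch below shows why, using a deliberately simplified rate (last minus first sample over the elapsed time); the real rate() also handles counter resets and extrapolates to the window edges, and all sample values here are invented.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) pair.

    A simplification of PromQL rate(): no counter-reset handling and no
    extrapolation to the edges of the range window.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Counter: total questions served, sampled 300 s apart (assumed values)
questions = [(0, 10_000), (300, 190_000)]

# Gauge: currently open connections, steady at 250 (assumed values)
threads_connected = [(0, 250), (300, 250)]

qps = simple_rate(questions)                    # meaningful: queries per second
conn_rate = simple_rate(threads_connected)      # change per second, not a count
current_connections = threads_connected[-1][1]  # what a "> 200" alert should test

print(qps, conn_rate, current_connections)
```

In this sketch a server holding 250 connections steadily has a rate of zero, so a threshold applied to the rate would never trigger, whereas the raw gauge value crosses 200 as intended.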

2) Prometheus rules

groups:
- name: MySQLStatsAlert
  rules:
  - alert: MySQL is down
    expr: mysql_up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} MySQL is down"
      description: "MySQL database is down. This requires immediate action!"
  - alert: Mysql_High_QPS
    expr: rate(mysql_global_status_questions[5m]) > 500
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: Mysql_High_QPS detected"
      description: "{{$labels.instance}}: Mysql operations exceed 500 per second (current value is: {{ $value }})"
  - alert: Mysql_Too_Many_Connections
    expr: mysql_global_status_threads_connected > 200
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: Mysql Too Many Connections detected"
      description: "{{$labels.instance}}: Mysql has more than 200 open connections (current value is: {{ $value }})"
  - alert: Mysql_Too_Many_slow_queries
    expr: rate(mysql_global_status_slow_queries[5m]) > 3
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: Mysql_Too_Many_slow_queries detected"
      description: "{{$labels.instance}}: Mysql slow_queries exceed 3 per second (current value is: {{ $value }})"
  - alert: SQL thread stopped
    expr: mysql_slave_status_slave_sql_running == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} SQL thread stopped"
      description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
  - alert: Slave lagging behind Master
    expr: mysql_slave_status_seconds_behind_master > 30
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Slave lagging behind Master"
      description: "Slave is lagging behind Master. Please check if Slave threads are running and if there are some performance issues!"

(3) Pod performance metrics
1) Container metrics

Pod memory usage: container_memory_usage_bytes{container_name!=""} / container_spec_memory_limit_bytes{container_name!=""} * 100 != +Inf
Pod CPU usage: sum by (pod_name)(rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100
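The "!= +Inf" filter in the memory expression is there because a container without a memory limit reports a limit of 0, and dividing by zero in PromQL produces +Inf rather than an error. The Python sketch below mimics that filtering; the container names and byte counts are invented, and the helper functions are hypothetical.

```python
import math


def memory_percent(usage_bytes, limit_bytes):
    """Usage as a percentage of the limit, mirroring PromQL float division."""
    if limit_bytes == 0:
        # Assumption for this sketch: usage is positive, so PromQL would
        # return +Inf here (0 / 0 would be NaN instead).
        return math.inf
    return usage_bytes / limit_bytes * 100


def over_threshold(samples, threshold=80.0):
    """Keep containers whose percentage is finite and above the threshold."""
    result = {}
    for name, (usage, limit) in samples.items():
        pct = memory_percent(usage, limit)
        if pct != math.inf and pct > threshold:
            result[name] = pct
    return result


samples = {
    "api": (900, 1000),     # close to its limit
    "batch": (500, 0),      # no limit configured, so the ratio is +Inf
    "cache": (100, 1000),   # well below its limit
}
alerts = over_threshold(samples)
print(alerts)
```

Without the filter, every container that has no limit would appear permanently above any threshold.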

2) Prometheus rules

groups:
- name: noah_pod.rules
  rules:
  - alert: PodMemUsage
    expr: container_memory_usage_bytes{container_name!=""} / container_spec_memory_limit_bytes{container_name!=""} * 100 != +Inf > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.name}}: Pod High Mem usage detected"
      description: "{{$labels.name}}: Pod Mem is above 80% (current value is: {{ $value }})"
  - alert: PodCpuUsage
    expr: sum by (pod_name)(rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.pod_name}}: Pod High CPU usage detected"
      description: "{{$labels.pod_name}}: Pod CPU is above 80% (current value is: {{ $value }})"

References:

http://ylzheng.com/2018/04/02/use-prometheus-monitor-mysql/
https://www.cnblogs.com/zengkefu/p/5658252.html
https://blog.csdn.net/qq_25934401/article/details/82594478
https://blog.csdn.net/qq_39570637/article/details/81711328
https://blog.csdn.net/ichglauben/article/details/82381438
