Kafka Cluster Monitoring (kafka_exporter & Prometheus & Grafana)

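To get a clearer view of Kafka's real-time message production rate, and of the message backlog (consumer lag) between each group_id and Kafka for a given topic, we use kafka_exporter, Prometheus, and Grafana to display the relevant metrics in real time.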

1. Download kafka_exporter (the host must have network access to the Kafka cluster)

wget https://github.com/danielqsj/kafka_exporter/releases/download/v1.2.0/kafka_exporter-1.2.0.linux-amd64.tar.gz

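Extract the archive:  tar -zxvf kafka_exporter-1.2.0.linux-amd64.tar.gz

Change to the extracted directory:  cd kafka_exporter-1.2.0.linux-amd64

Start the exporter in the background. Replace KAFKA_HOST with the IP or hostname of a broker; the address of a single broker in the cluster is enough:

./kafka_exporter --kafka.server=KAFKA_HOST:9092 &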

The exporter serves its metrics on port 9308.
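Before wiring the exporter into Prometheus, it can help to confirm that it really answers on port 9308. The following is a minimal sketch, not part of the original setup: has_metric is a hypothetical helper name, and the check assumes the exporter runs on the same host and exposes the kafka_brokers metric in the standard Prometheus text format.

```shell
# has_metric NAME
# Reads Prometheus text-format metrics on stdin and succeeds if at least one
# sample line (not a "# HELP" / "# TYPE" comment) belongs to the metric NAME.
# The name must be followed by "{" or a space, so "kafka_brokers" does not
# match a longer name such as "kafka_brokers_total".
has_metric() {
    grep -v '^#' | grep -q "^$1[{ ]"
}

# Intended usage (assumption: exporter is running locally on port 9308):
#   curl -s http://localhost:9308/metrics | has_metric kafka_brokers \
#       && echo "exporter is serving Kafka metrics"
```

If the check fails, the usual suspects are a wrong broker address in --kafka.server or a firewall between the exporter host and the Kafka cluster.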

2. Download Prometheus

 wget https://github.com/prometheus/prometheus/releases/download/v2.15.1/prometheus-2.15.1.linux-amd64.tar.gz

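Extract the archive and change to the extracted directory:

tar -zxvf prometheus-2.15.1.linux-amd64.tar.gz

cd ./prometheus-2.15.1.linux-amd64

prometheus.yml is the Prometheus configuration file. Before adding kafka_exporter to it, you can start Prometheus with the default configuration to check that the service works: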

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
 # evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
#alerting:
#  alertmanagers:
#  - static_configs:
#    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

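Open ip:9090 in a browser to reach the Prometheus web UI.

Add the kafka_exporter service to Prometheus as a new scrape job (appending it to the end of the configuration file is enough):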


    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'vpc_md_kafka'
    static_configs:
    - targets: ['localhost:9308']

Restart Prometheus and check under Status > Targets that the new target is healthy. The next step is to visualize the metrics with Grafana.
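Before moving on to Grafana, the same target check can be done from the command line through the Prometheus HTTP API. This is a rough sketch under stated assumptions: count_up_targets is a hypothetical helper, Prometheus is assumed to listen on localhost:9090, and the JSON is matched with grep on the assumption that the API returns it in compact form (no spaces around the colon). A real JSON parser would be more robust.

```shell
# count_up_targets
# Reads the JSON body of /api/v1/targets on stdin and prints how many
# targets report "health":"up". Assumes compact JSON as described above.
count_up_targets() {
    grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Intended usage (assumption: run from the Prometheus directory):
#   ./promtool check config prometheus.yml      # validate the file before restarting
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```

With the two jobs configured above ('prometheus' and 'vpc_md_kafka'), a healthy setup should report 2.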

3. Download Grafana

wget https://dl.grafana.com/oss/release/grafana-6.5.2-1.x86_64.rpm

Run the following as root:

yum localinstall grafana-6.5.2-1.x86_64.rpm

Open the Grafana web UI at ip:3000 and add Prometheus as a data source.
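The data source can also be added without the web UI, through Grafana's HTTP API. The sketch below only builds the JSON request body; datasource_payload is a hypothetical helper, and the data source name, the URLs, and the admin:admin credentials (the default on a fresh install) are assumptions to adjust for your environment.

```shell
# datasource_payload NAME URL
# Prints the JSON body for creating a Prometheus data source via
# POST /api/datasources. NAME and URL must not contain double quotes,
# since no JSON escaping is done here.
datasource_payload() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1" "$2"
}

# Intended usage (assumptions: Grafana on localhost:3000, default credentials):
#   systemctl start grafana-server
#   curl -s -u admin:admin -H 'Content-Type: application/json' \
#       -X POST http://localhost:3000/api/datasources \
#       -d "$(datasource_payload Prometheus http://localhost:9090)"
```

Note that the RPM only installs the service; it has to be started (for example with systemctl, as in the comment) before port 3000 responds.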

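Import a dashboard. Ready-made dashboards for this are already published on the Grafana site, so there is no need to build one from scratch.

In the import dialog, enter dashboard ID 7589; as soon as the cursor leaves the input box, Grafana moves on to the next step.

This is what the ready-made dashboard looks like. Mine is a test environment, so there is hardly any data.

You can also write your own queries to fit your needs. The production dashboard is kept fairly simple as well, with three panels.

Production dashboard configuration and the corresponding queries

The queries behind the panels are: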

 

sum(irate(kafka_topic_partition_current_offset{topic !~ "__consumer_offsets|__transaction_state|test",env="$env",app="$app"}[30s])) by (topic) >= 0
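To make the first query more concrete: irate() takes the last two samples inside the range window and divides their difference by the time between them, and sum(...) by (topic) then adds the per-partition rates together. With the 15s scrape interval configured earlier, a 30s window usually holds just two samples. The sketch below reproduces that arithmetic for a single partition; offset_rate is a hypothetical helper and the offsets are made-up illustration values.

```shell
# offset_rate PREV_OFFSET CUR_OFFSET SECONDS
# Prints the per-second message rate implied by two offset samples taken
# SECONDS apart, which is the calculation irate() performs per partition.
offset_rate() {
    awk -v prev="$1" -v cur="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", (cur - prev) / secs }'
}

# Illustration: the offset grew from 1000 to 1300 between two scrapes 15s apart.
offset_rate 1000 1300 15    # prints 20.00 (messages per second)
```

The env and app labels in the query are not shown being set anywhere in this walkthrough; they presumably come from labels attached in the production scrape configuration, with $env and $app being Grafana dashboard variables.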

 

sum(kafka_consumergroup_lag{env="$env",app="$app"})  by (topic,consumergroup)
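kafka_exporter reports consumer lag per partition, so the query above adds the partitions together for each (topic, consumergroup) pair. As a rough illustration of that aggregation outside Prometheus, the sketch below does the same sum over raw exporter output. sum_lag is a hypothetical helper, and it assumes the samples carry consumergroup and topic labels in the usual text exposition format.

```shell
# sum_lag
# Reads Prometheus text-format metrics on stdin and prints one line per
# (topic, consumergroup) pair: "TOPIC GROUP TOTAL_LAG", sorted by topic.
sum_lag() {
    awk '
    /^kafka_consumergroup_lag\{/ {
        group = ""; topic = ""
        # Pull the label values out of consumergroup="..." and topic="..."
        if (match($0, /consumergroup="[^"]*"/)) group = substr($0, RSTART + 15, RLENGTH - 16)
        if (match($0, /topic="[^"]*"/))         topic = substr($0, RSTART + 7, RLENGTH - 8)
        lag[topic " " group] += $NF     # last field is the sample value
    }
    END { for (k in lag) print k, lag[k] }
    ' | sort
}

# Intended usage (assumption: exporter is running locally on port 9308):
#   curl -s http://localhost:9308/metrics | sum_lag
```

This is handy for a quick sanity check against the numbers a Grafana panel shows, since both should agree at roughly the same point in time.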

 
