To get a clearer real-time view of Kafka's message production rate and of the message backlog (lag) between a topic's consumer group_ids and the cluster, we use kafka_exporter, Prometheus, and Grafana to display the relevant metrics in real time.
1. Download kafka_exporter (the host it runs on must have network access to the Kafka cluster)
wget https://github.com/danielqsj/kafka_exporter/releases/download/v1.2.0/kafka_exporter-1.2.0.linux-amd64.tar.gz
Extract it: tar -zxvf kafka_exporter-1.2.0.linux-amd64.tar.gz
Change into the extracted directory: cd kafka_exporter-1.2.0.linux-amd64
Start it: ./kafka_exporter --kafka.server=<kafka-ip-or-hostname>:9092 & (the address of a single broker in the cluster is enough)
The exporter serves its metrics on port 9308.
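Starting the exporter with a trailing & ties it to the shell session; for anything long-lived, a supervisor is safer. A minimal systemd unit sketch — the install path, broker address, and unit file location below are assumptions, adjust them to your layout:

```ini
# /etc/systemd/system/kafka_exporter.service  (hypothetical path and values)
[Unit]
Description=Prometheus Kafka exporter
After=network.target

[Service]
# Adjust the binary path and broker address to your environment.
ExecStart=/opt/kafka_exporter-1.2.0.linux-amd64/kafka_exporter --kafka.server=kafka1:9092
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After systemctl daemon-reload and systemctl enable --now kafka_exporter, a quick curl localhost:9308/metrics should show the exporter's metrics.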
2. Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.15.1/prometheus-2.15.1.linux-amd64.tar.gz
Extract it:
tar -zxvf prometheus-2.15.1.linux-amd64.tar.gz
cd ./prometheus-2.15.1.linux-amd64
prometheus.yml is the Prometheus configuration file; before adding kafka_exporter to it, you can start Prometheus as-is to verify that the service itself works. The default file looks like this:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  # evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
#alerting:
#  alertmanagers:
#  - static_configs:
#    - targets:
#      - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
Start it with ./prometheus & (it reads prometheus.yml from the current directory by default), then open ip:9090 in a browser to reach its web UI.
Add the kafka_exporter endpoint to Prometheus by appending a new job at the end of scrape_configs in prometheus.yml:
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'vpc_md_kafka'
    static_configs:
    - targets: ['localhost:9308']
Restart Prometheus; if the new target shows as UP under Status -> Targets, the scrape is working. Next, use Grafana to visualize the metrics.
3. Download Grafana
wget https://dl.grafana.com/oss/release/grafana-6.5.2-1.x86_64.rpm
Install it as root:
yum localinstall grafana-6.5.2-1.x86_64.rpm
Open the Grafana web UI at ip:3000 and add Prometheus as a data source.
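Instead of clicking through the UI, the data source can also be provisioned from a file, which survives reinstalls. A minimal sketch, assuming the default rpm layout with Grafana's provisioning directory at /etc/grafana/provisioning and Prometheus on the same host:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml  (assumed default path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```

Restart the grafana-server service after dropping the file in place.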
Import a dashboard: for kafka_exporter there is an official Grafana dashboard (ID 7589), so there is no need to build the panels yourself. Type 7589 into the import box; as soon as the input field loses focus, Grafana loads the dashboard and moves on to the next step.
The official dashboard looks like this; mine is a test environment, so there is not much data.
You can also write your own queries to fit your needs. The production dashboard here is kept fairly simple: three panels.
Production dashboard configuration and the corresponding queries
The query statements behind the panels are:
sum(irate(kafka_topic_partition_current_offset{topic !~ "__consumer_offsets|__transaction_state|test",env="$env",app="$app"}[30s])) by (topic) >= 0
sum(kafka_consumergroup_lag{env="$env",app="$app"}) by (topic,consumergroup)
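To make the two queries concrete, here is a small sketch of what they compute, run on illustrative hand-written sample lines (not real cluster data): the first query differentiates kafka_topic_partition_current_offset to get a per-topic produce rate, the second sums kafka_consumergroup_lag over partitions per (topic, consumergroup). Note irate() in Prometheus uses the last two raw samples; dividing one 30 s delta by 30 is just the same idea in miniature.

```python
# Sketch of what the two PromQL queries compute, on made-up sample data.
import re
from collections import defaultdict

METRIC_RE = re.compile(r'(\w+)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse(text):
    """Parse Prometheus exposition lines into (name, labels, value) tuples."""
    out = []
    for line in text.splitlines():
        m = METRIC_RE.match(line)
        if m:
            out.append((m.group(1), dict(LABEL_RE.findall(m.group(2))), float(m.group(3))))
    return out

# Two scrapes of the current-offset metric, 30 s apart (illustrative values).
scrape_t0 = ('kafka_topic_partition_current_offset{partition="0",topic="t"} 100\n'
             'kafka_topic_partition_current_offset{partition="1",topic="t"} 200\n')
scrape_t1 = ('kafka_topic_partition_current_offset{partition="0",topic="t"} 400\n'
             'kafka_topic_partition_current_offset{partition="1",topic="t"} 500\n')

# sum(irate(...[30s])) by (topic): per-partition offset delta / elapsed seconds,
# summed over partitions -> messages produced per second, per topic.
rate_by_topic = defaultdict(float)
for (_, l0, v0), (_, l1, v1) in zip(parse(scrape_t0), parse(scrape_t1)):
    rate_by_topic[l1["topic"]] += (v1 - v0) / 30.0
print(dict(rate_by_topic))  # {'t': 20.0}

# sum(kafka_consumergroup_lag) by (topic, consumergroup): total backlog of a
# group on a topic, summed across partitions.
lag_sample = ('kafka_consumergroup_lag{consumergroup="g1",partition="0",topic="t"} 5\n'
              'kafka_consumergroup_lag{consumergroup="g1",partition="1",topic="t"} 7\n')
lag = defaultdict(float)
for _, labels, value in parse(lag_sample):
    lag[(labels["topic"], labels["consumergroup"])] += value
print(dict(lag))  # {('t', 'g1'): 12.0}
```

The `topic !~ "__consumer_offsets|__transaction_state|test"` matcher in the first query simply drops Kafka's internal topics and a test topic before the aggregation.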