Monitoring Flink Job Metrics with Flink Metric Reporters

Original

云想慕尘

2020-06-30 12:46

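Starting with Flink 1.8, metric reporters can write metrics to InfluxDB, and users can build their own visualization system on top of the data stored there.

For small and mid-sized companies, however, building a custom visualization layer is often too costly, so we use Grafana to visualize Flink metrics instead.

This article covers using InfluxDB and Prometheus as reporters to write Flink metrics to an external system, and visualizing them with Grafana.

Step-by-step installation and configuration instructions follow.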

1. InfluxDB

1.1 Start InfluxDB

docker run -p 8086:8086 \
    -v /data/docker_volume/influxdb:/var/lib/influxdb \
    influxdb

1.2 Connect to InfluxDB

Replace e9b352ee20d4 with the ID of your own container, as shown by docker ps:

docker exec -it e9b352ee20d4 influx

1.3 Create the database

create database flink

1.4 Create the user

create user "flink" with password 'flink#123centos' with all privileges;
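
One way to confirm that the database and user exist is to query InfluxDB over its HTTP API. The sketch below assumes an InfluxDB 1.x server (the influx shell and the create database statement used above belong to the 1.x line) listening on port 8086. The function name influx_query and the INFLUX_* variables are illustrative and not part of the original setup.

```shell
# Minimal sketch: run an InfluxQL statement through the InfluxDB 1.x /query endpoint.
# INFLUX_URL, INFLUX_USER and INFLUX_PASS are illustrative defaults (assumptions);
# the credentials mirror the user created in section 1.4.
INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"
INFLUX_USER="${INFLUX_USER:-flink}"
INFLUX_PASS="${INFLUX_PASS:-flink#123centos}"

influx_query() {
    # $1 = database name, $2 = InfluxQL statement
    curl -sS -G "${INFLUX_URL}/query" \
        -u "${INFLUX_USER}:${INFLUX_PASS}" \
        --data-urlencode "db=$1" \
        --data-urlencode "q=$2"
}
```

After sourcing the snippet, influx_query flink 'SHOW MEASUREMENTS' returns a JSON document. Once a Flink job with the InfluxDB reporter is running, the measurement names written by the reporter should appear in it.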

2. Prometheus

2.1 Download Prometheus and Pushgateway

https://prometheus.io/download/

2.2 Install

Extract the Prometheus and Pushgateway archives.

2.3 Configure

vim prometheus.yml
Append the following at the end of the file, under scrape_configs:

  # pushgateway
  - job_name: 'pushgateway'
    scrape_interval: 10s
    honor_labels: true # on a label conflict, keep the labels carried by the pushed metrics instead of overwriting them with the labels of the pushgateway scrape target
    static_configs:
     - targets: ['localhost:9091']
       labels:
         instance: pushgateway
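
The manual edit above can also be scripted so that running it twice does not duplicate the job. This is a minimal sketch that assumes scrape_configs is the last top-level section of prometheus.yml, as in the file shipped with the release archive. The function name add_pushgateway_job is made up for illustration.

```shell
# Minimal sketch: append the pushgateway scrape job to prometheus.yml only once.
# Assumption: scrape_configs is the last top-level section of the file, so text
# appended at the end becomes part of it.
add_pushgateway_job() {
    # $1 = path to prometheus.yml, $2 = pushgateway address (host:port)
    local conf="$1" target="${2:-localhost:9091}"
    if grep -q "job_name: 'pushgateway'" "$conf"; then
        echo "pushgateway job already present in $conf"
        return 0
    fi
    cat >> "$conf" <<EOF

  # pushgateway
  - job_name: 'pushgateway'
    scrape_interval: 10s
    honor_labels: true
    static_configs:
     - targets: ['${target}']
       labels:
         instance: pushgateway
EOF
}
```

Calling add_pushgateway_job prometheus.yml appends the block on the first run and only prints a notice afterwards. The promtool binary included in the Prometheus archive can then validate the result with promtool check config prometheus.yml.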

2.4 Start

./prometheus  > /dev/null 2>&1 &

./pushgateway --web.enable-admin-api > /dev/null 2>&1 &
The --web.enable-admin-api flag enables the Pushgateway admin API, so metrics can be deleted from the web UI, or wiped entirely with curl -X PUT http://localhost:9091/api/v1/admin/wipe
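
Because both processes are started in the background with their output discarded, a failed start is easy to miss. The following sketch polls a readiness URL until it answers. It assumes the /-/ready endpoint, which Prometheus exposes and which recent Pushgateway releases also provide. The helper name wait_ready is hypothetical.

```shell
# Minimal sketch: poll a readiness URL until it answers with a successful HTTP status.
wait_ready() {
    # $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 1)
    local url="$1" tries="${2:-30}" pause="${3:-1}" i=1
    while [ "$i" -le "$tries" ]; do
        # -f makes curl return non-zero on HTTP errors such as 503
        if curl -fsS -o /dev/null "$url"; then
            echo "ready: $url"
            return 0
        fi
        sleep "$pause"
        i=$((i + 1))
    done
    echo "not ready after ${tries} attempts: $url" >&2
    return 1
}
```

For example: wait_ready http://localhost:9090/-/ready && wait_ready http://localhost:9091/-/ready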

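2.5 Verify

# prometheus:
Open http://10.42.63.116:9090/targets

The pushgateway job should be listed under Targets.

# pushgateway:
Open http://10.42.63.116:9091/

The metrics written by Flink should be visible here (the Flink job must be restarted first so that it picks up the reporter configuration).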
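
To check the Pushgateway-to-Prometheus path before any Flink job is involved, a sample can be pushed by hand in the Prometheus text format. This is a sketch only: the job name smoke_test and the metric name smoke_test_value are invented for the test and have nothing to do with the metrics Flink reports.

```shell
# Minimal sketch: push one sample to a Pushgateway using the text exposition format.
# The default metric and job names are made up for this smoke test.
push_test_metric() {
    # $1 = pushgateway base URL, $2 = job name, $3 = metric name, $4 = value
    local base="${1:-http://localhost:9091}" job="${2:-smoke_test}"
    local metric="${3:-smoke_test_value}" value="${4:-1}"
    # The request body must end with a newline, which printf provides here.
    printf '%s %s\n' "$metric" "$value" |
        curl -sS --data-binary @- "${base}/metrics/job/${job}"
}
```

After push_test_metric http://10.42.63.116:9091, the sample should appear on the Pushgateway page and, following the next scrape, be queryable in Prometheus. It can be removed again with curl -X DELETE http://10.42.63.116:9091/metrics/job/smoke_test.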

3. Flink configuration

3.1 Edit the Flink configuration file

vim flink-1.10.0/conf/flink-conf.yaml

# InfluxDB reporter
metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: 10.42.63.116
metrics.reporter.influxdb.port: 8086
# db, username, and password must match the InfluxDB setup above
metrics.reporter.influxdb.db: flink
metrics.reporter.influxdb.username: flink
metrics.reporter.influxdb.password: flink#123centos
#metrics.reporter.influxdb.retentionPolicy: one_hour
#metrics.reporter.influxdb.consistency: ANY 
#metrics.reporter.influxdb.connectTimeout: 60000
#metrics.reporter.influxdb.writeTimeout: 60000

# Prometheus (Pushgateway) reporter
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: 10.42.63.116
metrics.reporter.promgateway.port: 9091
# jobName is set here directly; it does not need to be defined in Prometheus beforehand
metrics.reporter.promgateway.jobName: tdflink_prom
metrics.reporter.promgateway.randomJobNameSuffix: true
# Whether to delete the job's metrics from the Pushgateway when the Flink job shuts down.
# The Flink documentation lists the default as true. Even when it is enabled, the metrics
# are not reliably removed (see https://issues.apache.org/jira/browse/FLINK-11457);
# they can be deleted through the Pushgateway web UI or API instead.
metrics.reporter.promgateway.deleteOnShutdown: true

# Collect operating system metrics
# Flag indicating whether Flink should report system resource metrics such as machine's CPU, memory or network usage.
metrics.system-resource: true

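3.2 Copy the jars

Copy the InfluxDB and Prometheus reporter jars from flink-1.10.0/opt into the lib directory:

cp opt/flink-metrics-influxdb-1.10.0.jar ./lib
cp opt/flink-metrics-prometheus-1.10.0.jar ./lib

For the metric reporter to report operating system metrics, download the following jars and place them in the lib directory as well: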

jna-4.2.2.jar
jna-platform-4.2.2.jar
oshi-core-3.4.0.jar
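
A missing jar typically shows up only as absent metrics after the job has started, so it can be worth checking the lib directory beforehand. The sketch below looks for the five jars named above by file name pattern. The function name check_reporter_jars is hypothetical, and the patterns assume the jars keep their standard file names.

```shell
# Minimal sketch: report which reporter-related jars are missing from Flink's lib directory.
check_reporter_jars() {
    # $1 = path to the Flink lib directory
    local lib="$1" missing=0 pattern
    for pattern in 'flink-metrics-influxdb-*.jar' 'flink-metrics-prometheus-*.jar' \
                   'jna-[0-9]*.jar' 'jna-platform-*.jar' 'oshi-core-*.jar'; do
        # $pattern is left unquoted on purpose so the shell expands the glob
        if ! ls "$lib"/$pattern > /dev/null 2>&1; then
            echo "missing: ${pattern}"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_reporter_jars /home/admin/flink-1.10.0/lib prints one line per missing pattern and returns a non-zero status if anything is absent. The jna and oshi jars matter only when metrics.system-resource is enabled.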

3.3 Start the Flink job

# YARN per-job mode
/home/admin/flink-1.10.0/bin/flink run -m yarn-cluster -p 100 -yjm 4g -ys 10 -ytm 16g -yqu root.flink -ynm etl_test  \
/home/admin/tiangx/applog_etl/jar_test/applog_etl-1.0-SNAPSHOT-jar-with-dependencies.jar \
--input-topic applog_raw \
--output-topic applog_test \
--bootstrap.servers 10.19.171.177:9092 \
--zookeeper.connect 10.19.171.177:2181 \
--group.id flink_applog_etl_test \
--redis 10.10.152.217 > /dev/null 2>&1 &

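3.4 Clear stale metrics from the Pushgateway

Restarting a Flink job does not automatically clear the metrics that the previous run pushed, so Prometheus keeps scraping them and jobs that have already stopped still show up in monitoring. Clearing them manually is recommended, in one of two ways:

3.4.1 Delete all metrics through the Pushgateway web UI.

3.4.2 Delete metrics through the Pushgateway API: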

curl -X PUT http://localhost:9091/api/v1/admin/wipe
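
The wipe call removes every group, including those of jobs that are still running. When only one stopped job should be cleaned up, the Pushgateway also accepts a DELETE request on a single job group. A minimal sketch follows, with a made-up helper name. It assumes the metrics were pushed with the job label as the only grouping key.

```shell
# Minimal sketch: delete the metrics of one job group from a Pushgateway.
# Assumption: the group is identified by the job label alone.
delete_pushgateway_job() {
    # $1 = pushgateway base URL, $2 = exact job name (including any random suffix)
    local base="${1:-http://localhost:9091}" job="$2"
    curl -sS -X DELETE "${base}/metrics/job/${job}"
}
```

With randomJobNameSuffix enabled, the job name is tdflink_prom followed by a random suffix, so the exact name has to be read from the Pushgateway web UI first, for example delete_pushgateway_job http://localhost:9091 'tdflink_prom<suffix>' where <suffix> is a placeholder.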

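4. Grafana

4.1 Install and start Grafana

Download:
docker pull grafana/grafana

Start:
docker run -d --name=grafana -p 3000:3000 grafana/grafana

The first time you open Grafana at http://localhost:3000/, click Skip to bypass the password prompt; on later visits you must log in. The default user is admin with password admin, and Grafana prompts you to change the password after logging in.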

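4.2 Configure the data sources

Configure InfluxDB: add a data source of type InfluxDB using the URL http://10.42.63.116:8086 and the database, user, and password created in section 1.

Configure Prometheus: add a data source of type Prometheus using the URL http://10.42.63.116:9090.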
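
As an alternative to the web form, a data source can be created through the Grafana HTTP API. This is a sketch under a few assumptions: the admin/admin default credentials are still in place, the data source name Prometheus is arbitrary, and the helper name add_prometheus_datasource is invented for illustration.

```shell
# Minimal sketch: register a Prometheus data source via the Grafana HTTP API.
# GRAFANA_USER and GRAFANA_PASS are illustrative variables defaulting to admin/admin.
add_prometheus_datasource() {
    # $1 = Grafana base URL, $2 = Prometheus URL as reachable from the Grafana container
    local grafana="${1:-http://localhost:3000}" prom="${2:-http://10.42.63.116:9090}"
    curl -sS -X POST "${grafana}/api/datasources" \
        -u "${GRAFANA_USER:-admin}:${GRAFANA_PASS:-admin}" \
        -H 'Content-Type: application/json' \
        -d "{\"name\":\"Prometheus\",\"type\":\"prometheus\",\"url\":\"${prom}\",\"access\":\"proxy\"}"
}
```

Because Grafana runs in a container here, localhost inside it does not refer to the host machine, which is why the Prometheus URL defaults to the host address used elsewhere in this article. An InfluxDB data source can be registered in a similar way with type influxdb plus the database and credentials; the exact field names are best taken from the Grafana API documentation.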

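4.3 Download a Grafana dashboard template

https://grafana.com/grafana/dashboards
Search for a Flink metrics dashboard and download it.

4.4 Import the template into Grafana

Click "Import" and import the downloaded template, then open the dashboard.

A downloaded dashboard may need further adjustment before it displays correctly.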


