基於 Prometheus、Grafana 的 EMQ X 物聯網 MQTT 服務器可視化運維監控

Prometheus 是由 SoundCloud 開源監控告警解決方案,支持多維 數據模型(時序由 metric 名字和 k/v 的 labels 構成),具備靈活的查詢語句(PromQL),支持多種數據採集 exporters;支持告警管理,基於指標實現告警監控;支持多種統計數據模型,圖形化展示友好,圖形展示除了內置的瀏覽器,也支持 Grafana 集成。

物聯網 MQTT 服務器 EMQ X 提供 emqx_statsd 插件,用於將 EMQ X 運行指標及 Erlang 虛擬機狀態數據輸出到第三方的監控系統如 Prometheus 中。通過 Prometheus 自帶的 node-exporter 還可以採集 Linux 服務器相關指標,實現服務器 + EMQ X 整體運維監控。

本文提供了 Prometheus + Grafana 整套 EMQ X 運維監控方案搭建過程。

安裝與準備

Docker 鏡像下載

# Docker 鏡像包下載
docker pull prom/node-exporter
docker pull prom/prometheus
docker pull prom/pushgateway

啓動 node-exporter

可選,用於收集服務器指標如 CPU、內存、網絡等,如果使用 Docker 安裝則需要映射目標服務器響應的狀態文件:

docker run -d -p 9100:9100 \
  -v "/proc:/host/proc:ro" \
  -v "/sys:/host/sys:ro" \
  -v "/:/rootfs:ro" \
  --net="host" \
  prom/node-exporter

啓動 pushgateway

pushgateway 用於接收 EMQ X 指標推送數據,需要保證 EMQ X 能夠訪問

docker run -d -p 9091:9091 prom/pushgateway

啓動 Prometheus

指定配置文件與監聽端口以啓動 Prometheus:

# 指定配置文件並啓動
docker run -p 9090:9090 \
	-v $PWD/prometheus.yaml:/etc/prometheus/prometheus.yaml \
	-d prom/prometheus \
	--config.file=/etc/prometheus/prometheus.yaml

Prometheus 配置文件 prometheus.yaml 樣例如下,詳細含義請參考 Prometheus 文檔

# prometheus.yaml
global:
  scrape_interval:     10s # 默認抓取時間
  evaluation_interval: 10s # 每10秒評估一次rules

  # 在本機上每一條時間序列上都會默認產生的,主要可以用於聯合查詢、遠程存儲、Alertmanager時使用。
  external_labels:
      monitor: 'emqx-monitor'

# 加載規則,依據 evaluation_interval 來定期評估rule
rule_files:
  # - "first.rules"
  # - "second.rules"
  - "/etc/prometheus/rules/*.rules"

# 數據拉取配置
scrape_configs:
  # 表示在這個配置內的時間序例,每一條都會自動添加上這個{job_name:"prometheus"}的標籤
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['127.0.0.1:9090']

	# 服務器物理機監控
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      # node-exporter 根據實際情況填寫
      - targets: ['192.168.6.11:9100']
        labels:
          instance: wivwiv-local


  # EMQ X Pushgateway 監控
  - job_name: 'pushgateway'
    scrape_interval: 5s
    honor_labels: true
    static_configs:
      # pushgateway 根據實際情況填寫
      - targets: ['192.168.6.11:9091']

啓動 EMQ X statsd 插件

打開 etc/emqx_statsd.conf,確認以下配置:

## pushgateway 地址
statsd.push.gateway.server = http://127.0.0.1:9091
## 數據採集/推送週期(毫秒)
statsd.interval = 15000

啓動插件:

./bin/emqx_ctl load plugins emqx_statsd

效果查看

通過 docker ps -a 命令查看組件是否成功運行,等待數個推送週期後,打開 http://localhost:9090 Prometheus 控制面板查看採集數據。

Prometheus 僅提供簡單圖表數據展示,如需更精美的可視化展示請結合 Grafana 使用。

在這裏插入圖片描述

集成 Grafana

Grafana 是一個開源、通用的度量分析與可視化展示工具,通過數據源(如各類數據庫、開源組件),展示自定義報表、顯示圖表等。

啓動 Grafana

通過 Docker 拉取並啓動 Grafana 鏡像:

docker run -d --name=grafana -p 3000:3000 grafana/grafana

啓動成功後,瀏覽器訪問 http://127.0.0.1:3000 打開 Dashboard 控制檯。

配置 Prometheus 數據源

在 Grafana 中添加數據源,選擇 Prometheus 並填寫正確的地址完成數據源添加。

在這裏插入圖片描述

導入 Grafana 模板數據

emqx_statsd 插件提供了 Grafana 的 Dashboard 的模板文件,這些模板包含了大部分 EMQ X 監控數據的展示。用戶可直接導入到 Grafana 中,用以顯示 EMQ X 的監控狀態的圖標。

模板文件位於emqx_statsd/grafana_template 中,因 EMQ X 版本差異問題,可能存在部分圖表數據顯示錯誤的情況,請用戶手動調整適配。

點擊 Upload.json file 按鈕,導入後選擇對應的文件夾與數據源即可。

在這裏插入圖片描述

效果展示

完成整套系統搭建並運行一段時間後,Prometheus 收集到的數據將展示在 Grafana 上,默認模板展示效果如下:

  • EMQ Dashboard:包含連接、消息、主題、吞吐量歷史統計
  • EMQ:包含客戶端數、訂閱數、主題數、消息數、報文數等業務信息歷史統計
  • ErlangVM:每個 EMQ X 節點 Erlang 虛擬機進程/線程數量,ETS/Mnesia 數據庫使用情況歷史統計

如有其他需求,可以參照 「附:emqx-statsd 所有指標」並結合 Grafana 進行圖標數據編排展示。

在這裏插入圖片描述
在這裏插入圖片描述
在這裏插入圖片描述

告警管理

Prometheus 與 Grafana 均支持指標告警功能,配置告警規則後,服務器會不斷評估設置的規則與當前指標數據,在規則條件符合的時候發送出通知。

篇幅有限,告警相關配置與實踐請關注後續文章。

附:emqx-statsd 所有指標

EMQ X MQTT 服務器通過 Prometheus push gateway 推送指標數據,支持的指標項如下:

# TYPE erlang_vm_ets_limit gauge
erlang_vm_ets_limit 256000
# TYPE erlang_vm_logical_processors gauge
erlang_vm_logical_processors 4
# TYPE erlang_vm_logical_processors_available gauge
erlang_vm_logical_processors_available NaN
# TYPE erlang_vm_logical_processors_online gauge
erlang_vm_logical_processors_online 4
# TYPE erlang_vm_port_count gauge
erlang_vm_port_count 16
# TYPE erlang_vm_port_limit gauge
erlang_vm_port_limit 1048576
# TYPE erlang_vm_process_count gauge
erlang_vm_process_count 320
# TYPE erlang_vm_process_limit gauge
erlang_vm_process_limit 2097152
# TYPE erlang_vm_schedulers gauge
erlang_vm_schedulers 4
# TYPE erlang_vm_schedulers_online gauge
erlang_vm_schedulers_online 4
# TYPE erlang_vm_smp_support untyped
erlang_vm_smp_support 1
# TYPE erlang_vm_threads untyped
erlang_vm_threads 1
# TYPE erlang_vm_thread_pool_size gauge
erlang_vm_thread_pool_size 4
# TYPE erlang_vm_time_correction untyped
erlang_vm_time_correction 1
# TYPE erlang_vm_statistics_context_switches counter
erlang_vm_statistics_context_switches 20767
# TYPE erlang_vm_statistics_garbage_collection_number_of_gcs counter
erlang_vm_statistics_garbage_collection_number_of_gcs 3924
# TYPE erlang_vm_statistics_garbage_collection_words_reclaimed counter
erlang_vm_statistics_garbage_collection_words_reclaimed 6751048
# TYPE erlang_vm_statistics_garbage_collection_bytes_reclaimed counter
erlang_vm_statistics_garbage_collection_bytes_reclaimed 54008384
# TYPE erlang_vm_statistics_bytes_received_total counter
erlang_vm_statistics_bytes_received_total 23332
# TYPE erlang_vm_statistics_bytes_output_total counter
erlang_vm_statistics_bytes_output_total 21266
# TYPE erlang_vm_statistics_reductions_total counter
erlang_vm_statistics_reductions_total 18413181
# TYPE erlang_vm_statistics_run_queues_length_total gauge
erlang_vm_statistics_run_queues_length_total 0
# TYPE erlang_vm_statistics_runtime_milliseconds counter
erlang_vm_statistics_runtime_milliseconds 1782
# TYPE erlang_vm_statistics_wallclock_time_milliseconds counter
erlang_vm_statistics_wallclock_time_milliseconds 68277
# TYPE erlang_vm_memory_atom_bytes_total gauge
erlang_vm_memory_atom_bytes_total{usage="used"} 1507142
erlang_vm_memory_atom_bytes_total{usage="free"} 18787
# TYPE erlang_vm_memory_bytes_total gauge
erlang_vm_memory_bytes_total{kind="system"} 63949544
erlang_vm_memory_bytes_total{kind="processes"} 45457848
# TYPE erlang_vm_dets_tables gauge
erlang_vm_dets_tables 0
# TYPE erlang_vm_ets_tables gauge
erlang_vm_ets_tables 115
# TYPE erlang_vm_memory_processes_bytes_total gauge
erlang_vm_memory_processes_bytes_total{usage="used"} 45457696
erlang_vm_memory_processes_bytes_total{usage="free"} 152
# TYPE erlang_vm_memory_system_bytes_total gauge
erlang_vm_memory_system_bytes_total{usage="atom"} 1525929
erlang_vm_memory_system_bytes_total{usage="binary"} 104504
erlang_vm_memory_system_bytes_total{usage="code"} 26779999
erlang_vm_memory_system_bytes_total{usage="ets"} 7685312
erlang_vm_memory_system_bytes_total{usage="other"} 27853800
# TYPE erlang_mnesia_held_locks gauge
erlang_mnesia_held_locks 0
# TYPE erlang_mnesia_lock_queue gauge
erlang_mnesia_lock_queue 0
# TYPE erlang_mnesia_transaction_participants gauge
erlang_mnesia_transaction_participants 0
# TYPE erlang_mnesia_transaction_coordinators gauge
erlang_mnesia_transaction_coordinators 0
# TYPE erlang_mnesia_failed_transactions counter
erlang_mnesia_failed_transactions 21
# TYPE erlang_mnesia_committed_transactions counter
erlang_mnesia_committed_transactions 128
# TYPE erlang_mnesia_logged_transactions counter
erlang_mnesia_logged_transactions 3
# TYPE erlang_mnesia_restarted_transactions counter
erlang_mnesia_restarted_transactions 0
# TYPE emqx_connections_count gauge
emqx_connections_count 0
# TYPE emqx_connections_max gauge
emqx_connections_max 0
# TYPE emqx_sessions_count gauge
emqx_sessions_count 0
# TYPE emqx_sessions_max gauge
emqx_sessions_max 0
# TYPE emqx_topics_count gauge
emqx_topics_count 0
# TYPE emqx_topics_max gauge
emqx_topics_max 0
# TYPE emqx_suboptions_count gauge
emqx_suboptions_count 0
# TYPE emqx_suboptions_max gauge
emqx_suboptions_max 0
# TYPE emqx_subscribers_count gauge
emqx_subscribers_count 0
# TYPE emqx_subscribers_max gauge
emqx_subscribers_max 0
# TYPE emqx_subscriptions_count gauge
emqx_subscriptions_count 0
# TYPE emqx_subscriptions_max gauge
emqx_subscriptions_max 0
# TYPE emqx_subscriptions_shared_count gauge
emqx_subscriptions_shared_count 0
# TYPE emqx_subscriptions_shared_max gauge
emqx_subscriptions_shared_max 0
# TYPE emqx_routes_count gauge
emqx_routes_count 0
# TYPE emqx_routes_max gauge
emqx_routes_max 0
# TYPE emqx_retained_count gauge
emqx_retained_count 3
# TYPE emqx_retained_max gauge
emqx_retained_max 3
# TYPE emqx_vm_cpu_use gauge
emqx_vm_cpu_use 12.029950083194677
# TYPE emqx_vm_cpu_idle gauge
emqx_vm_cpu_idle 87.97004991680532
# TYPE emqx_vm_run_queue gauge
emqx_vm_run_queue 1
# TYPE emqx_vm_process_messages_in_queues gauge
emqx_vm_process_messages_in_queues 0
# TYPE emqx_bytes_received counter
emqx_bytes_received 0
# TYPE emqx_bytes_sent counter
emqx_bytes_sent 0
# TYPE emqx_packets_received counter
emqx_packets_received 0
# TYPE emqx_packets_sent counter
emqx_packets_sent 0
# TYPE emqx_packets_connect counter
emqx_packets_connect 0
# TYPE emqx_packets_connack_sent counter
emqx_packets_connack_sent 0
# TYPE emqx_packets_connack_error counter
emqx_packets_connack_error 0
# TYPE emqx_packets_connack_auth_error counter
emqx_packets_connack_auth_error 0
# TYPE emqx_packets_publish_received counter
emqx_packets_publish_received 0
# TYPE emqx_packets_publish_sent counter
emqx_packets_publish_sent 0
# TYPE emqx_packets_publish_inuse counter
emqx_packets_publish_inuse 0
# TYPE emqx_packets_publish_error counter
emqx_packets_publish_error 0
# TYPE emqx_packets_publish_auth_error counter
emqx_packets_publish_auth_error 0
# TYPE emqx_packets_publish_dropped counter
emqx_packets_publish_dropped 0
# TYPE emqx_packets_puback_received counter
emqx_packets_puback_received 0
# TYPE emqx_packets_puback_sent counter
emqx_packets_puback_sent 0
# TYPE emqx_packets_puback_inuse counter
emqx_packets_puback_inuse 0
# TYPE emqx_packets_puback_missed counter
emqx_packets_puback_missed 0
# TYPE emqx_packets_pubrec_received counter
emqx_packets_pubrec_received 0
# TYPE emqx_packets_pubrec_sent counter
emqx_packets_pubrec_sent 0
# TYPE emqx_packets_pubrec_inuse counter
emqx_packets_pubrec_inuse 0
# TYPE emqx_packets_pubrec_missed counter
emqx_packets_pubrec_missed 0
# TYPE emqx_packets_pubrel_received counter
emqx_packets_pubrel_received 0
# TYPE emqx_packets_pubrel_sent counter
emqx_packets_pubrel_sent 0
# TYPE emqx_packets_pubrel_missed counter
emqx_packets_pubrel_missed 0
# TYPE emqx_packets_pubcomp_received counter
emqx_packets_pubcomp_received 0
# TYPE emqx_packets_pubcomp_sent counter
emqx_packets_pubcomp_sent 0
# TYPE emqx_packets_pubcomp_inuse counter
emqx_packets_pubcomp_inuse 0
# TYPE emqx_packets_pubcomp_missed counter
emqx_packets_pubcomp_missed 0
# TYPE emqx_packets_subscribe_received counter
emqx_packets_subscribe_received 0
# TYPE emqx_packets_subscribe_error counter
emqx_packets_subscribe_error 0
# TYPE emqx_packets_subscribe_auth_error counter
emqx_packets_subscribe_auth_error 0
# TYPE emqx_packets_suback_sent counter
emqx_packets_suback_sent 0
# TYPE emqx_packets_unsubscribe_received counter
emqx_packets_unsubscribe_received 0
# TYPE emqx_packets_unsubscribe_error counter
emqx_packets_unsubscribe_error 0
# TYPE emqx_packets_unsuback_sent counter
emqx_packets_unsuback_sent 0
# TYPE emqx_packets_pingreq_received counter
emqx_packets_pingreq_received 0
# TYPE emqx_packets_pingresp_sent counter
emqx_packets_pingresp_sent 0
# TYPE emqx_packets_disconnect_received counter
emqx_packets_disconnect_received 0
# TYPE emqx_packets_disconnect_sent counter
emqx_packets_disconnect_sent 0
# TYPE emqx_packets_auth_received counter
emqx_packets_auth_received 0
# TYPE emqx_packets_auth_sent counter
emqx_packets_auth_sent 0
# TYPE emqx_messages_received counter
emqx_messages_received 0
# TYPE emqx_messages_sent counter
emqx_messages_sent 0
# TYPE emqx_messages_qos0_received counter
emqx_messages_qos0_received 0
# TYPE emqx_messages_qos0_sent counter
emqx_messages_qos0_sent 0
# TYPE emqx_messages_qos1_received counter
emqx_messages_qos1_received 0
# TYPE emqx_messages_qos1_sent counter
emqx_messages_qos1_sent 0
# TYPE emqx_messages_qos2_received counter
emqx_messages_qos2_received 0
# TYPE emqx_messages_qos2_sent counter
emqx_messages_qos2_sent 0
# TYPE emqx_messages_publish counter
emqx_messages_publish 0
# TYPE emqx_messages_dropped counter
emqx_messages_dropped 0
# TYPE emqx_messages_dropped_expired counter
emqx_messages_dropped_expired 0
# TYPE emqx_messages_dropped_no_subscribers counter
emqx_messages_dropped_no_subscribers 0
# TYPE emqx_messages_forward counter
emqx_messages_forward 0
# TYPE emqx_messages_retained counter
emqx_messages_retained 2
# TYPE emqx_messages_delayed counter
emqx_messages_delayed 0
# TYPE emqx_messages_delivered counter
emqx_messages_delivered 0
# TYPE emqx_messages_acked counter
emqx_messages_acked 0
# TYPE emqx_delivery_dropped counter
emqx_delivery_dropped 0
# TYPE emqx_delivery_dropped_no_local counter
emqx_delivery_dropped_no_local 0
# TYPE emqx_delivery_dropped_too_large counter
emqx_delivery_dropped_too_large 0
# TYPE emqx_delivery_dropped_qos0_msg counter
emqx_delivery_dropped_qos0_msg 0
# TYPE emqx_delivery_dropped_queue_full counter
emqx_delivery_dropped_queue_full 0
# TYPE emqx_delivery_dropped_expired counter
emqx_delivery_dropped_expired 0
# TYPE emqx_client_connected counter
emqx_client_connected 0
# TYPE emqx_client_authenticate counter
emqx_client_authenticate 0
# TYPE emqx_client_auth_anonymous counter
emqx_client_auth_anonymous 0
# TYPE emqx_client_check_acl counter
emqx_client_check_acl 0
# TYPE emqx_client_subscribe counter
emqx_client_subscribe 0
# TYPE emqx_client_unsubscribe counter
emqx_client_unsubscribe 0
# TYPE emqx_client_disconnected counter
emqx_client_disconnected 0
# TYPE emqx_session_created counter
emqx_session_created 0
# TYPE emqx_session_resumed counter
emqx_session_resumed 0
# TYPE emqx_session_takeovered counter
emqx_session_takeovered 0
# TYPE emqx_session_discarded counter
emqx_session_discarded 0
# TYPE emqx_session_terminated counter
emqx_session_terminated 0

本文作者: EMQ X

原文鏈接:https://www.emqx.io/cn/blog/emqx-prometheus-grafana

版權聲明: 本文爲 EMQ 原創,轉載請註明出處

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章