Installing Prometheus from Binaries
Server list:
Server name | Operating system | IP address | Services |
---|---|---|---|
test03 | Ubuntu 16.04.4 | 192.168.1.58 | Prometheus, Alertmanager, Grafana |
test02 | Ubuntu 16.04.4 | 192.168.1.57 | node_exporter |
1. Install Prometheus
- Official Prometheus download page: https://prometheus.io/download/
- Download Prometheus
root@test03:~# wget https://github.com/prometheus/prometheus/releases/download/v2.11.0/prometheus-2.11.0.linux-amd64.tar.gz
- Extract the archive
root@test03:~# tar xf prometheus-2.11.0.linux-amd64.tar.gz
- Move it to the /usr/local/prometheus directory
root@test03:~# mv prometheus-2.11.0.linux-amd64 /usr/local/prometheus
- Set up Prometheus to run as a background systemd service
root@test03:~# cat /lib/systemd/system/prometheus.service
[Unit]
Description=https://prometheus.io
[Service]
ExecStart=/usr/local/prometheus/prometheus --config.file="/usr/local/prometheus/prometheus.yml"
[Install]
WantedBy=multi-user.target
- Enable the Prometheus service so that it starts at boot
root@test03:~# systemctl enable prometheus.service
Created symlink from /etc/systemd/system/multi-user.target.wants/prometheus.service to /lib/systemd/system/prometheus.service.
- Start the Prometheus service
root@test03:~# systemctl start prometheus.service
- Check the Prometheus service status
root@test03:~# systemctl status prometheus.service
● prometheus.service - https://prometheus.io
Loaded: loaded (/lib/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2019-07-10 11:10:45 CST; 4s ago
Main PID: 818 (prometheus)
......
- Open http://192.168.1.58:9090 in a browser
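
Besides opening the UI in a browser, the running server can be checked from a script. The following is a minimal sketch, assuming Python 3 is available and using the /-/healthy management endpoint that Prometheus exposes; the function name is_healthy and its opener parameter are illustrative, not part of the original setup.

```python
import urllib.request


def is_healthy(base_url, opener=urllib.request.urlopen, timeout=3.0):
    """Return True if GET <base_url>/-/healthy answers with HTTP 200."""
    url = base_url.rstrip("/") + "/-/healthy"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so connection
        # failures and non-2xx answers both end up here.
        return False


# Usage (address taken from the server list above):
#   is_healthy("http://192.168.1.58:9090")
```

The opener parameter exists only so the function can be exercised without a live server.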
2. Install Grafana
- Install with Docker
root@test03:~# docker run -d -p 3000:3000 grafana/grafana
root@test03:~# docker ps
CONTAINER ID   IMAGE             COMMAND     CREATED          STATUS          PORTS                    NAMES
a6ff7bd88b42   grafana/grafana   "/run.sh"   43 seconds ago   Up 41 seconds   0.0.0.0:3000->3000/tcp   peaceful_brattain
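- Open http://192.168.1.58:3000 in a browser
Log in to the Grafana UI:
Default username: admin
Default password: admin
After the first login, Grafana prompts you to set a new password
- Add a data source
- Enter the Prometheus address (http://192.168.1.58:9090)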
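
The data source can also be created through Grafana's HTTP API instead of the UI. The sketch below only builds the request for POST /api/datasources with basic authentication. The function name, the data source name "Prometheus", and the password value are assumptions for illustration. Because Grafana runs inside a Docker container here, the Prometheus URL should be the host address (192.168.1.58), since localhost would refer to the container itself.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, user, password, prometheus_url):
    """Build a POST /api/datasources request that registers Prometheus."""
    body = {
        "name": "Prometheus",      # display name, chosen for this example
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",         # Grafana's backend performs the queries
        "isDefault": True,
    }
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


# Usage ("secret" stands for the password set after the first login):
#   req = datasource_request("http://192.168.1.58:3000", "admin", "secret",
#                            "http://192.168.1.58:9090")
#   urllib.request.urlopen(req, timeout=5)
```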
3. Monitor a Linux server
- Install and start node_exporter
root@test02:~# wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
root@test02:~# tar xf node_exporter-0.18.1.linux-amd64.tar.gz
root@test02:~# mv node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
root@test02:~# cd /usr/local/node_exporter
root@test02:/usr/local/node_exporter# cat /lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io/docs/guides/node-exporter/
[Service]
ExecStart=/usr/local/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
root@test02:/usr/local/node_exporter# systemctl enable node_exporter.service
Created symlink from /etc/systemd/system/multi-user.target.wants/node_exporter.service to /lib/systemd/system/node_exporter.service.
root@test02:/usr/local/node_exporter# systemctl start node_exporter.service
root@test02:/usr/local/node_exporter# systemctl status node_exporter.service
● node_exporter.service - https://prometheus.io/docs/guides/node-exporter/
Loaded: loaded (/lib/systemd/system/node_exporter.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2019-07-10 14:23:35 CST; 5s ago
Main PID: 774 (node_exporter)
CGroup: /system.slice/node_exporter.service
└─774 /usr/local/node_exporter/node_exporter
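
Once node_exporter is running, it serves metrics in the Prometheus text format at http://192.168.1.57:9100/metrics. As a quick sanity check, the response can be parsed with a few lines of Python. This is a simplified sketch that assumes lines carry no trailing timestamps (node_exporter does not emit them); parse_samples is a hypothetical helper, not a library function.

```python
import urllib.request


def parse_samples(text):
    """Map each series ('name' or 'name{labels}') to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url, timeout=5):
    """Download the raw metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage:
#   samples = parse_samples(fetch_metrics("http://192.168.1.57:9100/metrics"))
#   print(samples["node_load1"])
```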
- On test03, add a host job that uses file-based service discovery to prometheus.yml
cat /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'host'
    file_sd_configs:
    - files: ['/usr/local/prometheus/sd_config/host.yml']
      refresh_interval: 5s
- Create the host.yml file
root@test03:/usr/local/prometheus/sd_config# cat /usr/local/prometheus/sd_config/host.yml
- targets:
  - 192.168.1.57:9100
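
Because Prometheus watches file_sd files and re-reads them when they change, host.yml can be regenerated by a script whenever hosts are added, without reloading Prometheus. The sketch below writes the same structure as the file above. The helper names are hypothetical, and the write goes through a temporary file so that Prometheus does not read a half-written list.

```python
import os
import tempfile


def render_targets(targets):
    """Render a one-group file_sd YAML document for the given host:port list."""
    lines = ["- targets:"]
    lines += ["  - {}".format(t) for t in targets]
    return "\n".join(lines) + "\n"


def write_targets(path, targets):
    """Write the target file atomically (temp file in the same directory)."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(render_targets(targets))
    os.chmod(tmp, 0o644)   # mkstemp creates the file with mode 0600
    os.replace(tmp, path)  # atomic rename on the same filesystem


# Usage:
#   write_targets("/usr/local/prometheus/sd_config/host.yml",
#                 ["192.168.1.57:9100"])
```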
- Reload the configuration
prometheus_id=`ps -ef |grep prometheus.yml|grep -v grep|awk '{print $2}'`
kill -HUP $prometheus_id
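- On the Targets page, the host group now lists 192.168.1.57 as a monitored target
- In Grafana, import the basic Linux monitoring dashboard template: 9276
- After entering 9276, wait a few seconds for the template to load automatically
- View the host resource dashboards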
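
The Targets check from the steps above can also be scripted against the Prometheus HTTP API, which returns the same information at /api/v1/targets. A minimal sketch, assuming the response layout documented for that endpoint (data.activeTargets, each with labels and health); the function names are illustrative.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Return {(job, instance): health} from a decoded /api/v1/targets reply."""
    if payload.get("status") != "success":
        raise ValueError("unexpected API status: {!r}".format(payload.get("status")))
    result = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        result[(labels["job"], labels["instance"])] = target["health"]
    return result


def fetch_targets(base_url, timeout=5):
    """Query a running Prometheus server and decode the JSON reply."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage:
#   print(summarize_targets(fetch_targets("http://192.168.1.58:9090")))
```

With the configuration in this guide, the result should contain an entry for the host job with instance 192.168.1.57:9100.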
4. Install Alertmanager
- Download Alertmanager
root@test03:~# wget https://github.com/prometheus/alertmanager/releases/download/v0.18.0/alertmanager-0.18.0.linux-amd64.tar.gz
- Extract alertmanager-0.18.0.linux-amd64.tar.gz and move it to /usr/local/alertmanager
root@test03:~# tar xf alertmanager-0.18.0.linux-amd64.tar.gz
root@test03:~# mv alertmanager-0.18.0.linux-amd64 /usr/local/alertmanager
- Configure Alertmanager to run as a background service
root@test03:~# cd /usr/local/alertmanager
root@test03:/usr/local/alertmanager# cat /lib/systemd/system/alertmanager.service
[Unit]
Description=https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
[Service]
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target
- Configure email alerts
root@test03:/usr/local/alertmanager# cat /usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xxxxxx'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'mail'
receivers:
- name: 'mail'
  email_configs:
  - to: '[email protected]'
- Start Alertmanager
root@test03:/usr/local/alertmanager# systemctl enable alertmanager.service
Created symlink from /etc/systemd/system/multi-user.target.wants/alertmanager.service to /lib/systemd/system/alertmanager.service.
root@test03:/usr/local/alertmanager# systemctl start alertmanager.service
root@test03:/usr/local/alertmanager# systemctl status alertmanager.service
● alertmanager.service - https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
Loaded: loaded (/lib/systemd/system/alertmanager.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2019-07-10 16:28:20 CST; 2min 15s ago
Main PID: 19847 (alertmanager)
Tasks: 9
Memory: 9.0M
CPU: 290ms
CGroup: /system.slice/alertmanager.service
└─19847 /usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
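
To confirm that the email route works before wiring up Prometheus, a test alert can be pushed straight into Alertmanager through its v2 API (POST /api/v2/alerts). This is a sketch under assumptions: the alert name TestAlert and the helper names are made up for illustration, and the opener parameter is there only so the function can be tried without a live server.

```python
import json
import urllib.request


def build_test_alert(alertname, instance, severity="error"):
    """Build the JSON body for POST /api/v2/alerts (a list of alerts)."""
    return [{
        "labels": {
            "alertname": alertname,
            "instance": instance,
            "severity": severity,
        },
        "annotations": {"summary": "Test alert for {}".format(instance)},
    }]


def post_alerts(base_url, alerts, opener=urllib.request.urlopen):
    """Send the alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req, timeout=5) as resp:
        return resp.status


# Usage (run on test03, where Alertmanager listens on port 9093):
#   post_alerts("http://127.0.0.1:9093",
#               build_test_alert("TestAlert", "192.168.1.57:9100"))
```

With the route shown above, the notification should go to the mail receiver once group_wait (10s) has passed.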
- Configure alerting in prometheus.yml: point Prometheus at Alertmanager and load the rule files
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yml"
root@test03:/usr/local/prometheus/rules# cat /usr/local/prometheus/rules/targets.yml
groups:
- name: targets
  rules:
  # Alert for any instance that is unreachable for more than 1 minute.
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: error
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
- Reload the Prometheus configuration by sending a SIGHUP signal to the Prometheus process (PID 818)
prometheus_id=`ps -ef |grep prometheus.yml|grep -v grep|awk '{print $2}'`
kill -HUP $prometheus_id
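- View the alerting rules
- Check the alert status; (active) means the alert is active
- Stop node_exporter on the monitored node to test the alert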
root@test02:~# systemctl stop node_exporter.service
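- Pending: the threshold has been crossed, but the configured duration (for: 1m) has not yet elapsed
- Firing: the threshold has been crossed and the duration has elapsed; the alert is sent to the receiver.
- The alert email is received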
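
The move from Pending to Firing follows from the rule's for: 1m and the 15s evaluation_interval set in prometheus.yml. The following is a simplified model of that behavior, not Prometheus code: each list element stands for one rule evaluation, and True means the expression up == 0 matched, that is, the target was down.

```python
def alert_states(expr_results, interval_s=15, for_s=60):
    """Return the alert state after each evaluation of the rule expression."""
    states = []
    active_since = None  # time of the first consecutive match
    for i, matched in enumerate(expr_results):
        now = i * interval_s
        if not matched:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # The alert fires once the expression has matched for at least for_s.
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


print(alert_states([False, True, True, True, True, True, False]))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```

In this model the alert stays Pending for four evaluations after node_exporter stops and becomes Firing on the fifth, 60 seconds after the first match. It returns to inactive as soon as the target is up again.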