The simplest way to build a Spring Boot application monitoring system with Prometheus + Spring Boot Admin

Installing and using Prometheus

Download

Official site: https://prometheus.io/

Open the download page, choose your operating system and CPU architecture, then download the matching package

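prometheus is the monitoring system, together with an embedded time-series database
alertmanager is the alerting component

Because the goal is to monitor Spring Boot applications, these two are enough (the other components provide probing functions; Spring Boot supplies that information itself through actuator)

Since this is mainly used on Linux, everything below describes the Linux procedure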

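Extract prometheus to a location of your choice; this article uses **/app/monitor-admin/prometheus/**

The extracted directory contains a number of files; two of them matter here:

prometheus is the executable; it needs the x (execute) permission to run
prometheus.yml is the configuration file. Hand-written YAML is error-prone, so it is best to edit it with a YAML-aware editor

Extract alertmanager inside that directory, so that its path is **/app/monitor-admin/prometheus/alertmanager**

The file naming follows the same pattern:

alertmanager is the executable; it needs the x permission to run
alertmanager.yml is the configuration file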

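Startup

It is best to start prometheus with the following command

./prometheus --config.file=./prometheus.yml --web.enable-lifecycle
# --config.file=./prometheus.yml specifies the configuration file
# --web.enable-lifecycle enables the HTTP lifecycle endpoints, used here to reload the configuration at runtime

Once it has started, the Prometheus page is reachable at ip:9090

alertmanager can be started with a similar command, passing its own configuration file via --config.file=./alertmanager.yml

alertmanager's default port is 9093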

Basic configuration

A simple prometheus.yml looks like this

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 127.0.0.1:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
   - "agent_monitor.yml" # rule file; its rules are what the Rules page of the web UI shows
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  # - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
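  - job_name: 'agent-server-monitor' # name of this job
    metrics_path: '/actuator/prometheus' # URI that Prometheus scrapes; here it is the prometheus endpoint provided by Spring Boot
    scrape_interval: 15s
    static_configs:
    - targets: ['192.168.6.28:8099'] # target, i.e. the ip:port of one application
      labels:
        instance: server # instance
        name: 192.168.6.28:8099 # tells the instances apart; labels like this can be added freely. Here instance denotes the application's role and name identifies the individual instance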
    - targets: ['192.168.5.22:8099']
      labels:
        instance: server
        name: 192.168.5.22:8099

agent_monitor.yml holds the alerting rules. A rule file can be validated like this

promtool check rules /path/to/example.rules.yml

groups:
- name: monitor
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: server
    annotations:
      summary: "Instance {{ $labels.name }} down"
      description: "Application {{ $labels.name }} down" # $labels refers to the labels set on the target, so the custom name label can be read here
  - alert: process_cpu_usage
    expr: process_cpu_usage > 0.3
    for: 15s
    labels:
      severity: server
    annotations:
      summary: "app {{ $labels.name }} cpu high"
      description: "Application {{ $labels.name }} CPU usage is too high, current value {{ $value }}" # $value is the value returned by the query
  - alert: system_cpu_usage
    expr: system_cpu_usage > 0 # a threshold of 0 fires almost constantly; raise it for real use
    for: 15s
    labels:
      severity: server
    annotations:
      summary: "system {{ $labels.name }} cpu high"
      description: "The server hosting application {{ $labels.name }} has high CPU load, current value {{ $value }}"
  - alert: heap_memory_usage
    # used / max as a percentage, per heap memory pool; a threshold of 0 fires constantly, so pick a real one
    expr: jvm_memory_used_bytes{area="heap"} * 100 / jvm_memory_max_bytes{area="heap"} > 0
    for: 15s
    labels:
      severity: server
    annotations:
      summary: "instance {{ $labels.name }} heap_memory_usage high"
      description: "Heap memory usage of application {{ $labels.name }} is too high, current value {{ $value }}"
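
The heap rule is meant to compare a percentage: used bytes times 100, divided by the maximum bytes of the same memory pool. A minimal Java sketch of that arithmetic follows. The class and method names (HeapUsage, percentUsed) are hypothetical, and treating a non-positive maximum as "undefined" is an assumption based on the JVM reporting -1 when a pool has no configured limit.

```java
// Hypothetical helper that mirrors the arithmetic of the heap_memory_usage rule.
final class HeapUsage {

    /**
     * Returns used/max as a percentage.
     * Assumption: a non-positive max means "no limit defined", reported here as -1.
     */
    static double percentUsed(double usedBytes, double maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return usedBytes * 100.0 / maxBytes;
    }

    public static void main(String[] args) {
        // A pool using 256 MiB out of a 1024 MiB limit
        System.out.println(percentUsed(256, 1024));
        // Dividing a value by itself always yields 100, whatever the load is
        System.out.println(percentUsed(256, 256));
    }
}
```

Dividing a series by itself would always give 100, so the denominator has to be the pool's maximum for the alert to carry any information.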

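For more detailed configuration, see this blog post https://blog.csdn.net/u012394095/article/details/81902630 or look for other resources

The query expressions used in that post can also be obtained from the Console tab on the Prometheus Graph page, as follows:

Pick the metric you need from the drop-down menu
Click Execute
The query expression appears in the Console

alertmanager.yml is configured as follows. I use the web.hook receiver here because it is easy to combine with spring-boot-admin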

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:8080/alarm' # the web endpoint; alertmanager calls it with a POST request
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

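Use this command to reload the configuration (it relies on the --web.enable-lifecycle flag used at startup); it can of course also be wired to a button on a page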

curl -X POST http://localhost:9090/-/reload
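
If the reload is triggered from a page button, the backend has to issue the same POST itself. The following is a minimal sketch using the JDK's java.net.http client (Java 11+). The class and method names (PrometheusReload, buildReloadRequest, reload) and the --send switch are hypothetical, and the base URL is assumed to be the local Prometheus from above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper that mirrors `curl -X POST http://localhost:9090/-/reload`.
final class PrometheusReload {

    /** Builds the POST request for the lifecycle reload endpoint. */
    static HttpRequest buildReloadRequest(String baseUrl) {
        // Drop a trailing slash so the path is not doubled
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(trimmed + "/-/reload"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Sends the request and returns the HTTP status code. */
    static int reload(String baseUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildReloadRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9090";
        HttpRequest request = buildReloadRequest(baseUrl);
        System.out.println(request.method() + " " + request.uri());
        // Only contact the server when explicitly asked to
        if (args.length > 1 && "--send".equals(args[1])) {
            System.out.println("HTTP " + reload(baseUrl));
        }
    }
}
```

Keeping request construction separate from sending makes the URL handling easy to check without a running Prometheus.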

Enabling prometheus in spring-boot

Exposing metrics to prometheus requires the prometheus registry dependency

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Actuator is needed as well

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

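In addition, Spring Boot 2.x requires the endpoints to be exposed explicitly through configuration

The simplest configuration looks like this. For the details, see the official documentation at https://spring.io/projects/spring-boot/#learn and search for the keyword endpoints; there is a whole chapter on the topic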

management:
  endpoints:
    web:
      exposure:
        include: "*"  # * is a special character in YAML (the alias indicator), so it must be quoted
  endpoint:
    health:
      show-details: always # show the full details of the health endpoint
  metrics:
    web:
      server:
        auto-time-requests: true

spring-boot-admin

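spring-boot-admin has two parts

The server, which provides the monitoring pages

The client, a plugin added to each monitored application

It can be thought of as an extension of actuator with a graphical interface

Using spring-boot-admin is simple: add the admin-client dependency to each application to be monitored and configure the server address; on the server, add the admin-server dependencies and annotate the application with @EnableAdminServer

server

server dependencies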

        <!-- https://mvnrepository.com/artifact/de.codecentric/spring-boot-admin-server-ui -->
        <dependency>
            <groupId>de.codecentric</groupId>
            <artifactId>spring-boot-admin-server-ui</artifactId>
            <version>2.1.2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/de.codecentric/spring-boot-admin-server -->
        <dependency>
            <groupId>de.codecentric</groupId>
            <artifactId>spring-boot-admin-server</artifactId>
            <version>2.1.2</version>
        </dependency>

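The first is the UI dependency, the second is the server component

Enable admin-server with

@EnableAdminServer

After startup it is reachable at ip:port

In version 2.1.2, if no client has connected after startup, the page stays on "Loading applications"; this is normal

Once a client starts, it is shown as an application; click it to see the graphical view of its actuator data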

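Known issue

With spring-boot 2.1.2 there is one problem

The admin-server backend keeps logging errors. This is actually a Tomcat bug; switching the web container to Jetty makes it go away

For details of the error, see https://blog.csdn.net/l1161558158/article/details/86569748

client

client dependency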

<dependency>
    <groupId>de.codecentric</groupId>
    <artifactId>spring-boot-admin-starter-client</artifactId>
    <version>2.1.2</version><!-- the version must match your Spring Boot version -->
</dependency>

Configuration:

spring:
    boot:
        admin:
            client:
                url: "http://127.0.0.1:8080" # address of the admin server
                instance:
                    name: agent-server # instance name shown on the admin page
                    prefer-ip: true # register with the IP address; otherwise the hostname is used by default

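If the application is started while admin-server is not running, it logs an error. In my testing this does not affect the application; the client simply keeps issuing HTTP requests to the server, and they fail until the server is up

Connecting spring-admin and Prometheus

I call it integration, but in practice Prometheus (through alertmanager) just borrows the web channel of spring-admin-server

The alertmanager webhook sends a JSON body in a POST request. I defined two Java classes that map to this JSON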

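import java.util.Date;
import java.util.List;
import java.util.Map;

import lombok.Data;

// Top-level webhook payload sent by alertmanager.
// @Data assumes Lombok (already implied by @Slf4j in the controller);
// it generates the getters the controller calls.
@Data
class AlarmInfo {
    String receiver;
    String status;
    List<Alert> alerts;
    Map<String, String> commonLabels;
    Map<String, String> groupLabels;
    Map<String, String> commonAnnotations;
    Map<String, String> annotations;
    String externalURL;
    String version;
}

// One alert inside the payload
@Data
class Alert {
    String status;
    Map<String, String> labels;
    Map<String, String> annotations;
    Date startsAt;
    Date endsAt;
    String generatorURL;
}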

Then expose an endpoint for Prometheus (alertmanager) to call

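import java.util.List;
import java.util.Map;

import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@Slf4j
@RestController
public class ReceiveAlarm {
    // AlarmService is the author's own notification service
    private final AlarmService alarmService;

    public ReceiveAlarm(AlarmService alarmService) {
        this.alarmService = alarmService;
    }

//    @ApiOperation(value = "Send alarm", notes = "The alarm is inserted into the SMS table and an SMS notification is then sent. A webhook can in fact drive any kind of notification: phone call, DingTalk, email, and so on")
    // alertmanager itself integrates many notification channels; email and the like can be used directly
//    @ApiImplicitParam(name = "alarmInfo", required = true, dataType = "AlarmInfo")
    @RequestMapping(value = "/alarm") // this URL is the one given to the webhook in alertmanager.yml
    public void receiveAlarm(@RequestBody AlarmInfo info) {
        // My business logic: send one notification per alert
        List<Alert> alerts = info.getAlerts();
        alerts.forEach(alert -> {
            Map<String, String> labels = alert.getLabels();
            String name = labels.get("name");           // custom label set on the scrape target
            String alertName = labels.get("alertname"); // name of the alerting rule
            Map<String, String> annotations = alert.getAnnotations();
            String description = annotations.get("description");
            alarmService.sendAlarm(alertName + " " + name, description);
        });
        log.info("{}", info);
    }
}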
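
Two details of the webhook payload are worth handling in the business logic: a label such as name is only present if the scrape target defines it, and alertmanager sends resolved notifications to webhooks by default in addition to firing ones. The sketch below shows one way to build the notification text defensively. The class AlarmText, its methods, and the fallback strings are hypothetical, not part of the original service.

```java
import java.util.Map;

// Hypothetical helper for turning one alert's labels and annotations into notification text.
final class AlarmText {

    /** Title in the same "alertname name" shape the controller passes to sendAlarm. */
    static String title(Map<String, String> labels) {
        String alertName = labels.getOrDefault("alertname", "unknown-alert");
        String name = labels.getOrDefault("name", "unknown-target");
        return alertName + " " + name;
    }

    /** Prefixes the description with the alert status so recoveries are distinguishable. */
    static String body(String status, Map<String, String> annotations) {
        String description = annotations.getOrDefault("description", "(no description)");
        String prefix = "resolved".equals(status) ? "[RESOLVED] " : "[FIRING] ";
        return prefix + description;
    }

    public static void main(String[] args) {
        Map<String, String> labels =
                Map.of("alertname", "InstanceDown", "name", "192.168.6.28:8099");
        Map<String, String> annotations =
                Map.of("description", "Application 192.168.6.28:8099 down");
        System.out.println(title(labels));
        System.out.println(body("firing", annotations));
    }
}
```

Inside the controller, the two calls would take alert.getLabels(), alert.getStatus() and alert.getAnnotations() as inputs before handing the result to the notification service.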

P.S.

A follow-up will cover how to define custom endpoints for more tailored monitoring
