Notes on Deploying Prometheus on Kubernetes (Part 5)

Deploying Alertmanager

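Prometheus needs the Alertmanager listen address and port in its configuration file, so I run Alertmanager and Prometheus as two containers in the same pod, where Prometheus can reach Alertmanager on localhost. Alertmanager can also be deployed as a separate pod and addressed through a Service name and port, but I could not get that variant to work in my tests and have not found the cause. Add the following to prometheus.yml: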

 prometheus.yml: |-
    global:
      scrape_interval: 90s
      evaluation_interval: 90s
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["localhost:9093"]
          #- alertmanager:9093
    rule_files:
      - /etc/prometheus/rules.yml
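For reference, the separate-pod variant mentioned above would typically look something like the sketch below. It is untested in this setup: the Service name alertmanager and the pod label app: alertmanager are assumptions, and a standalone Alertmanager Deployment carrying that label would have to exist.

```yaml
# Hypothetical Service in front of a standalone Alertmanager pod.
# The name "alertmanager" and the label "app: alertmanager" are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: kube-system
spec:
  selector:
    app: alertmanager
  ports:
  - name: web
    port: 9093
    targetPort: 9093
---
# prometheus.yml fragment (not a Kubernetes object): point Prometheus at the
# Service DNS name instead of localhost.
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["alertmanager.kube-system.svc:9093"]
```

Because Prometheus runs in the same namespace, the short form alertmanager:9093 (the commented-out target above) should resolve to the same Service.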

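Add the alerting rules to rules.yml, the file referenced by rule_files above. Prometheus evaluates these rules and sends the firing alerts to Alertmanager: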

 rules.yml: |-
    groups:
    - name: test-rule
      rules:
      - alert: NodeFilesystemUsage
        expr: (node_filesystem_size{device="rootfs"} - node_filesystem_free{device="rootfs"}) / node_filesystem_size{device="rootfs"} * 100 > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Filesystem usage detected"
          description: "{{$labels.instance}}: Filesystem usage is above 80% (current value is: {{ $value }})"
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: NodeCPUUsage
        expr: (100 - (avg by (instance) (irate(node_cpu{job="kubernetes-node-exporter",mode="idle"}[5m])) * 100)) > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High CPU usage detected"
          description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
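
The metric names in these rules match older node_exporter releases. node_exporter 0.16.0 renamed many metrics to include their units, so on a newer exporter the same expressions would look roughly like the sketch below. The group name is hypothetical, the label selectors are copied unchanged from the rules above, and both may need adjusting for your nodes.

```yaml
# Sketch only: the same three rules using the metric names introduced in
# node_exporter 0.16.0. Annotations are omitted for brevity.
groups:
- name: test-rule-renamed   # hypothetical group name
  rules:
  - alert: NodeFilesystemUsage
    expr: (node_filesystem_size_bytes{device="rootfs"} - node_filesystem_free_bytes{device="rootfs"}) / node_filesystem_size_bytes{device="rootfs"} * 100 > 80
    for: 2m
    labels:
      team: node
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 2m
    labels:
      team: node
  - alert: NodeCPUUsage
    expr: (100 - (avg by (instance) (irate(node_cpu_seconds_total{job="kubernetes-node-exporter",mode="idle"}[5m])) * 100)) > 80
    for: 2m
    labels:
      team: node
```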


Modify prometheus-deployment.yaml to add the alertmanager container and its config volume:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: kube-system
  #annotations:
    # used to scrape app's metrics which deployed in pod
  #  prometheus.io/scrape: 'true'
    # prometheus scrape path, default /metrics
  #  prometheus.io/path: '/metrics'
    # prometheus.io/port relevant port
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 0
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.0
        args:
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--storage.tsdb.path=/prometheus"
        ports:
          - containerPort: 9090
            protocol: TCP
        volumeMounts:
          - name: gluster-volume
            mountPath: /prometheus
          - name: config-volume
            mountPath: /etc/prometheus
      - name: alertmanager
        image: x.x.x.x/library/prom/alertmanager:latest
        args:
          - '--config.file=/etc/alertmanager/config.yml'
        ports:
        - name: alertmanager
          containerPort: 9093
        volumeMounts:
        - name: alert-volume
          mountPath: /etc/alertmanager
      imagePullSecrets:
      - name: my-secret
      volumes:
        - name: gluster-volume
          persistentVolumeClaim:
            claimName: gluster-prometheus
        - name: config-volume
          configMap:
            name: prometheus-server-conf
        - name: alert-volume
          configMap:
            name: alertmanager

Prepare the email settings that Alertmanager uses to send alerts:

kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: kube-system
data:
  config.yml: |-
    global:
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: '[email protected]'
      smtp_auth_username: '[email protected]'
      smtp_auth_password: 'xxxx'


    templates:
      - '/root/alertmanager/template/*.tmpl'

    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 10m
      receiver: default-receiver


    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: '[email protected]'
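
The rules in this post attach a team: node label to every alert. As a sketch of how that label could be used, the config.yml fragment below routes such alerts to their own receiver. The receiver name node-team and the example.com addresses are placeholders, not part of the original setup, and the global section is assumed to stay as shown above.

```yaml
# Sketch only: a child route keyed on the "team" label set by the rules.
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
  routes:
  - match:
      team: node
    receiver: node-team          # hypothetical receiver name

receivers:
- name: 'default-receiver'
  email_configs:
  - to: 'ops@example.com'        # placeholder address
- name: 'node-team'
  email_configs:
  - to: 'node-team@example.com'  # placeholder address
```

Newer Alertmanager releases prefer the matchers syntax over match, so check which form your image version expects.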

Note: SMTP must be enabled in the 163 mailbox settings; otherwise Alertmanager reports errors like the following:

level=error ts=2018-04-03T03:39:32.793284112Z caller=notify.go:303 component=dispatcher msg="Error on notify" err="*notify.loginAuth failed: 550 User has no permission"
level=error ts=2018-04-03T03:39:32.793463167Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="*notify.loginAuth failed: 550 User has no permission"

Then create the ConfigMaps and the Deployment to complete the setup.



