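I wanted to properly understand how route rules in Alertmanager are interpreted, so I took the opportunity to walk through the official demo file directly. Its contents are as follows: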
    routes:
    - match_re:
        service: ^(foo1|foo2|baz)$
      receiver: team-X-mails
      routes:
      - match:
          severity: critical
        receiver: team-X-pager
    - match:
        service: files
      receiver: team-Y-mails
      routes:
      - match:
          severity: critical
        receiver: team-Y-pager
    - match:
        service: database
      receiver: team-DB-pager
      # Also group alerts by affected database.
      group_by: [alertname, cluster, database]
      routes:
      - match:
          owner: team-X
        receiver: team-X-pager
        continue: true
      - match:
          owner: team-Y
        receiver: team-Y-pager
Breaking the file down piece by piece
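    - match_re:
        service: ^(foo1|foo2|baz)$
      receiver: team-X-mails
      routes:
      - match:
          severity: critical
        receiver: team-X-pager

When an alert's service label is foo1, foo2, or baz, it enters this route. If the alert's severity is critical, it is sent to team-X-pager; if the child route does not match, it falls back to the default for this route, team-X-mails.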
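    - match:
        service: database
      receiver: team-DB-pager
      # Also group alerts by affected database.
      group_by: [alertname, cluster, database]
      routes:
      - match:
          owner: team-X
        receiver: team-X-pager
        continue: true
      - match:
          owner: team-Y
        receiver: team-Y-pager

When an alert's service label is database, it enters this route. If its owner label is team-X, it is sent to team-X-pager. Because that child route sets continue: true, matching does not stop there: Alertmanager goes on to evaluate the following sibling routes, and an alert whose owner label is team-Y is sent to team-Y-pager. If no child route matches, the alert is sent to this route's default, team-DB-pager.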
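To make the two walkthroughs above concrete, here is a small worked example. The label sets are hypothetical alerts invented for illustration (they are not part of the official demo file), and the comments show the receiver each one should reach under the routes discussed above.

```yaml
# Hypothetical alerts, written as label sets. Not a real Alertmanager file.
alerts:
  - labels: {service: foo1, severity: critical}
    # matches the foo1|foo2|baz route, then its critical child -> team-X-pager
  - labels: {service: baz, severity: warning}
    # matches the foo1|foo2|baz route, no child matches -> team-X-mails
  - labels: {service: database, owner: team-X}
    # matches the team-X child; continue: true means the team-Y sibling
    # is still evaluated, but it does not match -> team-X-pager
  - labels: {service: database, owner: team-Y}
    # team-X child does not match, team-Y child does -> team-Y-pager
  - labels: {service: database, owner: team-Z}
    # no child matches -> team-DB-pager (the route's own receiver)
```

Since a single alert carries only one value for the owner label, continue: true on the team-X child has no visible effect in this particular file; it would matter if a later sibling matched on a different label.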
Explanation of the grouping-related settings
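Alertmanager can group alert notifications, merging multiple alerts into a single notification. The grouping rule is defined with group_by, which lists label names: alerts that share the same values for every label listed in group_by are merged into one notification and sent to the receiver.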
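As a rough illustration of how group_by: [alertname, cluster, database] from the demo file partitions alerts, consider the hypothetical alerts below. The alert names and label values are made up for this sketch; only the group_by label names come from the file above.

```yaml
# Hypothetical alerts; grouping key is (alertname, cluster, database).
alerts:
  - labels: {alertname: HighLatency, cluster: A, database: db1, instance: host-1}
  - labels: {alertname: HighLatency, cluster: A, database: db1, instance: host-2}
    # same alertname, cluster and database as the first alert
    # -> same group, delivered together in one notification
  - labels: {alertname: HighLatency, cluster: A, database: db2, instance: host-3}
    # database differs -> separate group, separate notification
```

The instance label is not listed in group_by, so it does not affect which group an alert lands in.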
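Sometimes it is useful to collect more related information and send it in one go. The group_wait parameter sets how long Alertmanager waits before sending the first notification for a newly created group; any alerts that arrive for that group during the wait are merged into the same notification sent to the receiver.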
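The group_interval setting, in turn, defines how long Alertmanager waits before sending another notification for the same group once an initial notification has gone out and new alerts have been added to that group.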
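A minimal sketch of how these settings sit together on a route is shown below. The durations are illustrative assumptions chosen for the example, not values taken from the text above, and the timeline in the comments is an approximate reading of the behaviour described.

```yaml
route:
  receiver: team-X-mails
  group_by: [alertname, cluster]
  group_wait: 30s       # assumed value: wait before the first notification of a new group
  group_interval: 5m    # assumed value: wait before notifying about new alerts in that group

# Approximate timeline for one group under these assumed values:
#   t = 0s     first alert arrives and creates the group
#   t = 10s    second alert arrives and joins the same group
#   t = 30s    first notification is sent, containing both alerts
#   t = 1m     third alert arrives and is held back
#   t = 5m30s  next notification for the group, now including the third alert
```

Shorter values make notifications arrive sooner at the cost of more, smaller messages; longer values batch more alerts per notification.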