How to Monitor a Kubernetes Cluster with Prometheus and Grafana

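Prometheus is free software for event monitoring and alerting. It records real-time metrics in a time-series database, collects them using an HTTP pull model, and supports flexible queries and real-time alerting. We can use Prometheus to monitor an entire Kubernetes cluster.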

The Prometheus stack includes:

Prometheus
Alertmanager
kube-state-metrics
node-exporter
Grafana

The stack also ships with dashboards and alerts.

Dashboards

Capacity planning
Cluster health
Deployments
k8s cluster rsrc use
k8s node rsrc use
k8s resources cluster
k8s resources namespace
k8s resources pod
kube DNS
kubelet
Nodes
Pods
Statefulset
Kubernetes all-nodes
Kubernetes cluster-all
Kubernetes pods-cluster
Kubernetes resources-requests

Alerts

Component Down (API Server, Kubelet, Node exporter, Alertmanager, Prometheus, etc.)
Pod alerts (CrashLoopBackOff, Pending, Not Ready)
Workload controller alerts (Replicas Mismatch, DaemonSet NotScheduled, DaemonSet MisScheduled, Job Failed, and Long-running Jobs)
Resources alerts (CPU overcommit, Memory overcommit, Quota exceeded)
Persistent Volume alerts
Kube API error and client alerts
Prometheus configuration error alerts

Installation

Step 1: Clone the Prometheus-grafana repository from GitHub:

git clone URL to GIT REPO

Step 2: Combine the files in the manifests directory into a single manifest file:

cd Prometheus-grafana
awk 'FNR==1 {print "---"}{print}' manifests/* > "prometheus_grafana_manifest.yaml"
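The awk one-liner prints `---` (the YAML document separator) before the first line of every file, because `FNR` restarts at 1 for each input file, and then copies every line through unchanged. The self-contained sketch below runs the same command on two hypothetical manifests (`0-namespace.yaml` and `1-serviceaccount.yaml` are made-up names, not files from the repository) in a temporary directory so the result is visible.

```shell
#!/bin/sh
# Sketch: reproduce step 2 on two hypothetical manifests.
# awk prints "---" whenever FNR (the per-file line number) is 1,
# i.e. at the start of each file, then prints every line as-is.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/manifests"

# Hypothetical example manifests (not taken from the repository).
printf 'apiVersion: v1\nkind: Namespace\nmetadata:\n  name: prometheus\n' \
  > "$workdir/manifests/0-namespace.yaml"
printf 'apiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: prometheus\n  namespace: prometheus\n' \
  > "$workdir/manifests/1-serviceaccount.yaml"

# Same command as step 2; the shell glob expands in sorted order.
awk 'FNR==1 {print "---"}{print}' "$workdir"/manifests/* \
  > "$workdir/prometheus_grafana_manifest.yaml"

cat "$workdir/prometheus_grafana_manifest.yaml"
```

Because the glob expands in sorted order, numeric prefixes such as `1-alertmanager-configmap.yaml` decide the order of documents in the combined file. Unlike a plain `cat`, awk ends every record with a newline, so a manifest that lacks a final newline does not fuse with the next `---`.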

Step 3: Install the Prometheus-Grafana stack:

kubectl apply -f prometheus_grafana_manifest.yaml

Step 4: Create an ingress for Grafana:

If the cluster has an ingress controller, update the domain and the ingress class in grafana-ingress.yaml, then create the ingress resource:

kubectl apply -f grafana-ingress.yaml
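The two fields to edit map onto `spec.rules[].host` (the domain) and `spec.ingressClassName` (the ingress class) of a `networking.k8s.io/v1` Ingress. The sketch below renders such a manifest; it is not the repository's grafana-ingress.yaml. It assumes a Service named `grafana` in the `prometheus` namespace listening on port 3000 (Grafana's default port), and `GRAFANA_HOST`, `INGRESS_CLASS` and `render_grafana_ingress` are hypothetical names.

```shell
#!/bin/sh
# Sketch: render a minimal Ingress for Grafana.
# ASSUMPTIONS: Service "grafana" in namespace "prometheus" on port 3000;
# GRAFANA_HOST and INGRESS_CLASS are hypothetical placeholders.
set -eu

GRAFANA_HOST="${GRAFANA_HOST:-grafana.example.com}"
INGRESS_CLASS="${INGRESS_CLASS:-nginx}"

# Print the Ingress manifest to standard output.
render_grafana_ingress() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: prometheus
spec:
  ingressClassName: ${INGRESS_CLASS}
  rules:
    - host: ${GRAFANA_HOST}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
EOF
}
```

With the function defined, `render_grafana_ingress | kubectl apply -f -` would create the resource, since `kubectl apply -f -` reads the manifest from standard input.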

If there is no ingress controller, you can still reach Grafana through a LoadBalancer Service, a NodePort Service, or kube-proxy.

Grafana Credentials

The default Grafana credentials are:

Username: Cloud
Password: Cloud

Grafana login page:

Grafana Nodes dashboard

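You can set your own username and password.

Before updating the values in the credentials secret file, you must base64-encode the username and password. Use printf rather than echo, because echo appends a newline that would otherwise be encoded into the value:

printf '%s' 'myuser' | base64
bXl1c2Vy
printf '%s' 'HgTf0n9L@wrd' | base64
SGdUZjBuOUxAd3Jk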
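The trailing-newline trap is easy to see side by side: `echo 'myuser' | base64` yields `bXl1c2VyCg==`, where `Cg==` is the encoded `\n`, so Grafana would receive `myuser` followed by a newline. The sketch below wraps the safe form in a helper; the function name `b64_encode` is hypothetical, and `tr -d '\n'` only matters for long values, which GNU base64 wraps at 76 characters.

```shell
#!/bin/sh
# Sketch: encode Secret values without a trailing newline.
set -eu

# printf '%s' emits the value with no newline; tr strips any line
# wrapping that base64 inserts into long output.
b64_encode() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

echo "with echo:   $(echo 'myuser' | base64)"
echo "with printf: $(b64_encode 'myuser')"
```

Alternatively, `kubectl create secret generic grafana --namespace prometheus --from-literal=admin-user=myuser --from-literal=admin-password='HgTf0n9L@wrd' --dry-run=client -o yaml` prints a Secret whose values are already encoded, though without the app.kubernetes.io labels used in the repository's file.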

Now update the admin-user and admin-password values in manifests/2-grafana-credentials-secret.yaml with the base64-encoded username and password:

apiVersion: v1
kind: Secret
metadata:
  name: grafana
  namespace: prometheus
  labels:
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/component: grafana
type: Opaque
data:
  admin-user: bXl1c2Vy
  admin-password: SGdUZjBuOUxAd3Jk

Apply the secret:

kubectl apply -f 2-grafana-credentials-secret.yaml

If Grafana is already installed and running, delete the existing pod. A new pod will be created with the updated configuration.

Retrieving the Grafana Credentials

You can retrieve the credentials by decoding the values stored in the secret:

echo "Username: $(kubectl get secret grafana --namespace prometheus \
                 --output=jsonpath='{.data.admin-user}' | base64 --decode)"
echo "Password: $(kubectl get secret grafana --namespace prometheus \
                 --output=jsonpath='{.data.admin-password}' | base64 --decode)"

Note that the Prometheus web interface can be accessed without any authentication.

Prometheus web interface:

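Configuring Alertmanager

When installing the stack, you must provide the details of your alert receivers. Otherwise, you will never receive notifications about changes in cluster state or resource utilization.

You can change the configuration as needed.

Alertmanager is configured through a YAML configuration file that defines rules, notification routes, and receivers.

Below are example configurations for Email, Slack, and Webhook receivers: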

Email:

global:
  resolve_timeout: 5m
receivers:
  - name: email_config
    email_configs:
      - to: "< to_address >"
        from: "< from_address >"
        smarthost: "< smtp_host:port >"
        auth_username: "< smtp_username >"
        auth_password: "< smtp_password >"
route:
  group_by:
    - job
  receiver: email_config
  group_interval: 5m
  group_wait: 30s
  repeat_interval: 30m

Slack:

global:
  resolve_timeout: 5m
  slack_api_url: "< slack_webhook_url >"
receivers:
  - name: "slack-notifications"
    slack_configs:
      - channel: "#alerts"
route:
  group_by:
    - job
  receiver: "slack-notifications"
  group_interval: 5m
  group_wait: 30s
  repeat_interval: 30m

Webhook:

global:
  resolve_timeout: 5m
receivers:
  - name: webhook
    webhook_configs:
      - url: "< webhook_url >"
route:
  group_by:
    - job
  repeat_interval: 30m
  group_interval: 5m
  group_wait: 30s
  receiver: webhook

Update the configuration in manifests/1-alertmanager-configmap.yaml as described above, then apply it:

kubectl apply -f 1-alertmanager-configmap.yaml

After updating the ConfigMap, restart the running Alertmanager pod. A new pod will be created with the updated configuration.
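To check that a receiver actually delivers after the restart, one option is to push a synthetic alert to Alertmanager's v2 HTTP API (`POST /api/v2/alerts`, which accepts a JSON array of alerts with a required `labels` object). This is a minimal sketch: it assumes Alertmanager is reachable at `http://localhost:9093`, for example through a port-forward, and `ALERTMANAGER_URL`, `ManualTestAlert`, `test_alert_payload` and `send_test_alert` are hypothetical names. A `job` label is included because the example routes group alerts by `job`.

```shell
#!/bin/sh
# Sketch: build and send a synthetic alert to Alertmanager's v2 API.
# ASSUMPTIONS: ALERTMANAGER_URL points at a reachable Alertmanager;
# the alert and function names are made up for illustration.
set -eu

ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"

# Print a JSON array with one alert; $1 becomes the "job" label.
test_alert_payload() {
  cat <<EOF
[
  {
    "labels": {
      "alertname": "ManualTestAlert",
      "job": "$1",
      "severity": "info"
    },
    "annotations": {
      "summary": "Manual test alert to verify receivers"
    }
  }
]
EOF
}

# POST the payload read from stdin; --fail turns HTTP errors into a
# non-zero exit status.
send_test_alert() {
  test_alert_payload "$1" | curl --fail --silent --show-error \
    -H 'Content-Type: application/json' \
    --data-binary @- "${ALERTMANAGER_URL}/api/v2/alerts"
}
```

Running `send_test_alert manual-check` should produce a notification after `group_wait` (30s in the examples above). Since the alert carries no `endsAt`, Alertmanager should consider it resolved once `resolve_timeout` passes without a repeat.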

Original article:

https://medium.com/faun/how-to-monitor-kubernetes-cluster-with-prometheus-and-grafana-8ec7e060896f
