Introduction
Prometheus Operator is often described as the definitive monitoring solution for Kubernetes clusters, but on its own it no longer covers the full feature set; the complete solution is now kube-prometheus. Project address:
https://github.com/coreos/kube-prometheus
kube-prometheus is a complete monitoring stack that uses Prometheus to collect cluster metrics and Grafana for visualization. It includes the following components:
| Component | Description |
| --- | --- |
| The Prometheus Operator | Makes it easy to deploy a Prometheus service in a Kubernetes cluster, provides monitoring of the cluster, and configures and manages Prometheus |
| Highly available Prometheus | Highly available monitoring service |
| Highly available Alertmanager | Highly available alerting service that receives alerts from Prometheus; it supports a rich set of notification channels and makes it easy to deduplicate, silence, and group alerts |
| node-exporter | Collects host-level metrics such as loadavg, filesystem, and meminfo, similar in scope to a traditional zabbix-agent |
| Prometheus Adapter for Kubernetes Metrics APIs (k8s-prometheus-adapter) | Polls the Kubernetes API and exposes structured Kubernetes information as metrics |
| kube-state-metrics | Collects data about resource objects inside the Kubernetes cluster, which can be used to define alerting rules |
| grafana | Visualizes large volumes of metric data; the most popular time-series dashboard tool for infrastructure and application analytics |
Because k8s-prometheus-adapter implements the metrics.k8s.io and custom.metrics.k8s.io APIs on top of Prometheus, there is no need to deploy metrics-server separately. (metrics-server discovers all nodes through kube-apiserver and then calls the kubelet APIs over HTTPS to obtain CPU, memory, and other resource usage for each Node and Pod. Starting with Kubernetes 1.12 the install scripts dropped Heapster, and since 1.13 Heapster support has been removed entirely; Heapster is no longer maintained.)
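The adapter's job can be pictured as translating a Prometheus query result into a metrics-API-style object. The following is an illustrative sketch, not the real k8s-prometheus-adapter code; the field layout mirrors the NodeMetrics shape of metrics.k8s.io/v1beta1, and the `node` label name is an assumption.

```python
# Illustrative sketch: convert one sample from a Prometheus instant-vector
# query result into a metrics.k8s.io-style NodeMetrics dict.
def sample_to_node_metrics(sample):
    """sample: one element of the 'result' array of /api/v1/query."""
    node = sample["metric"]["node"]        # label identifying the node (assumed)
    cpu_cores = float(sample["value"][1])  # Prometheus encodes values as strings
    return {
        "apiVersion": "metrics.k8s.io/v1beta1",
        "kind": "NodeMetrics",
        "metadata": {"name": node},
        # The real API reports quantities such as "151m" (millicores).
        "usage": {"cpu": f"{round(cpu_cores * 1000)}m"},
    }

sample = {"metric": {"node": "k8s-m01"}, "value": [1590314400, "0.151"]}
print(sample_to_node_metrics(sample)["usage"]["cpu"])  # → 151m
```

The actual adapter does this mapping generically, driven by configurable query templates rather than hard-coded labels.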
1. Deployment
1.1 Download the source
cd /etc/kubernetes
git clone https://github.com/coreos/kube-prometheus.git
1.2 Install
[root@k8s-m01 kube-prometheus]# pwd
/etc/kubernetes/kube-prometheus
# install prometheus-operator
[root@k8s-m01 kube-prometheus]# kubectl apply -f manifests/setup
# install the remaining components (Prometheus, Alertmanager, metrics adapter, etc.)
[root@k8s-m01 kube-prometheus]# kubectl apply -f manifests/
1.3 Inspect the resources
[root@k8s-m01 kube-prometheus]# kubectl get pod,svc,ep -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-main-0 0/2 ContainerCreating 0 7s
pod/alertmanager-main-1 0/2 ContainerCreating 0 7s
pod/alertmanager-main-2 0/2 ContainerCreating 0 7s
pod/grafana-5c55845445-wvtrj 0/1 Pending 0 5s
pod/kube-state-metrics-957fd6c75-whw8r 0/3 Pending 0 5s
pod/node-exporter-gqsrh 0/2 ContainerCreating 0 5s
pod/node-exporter-qmv8h 0/2 Pending 0 5s
pod/prometheus-adapter-5cdcdf9c8d-whln4 0/1 Pending 0 5s
pod/prometheus-k8s-0 0/3 Pending 0 4s
pod/prometheus-k8s-1 0/3 Pending 0 4s
pod/prometheus-operator-6f98f66b89-46fj6 2/2 Running 0 29s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-main ClusterIP 10.109.157.168 <none> 9093/TCP 7s
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 7s
service/grafana ClusterIP 10.110.108.88 <none> 3000/TCP 6s
service/kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 6s
service/node-exporter ClusterIP None <none> 9100/TCP 6s
service/prometheus-adapter ClusterIP 10.102.225.171 <none> 443/TCP 6s
service/prometheus-k8s ClusterIP 10.100.29.234 <none> 9090/TCP 5s
service/prometheus-operated ClusterIP None <none> 9090/TCP 6s
service/prometheus-operator ClusterIP None <none> 8443/TCP 30s
NAME ENDPOINTS AGE
endpoints/alertmanager-main <none> 7s
endpoints/alertmanager-operated <none> 7s
endpoints/grafana <none> 6s
endpoints/kube-state-metrics <none> 6s
endpoints/node-exporter 6s
endpoints/prometheus-adapter <none> 6s
endpoints/prometheus-k8s <none> 5s
endpoints/prometheus-operated <none> 6s
endpoints/prometheus-operator 10.244.2.3:8443 30s
CRD resources created by kube-prometheus:
[root@k8s-m01 kube-prometheus]# kubectl get crd -o wide
NAME CREATED AT
alertmanagers.monitoring.coreos.com 2020-05-24T10:11:01Z
podmonitors.monitoring.coreos.com 2020-05-24T10:11:01Z
prometheuses.monitoring.coreos.com 2020-05-24T10:11:01Z
prometheusrules.monitoring.coreos.com 2020-05-24T10:11:02Z
servicemonitors.monitoring.coreos.com 2020-05-24T10:11:02Z
thanosrulers.monitoring.coreos.com 2020-05-24T10:11:02Z
The prometheus resource defines how the Prometheus service should run:
[root@k8s-m01 kube-prometheus]# kubectl -n monitoring get prometheus,alertmanager
NAME VERSION REPLICAS AGE
prometheus.monitoring.coreos.com/k8s v2.17.2 2 16s
NAME VERSION REPLICAS AGE
alertmanager.monitoring.coreos.com/main v0.20.0 3 17s
Both prometheus and alertmanager are managed by StatefulSet controllers:
[root@k8s-m01 kube-prometheus]# kubectl get statefulset -o wide -n monitoring
NAME READY AGE CONTAINERS IMAGES
alertmanager-main 0/3 22s alertmanager,config-reloader quay.io/prometheus/alertmanager:v0.20.0,jimmidyson/configmap-reload:v0.3.0
prometheus-k8s 0/2 21s prometheus,prometheus-config-reloader,rules-configmap-reloader quay.io/prometheus/prometheus:v2.17.2,quay.io/coreos/prometheus-config-reloader:v0.39.0,jimmidyson/configmap-reload:v0.3.0
Check node and Pod resource usage from the command line:
[root@k8s-m01 kube-prometheus]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-m01 151m 7% 1516Mi 39%
k8s-m02 136m 6% 1299Mi 33%
[root@k8s-m01 kube-prometheus]# kubectl top pod
NAME CPU(cores) MEMORY(bytes)
nginx-ds-9dfb7 0m 1Mi
[root@k8s-m01 kube-prometheus]# kubectl top pod -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-66bff467f8-gpz5f 2m 16Mi
coredns-66bff467f8-pfzc6 2m 14Mi
etcd-k8s-m01 23m 99Mi
kube-apiserver-k8s-m01 51m 488Mi
kube-controller-manager-k8s-m01 9m 54Mi
kube-flannel-ds-amd64-xgln9 1m 10Mi
kube-proxy-nx2jf 2m 16Mi
kube-scheduler-k8s-m01 2m 21Mi
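When scripting against this kind of output, the whitespace-aligned table that `kubectl top` prints can be parsed into structured records. A minimal sketch (the helper name is hypothetical; for production use, `kubectl get --raw` against the metrics API returns JSON and avoids text parsing):

```python
# Hypothetical helper: parse the text output of `kubectl top node`
# into a list of dicts keyed by the header columns.
def parse_top_nodes(text):
    lines = text.strip().splitlines()
    header = lines[0].split()  # e.g. NAME, CPU(cores), CPU%, ...
    return [dict(zip(header, line.split())) for line in lines[1:]]

output = """NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-m01   151m         7%     1516Mi          39%
k8s-m02   136m         6%     1299Mi          33%"""
rows = parse_top_nodes(output)
print(rows[0]["CPU(cores)"])  # → 151m
```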
Cleanup
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
# force-delete a pod
kubectl delete pod prometheus-k8s-1 -n monitoring --force --grace-period=0
Summary of the components above:
- MetricsServer: an aggregator of cluster resource usage that feeds in-cluster consumers such as kubectl, the HPA, and the scheduler
- PrometheusOperator: a system monitoring and alerting toolkit that deploys and manages the monitoring stack
- NodeExporter: exposes key metric state data for each node
- kubeStateMetrics: collects data about resource objects in the cluster, used to define alerting rules
- Prometheus: pulls data over HTTP from the apiserver, scheduler, controller-manager, kubelet, and other components
- Grafana: a platform for visualizing statistics and monitoring data
2. Access methods
1. NodePort
The default NodePort range for Kubernetes Services is 30000-32767. When this limit does not fit, the range can be customized as follows:
Edit /etc/kubernetes/manifests/kube-apiserver.yaml and add --service-node-port-range=20000-50000
vim /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 10.0.0.61:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.0.0.61
    - --service-node-port-range=20000-50000   # add this line
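The NodePorts chosen later (33000, 39090, 39093) must fall inside this range, or the apiserver rejects the Service. A trivial sketch of the check, assuming the range configured above:

```python
# Sketch: verify a desired nodePort is inside the apiserver's
# --service-node-port-range (assumed 20000-50000 as configured above).
def in_node_port_range(port, range_spec="20000-50000"):
    low, high = (int(x) for x in range_spec.split("-"))
    return low <= port <= high

print(in_node_port_range(33000))  # → True  (valid for the grafana Service)
print(in_node_port_range(3000))   # → False (the container port, not a nodePort)
```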
Modify the grafana service manifest:
cd /etc/kubernetes/kube-prometheus/
cat >manifests/grafana-service.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 33000
  selector:
    app: grafana
EOF
kubectl apply -f manifests/grafana-service.yaml
Modify the prometheus service manifest:
cd /etc/kubernetes/kube-prometheus/
cat >manifests/prometheus-service.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 39090
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
EOF
kubectl apply -f manifests/prometheus-service.yaml
Modify the alertmanager service manifest:
cd /etc/kubernetes/kube-prometheus/
cat >manifests/alertmanager-service.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 39093
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
EOF
kubectl apply -f manifests/alertmanager-service.yaml
2. The Prometheus web UI
Open the Prometheus web UI: http://10.0.0.61:39090/
Expand the Status menu and check Targets: two scrape jobs have no corresponding targets, which is related to their ServiceMonitor resource objects.
Cause:
A ServiceMonitor selects Services by label, and in the designated namespace (kube-system) no Service carries the expected labels. kube-apiserver works because it is exposed in the default namespace through the default `kubernetes` Service; the other control-plane components live in kube-system and need Services created for them.
Fix:
# inspect the Service-selection rules of the ServiceMonitors
[root@k8s-m01 ~]# cd /etc/kubernetes/kube-prometheus/
[root@k8s-m01 kube-prometheus]# grep -2 selector manifests/prometheus-serviceMonitorKube*
manifests/prometheus-serviceMonitorKubeControllerManager.yaml- matchNames:
manifests/prometheus-serviceMonitorKubeControllerManager.yaml- - kube-system
manifests/prometheus-serviceMonitorKubeControllerManager.yaml: selector:
manifests/prometheus-serviceMonitorKubeControllerManager.yaml- matchLabels:
manifests/prometheus-serviceMonitorKubeControllerManager.yaml- k8s-app: kube-controller-manager
--
manifests/prometheus-serviceMonitorKubelet.yaml- matchNames:
manifests/prometheus-serviceMonitorKubelet.yaml- - kube-system
manifests/prometheus-serviceMonitorKubelet.yaml: selector:
manifests/prometheus-serviceMonitorKubelet.yaml- matchLabels:
manifests/prometheus-serviceMonitorKubelet.yaml- k8s-app: kubelet
--
manifests/prometheus-serviceMonitorKubeScheduler.yaml- matchNames:
manifests/prometheus-serviceMonitorKubeScheduler.yaml- - kube-system
manifests/prometheus-serviceMonitorKubeScheduler.yaml: selector:
manifests/prometheus-serviceMonitorKubeScheduler.yaml- matchLabels:
manifests/prometheus-serviceMonitorKubeScheduler.yaml- k8s-app: kube-scheduler
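The matchLabels rule shown in the grep output has simple semantics: a Service is selected only if every key/value pair in matchLabels is present among the Service's labels. A minimal sketch of that logic:

```python
# Sketch of ServiceMonitor matchLabels semantics: every key/value pair in
# matchLabels must appear in the Service's labels; extra labels are allowed.
def matches(match_labels, svc_labels):
    return all(svc_labels.get(k) == v for k, v in match_labels.items())

match = {"k8s-app": "kube-scheduler"}
print(matches(match, {"k8s-app": "kube-scheduler"}))   # → True
print(matches(match, {"component": "kube-scheduler"})) # → False (wrong key)
```

This is why the Services created below must carry exactly the `k8s-app: ...` labels the ServiceMonitors expect.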
# create prometheus-kubeSchedulerService.yaml
$ cd /etc/kubernetes/kube-prometheus/
$ cat >manifests/prometheus-kubeSchedulerService.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.61
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
EOF
kubectl apply -f manifests/prometheus-kubeSchedulerService.yaml
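Prometheus's Endpoints-based discovery then derives scrape targets from the addresses × ports of the Endpoints object just created. A simplified sketch of that expansion (the function name is illustrative, not Prometheus code):

```python
# Sketch: expand an Endpoints object (as created above) into the
# address:port scrape targets Prometheus would discover from it.
def scrape_targets(endpoints):
    targets = []
    for subset in endpoints["subsets"]:
        for addr in subset["addresses"]:
            for port in subset["ports"]:
                targets.append(f'{addr["ip"]}:{port["port"]}')
    return targets

ep = {"subsets": [{"addresses": [{"ip": "10.0.0.61"}],
                   "ports": [{"name": "http-metrics", "port": 10251}]}]}
print(scrape_targets(ep))  # → ['10.0.0.61:10251']
```

This is why a headless Service with a manually maintained Endpoints object is enough to get the static-Pod scheduler scraped.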
# likewise, create prometheus-kubeControllerManagerService.yaml
cat >manifests/prometheus-kubeControllerManagerService.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.61
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
EOF
kubectl apply -f manifests/prometheus-kubeControllerManagerService.yaml
3. Access alertmanager
Open the alertmanager web UI: http://10.0.0.61:39093/
4. Access grafana
1. Set the grafana timezone to UTC:
2. Browse the built-in dashboards:
grafana ships with many dashboards out of the box; you can also download more from https://grafana.com/grafana/dashboards or write your own.