Introduction
Prometheus Operator is often described as the definitive monitoring solution for Kubernetes clusters, but the Prometheus Operator project alone no longer ships the full feature set; the complete solution is now kube-prometheus. Project address:
https://github.com/coreos/kube-prometheus
kube-prometheus is a complete monitoring solution: it uses Prometheus to collect cluster metrics and Grafana for visualization, and it includes the following components:
Component | Description |
---|---|
The Prometheus Operator | Makes it very easy to deploy Prometheus in a Kubernetes cluster, provides monitoring of the cluster, and configures and manages Prometheus instances |
Highly available Prometheus | Highly available monitoring system |
Highly available Alertmanager | Highly available alerting component; it receives alerts sent by Prometheus, supports a rich set of notification channels, and makes it easy to deduplicate, silence, and group alerts |
node-exporter | Collects host-level metrics such as load average, filesystem, and meminfo; similar in scope to a traditional zabbix-agent |
Prometheus Adapter for Kubernetes Metrics APIs (k8s-prometheus-adapter) | Exposes metrics collected by Prometheus through the Kubernetes metrics APIs |
kube-state-metrics | Collects data about resource objects inside the Kubernetes cluster, which alerting rules can be built on |
grafana | Visualization platform for large-scale metrics; the most popular time-series dashboard tool for infrastructure and application analytics |
k8s-prometheus-adapter implements the metrics.k8s.io and custom.metrics.k8s.io APIs on top of Prometheus, so there is no need to deploy metrics-server separately. (metrics-server discovers all nodes through kube-apiserver and then calls the kubelet APIs over HTTPS to obtain CPU, memory, and other resource usage for each node and pod. Starting with Kubernetes 1.12 the installation scripts removed Heapster, and from 1.13 Heapster support was dropped entirely; Heapster is no longer maintained.)
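Because the adapter registers the aggregated metrics APIs, you can verify they are being served by querying them directly through kube-apiserver. A minimal sketch; it assumes the kube-prometheus stack from this article is already running in the target cluster:

```shell
# List node resource metrics served by prometheus-adapter via the
# aggregated metrics.k8s.io API (this is what `kubectl top node` uses).
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

# The custom-metrics API registered by the adapter can be listed the same way.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
```

If both calls return JSON rather than an error, the adapter is registered and healthy, and `kubectl top` will work without metrics-server.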
1. Deployment
1.1 Download the source code
cd /etc/kubernetes
git clone https://github.com/coreos/kube-prometheus.git
1.2 Run the installation
[root@k8s-m01 kube-prometheus]# pwd
/etc/kubernetes/kube-prometheus
# Install the prometheus-operator (CRDs and namespace)
[root@k8s-m01 kube-prometheus]# kubectl apply -f manifests/setup
# Install the remaining manifests (Prometheus, Alertmanager, exporters, adapter, Grafana)
[root@k8s-m01 kube-prometheus]# kubectl apply -f manifests/
1.3 Inspect the resources
[root@k8s-m01 kube-prometheus]# kubectl get pod,svc,ep -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-main-0 0/2 ContainerCreating 0 7s
pod/alertmanager-main-1 0/2 ContainerCreating 0 7s
pod/alertmanager-main-2 0/2 ContainerCreating 0 7s
pod/grafana-5c55845445-wvtrj 0/1 Pending 0 5s
pod/kube-state-metrics-957fd6c75-whw8r 0/3 Pending 0 5s
pod/node-exporter-gqsrh 0/2 ContainerCreating 0 5s
pod/node-exporter-qmv8h 0/2 Pending 0 5s
pod/prometheus-adapter-5cdcdf9c8d-whln4 0/1 Pending 0 5s
pod/prometheus-k8s-0 0/3 Pending 0 4s
pod/prometheus-k8s-1 0/3 Pending 0 4s
pod/prometheus-operator-6f98f66b89-46fj6 2/2 Running 0 29s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-main ClusterIP 10.109.157.168 <none> 9093/TCP 7s
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 7s
service/grafana ClusterIP 10.110.108.88 <none> 3000/TCP 6s
service/kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 6s
service/node-exporter ClusterIP None <none> 9100/TCP 6s
service/prometheus-adapter ClusterIP 10.102.225.171 <none> 443/TCP 6s
service/prometheus-k8s ClusterIP 10.100.29.234 <none> 9090/TCP 5s
service/prometheus-operated ClusterIP None <none> 9090/TCP 6s
service/prometheus-operator ClusterIP None <none> 8443/TCP 30s
NAME ENDPOINTS AGE
endpoints/alertmanager-main <none> 7s
endpoints/alertmanager-operated <none> 7s
endpoints/grafana <none> 6s
endpoints/kube-state-metrics <none> 6s
endpoints/node-exporter 6s
endpoints/prometheus-adapter <none> 6s
endpoints/prometheus-k8s <none> 5s
endpoints/prometheus-operated <none> 6s
endpoints/prometheus-operator 10.244.2.3:8443 30s
CRDs created by kube-prometheus:
[root@k8s-m01 kube-prometheus]# kubectl get crd -o wide
NAME CREATED AT
alertmanagers.monitoring.coreos.com 2020-05-24T10:11:01Z
podmonitors.monitoring.coreos.com 2020-05-24T10:11:01Z
prometheuses.monitoring.coreos.com 2020-05-24T10:11:01Z
prometheusrules.monitoring.coreos.com 2020-05-24T10:11:02Z
servicemonitors.monitoring.coreos.com 2020-05-24T10:11:02Z
thanosrulers.monitoring.coreos.com 2020-05-24T10:11:02Z
The prometheus resource defines how the Prometheus service should run:
[root@k8s-m01 kube-prometheus]# kubectl -n monitoring get prometheus,alertmanager
NAME VERSION REPLICAS AGE
prometheus.monitoring.coreos.com/k8s v2.17.2 2 16s
NAME VERSION REPLICAS AGE
alertmanager.monitoring.coreos.com/main v0.20.0 3 17s
Both prometheus and alertmanager are managed by StatefulSet controllers:
[root@k8s-m01 kube-prometheus]# kubectl get statefulset -o wide -n monitoring
NAME READY AGE CONTAINERS IMAGES
alertmanager-main 0/3 22s alertmanager,config-reloader quay.io/prometheus/alertmanager:v0.20.0,jimmidyson/configmap-reload:v0.3.0
prometheus-k8s 0/2 21s prometheus,prometheus-config-reloader,rules-configmap-reloader quay.io/prometheus/prometheus:v2.17.2,quay.io/coreos/prometheus-config-reloader:v0.39.0,jimmidyson/configmap-reload:v0.3.0
View node and pod resource usage from the command line:
[root@k8s-m01 kube-prometheus]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-m01 151m 7% 1516Mi 39%
k8s-m02 136m 6% 1299Mi 33%
[root@k8s-m01 kube-prometheus]# kubectl top pod
NAME CPU(cores) MEMORY(bytes)
nginx-ds-9dfb7 0m 1Mi
[root@k8s-m01 kube-prometheus]# kubectl top pod -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-66bff467f8-gpz5f 2m 16Mi
coredns-66bff467f8-pfzc6 2m 14Mi
etcd-k8s-m01 23m 99Mi
kube-apiserver-k8s-m01 51m 488Mi
kube-controller-manager-k8s-m01 9m 54Mi
kube-flannel-ds-amd64-xgln9 1m 10Mi
kube-proxy-nx2jf 2m 16Mi
kube-scheduler-k8s-m01 2m 21Mi
Cleaning up
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
# Force-delete a pod
kubectl delete pod prometheus-k8s-1 -n monitoring --force --grace-period=0
Summary of the components above:
- metrics-server: aggregator of cluster resource usage; its data is consumed inside the cluster, e.g. by kubectl top, the HPA, and the scheduler
- Prometheus Operator: a toolkit that deploys and manages the monitoring and alerting stack
- node-exporter: exposes key metric data for each node
- kube-state-metrics: collects data about resource objects inside the cluster, used by alerting rules
- Prometheus: pulls metrics from apiserver, scheduler, controller-manager, kubelet, and other components over HTTP
- Grafana: visualization platform for statistics and monitoring
2. Access methods
1. NodePort
The default NodePort range for Kubernetes Services is 30000-32767. When that range does not fit, it can be customized as follows:
Edit /etc/kubernetes/manifests/kube-apiserver.yaml and add --service-node-port-range=20000-50000:
vim /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 10.0.0.61:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.0.0.61
    - --service-node-port-range=20000-50000   # add this line
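Before hard-coding a nodePort such as 33000 into a Service, it is worth confirming the port actually falls inside the range the apiserver was configured with, since a port outside the range makes the apply fail. A minimal sketch, using the range set above:

```shell
# Validate that a chosen nodePort lies inside --service-node-port-range.
range="20000-50000"   # value passed to kube-apiserver above
port=33000            # nodePort we intend to use for grafana

low=${range%-*}       # lower bound of the range
high=${range#*-}      # upper bound of the range

if [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]; then
  echo "ok: $port is inside $range"
else
  echo "error: $port is outside $range"
fi
```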
Modify the grafana-service file:
cd /etc/kubernetes/kube-prometheus/
cat >manifests/grafana-service.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 33000
  selector:
    app: grafana
EOF
kubectl apply -f manifests/grafana-service.yaml
Modify the prometheus-service file:
cd /etc/kubernetes/kube-prometheus/
cat >manifests/prometheus-service.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 39090
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
EOF
kubectl apply -f manifests/prometheus-service.yaml
Modify the alertmanager-service file:
cd /etc/kubernetes/kube-prometheus/
cat >manifests/alertmanager-service.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 39093
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
EOF
kubectl apply -f manifests/alertmanager-service.yaml
2. The Prometheus web UI
Open the Prometheus web UI: http://10.0.0.61:39090/
Expand the Status menu and open Targets; as shown there, only two scrape jobs have no corresponding targets. This is related to the ServiceMonitor resources.
Root cause:
A ServiceMonitor selects Services by label within the specified namespace (kube-system), and no Services there carry the matching labels. kube-apiserver works because it is exposed through the default kubernetes Service in the default namespace. The remaining components live in kube-system and need Services created for them separately.
Fix:
# Inspect how the ServiceMonitors select Services
[root@k8s-m01 ~]# cd /etc/kubernetes/kube-prometheus/
[root@k8s-m01 kube-prometheus]# grep -2 selector manifests/prometheus-serviceMonitorKube*
manifests/prometheus-serviceMonitorKubeControllerManager.yaml- matchNames:
manifests/prometheus-serviceMonitorKubeControllerManager.yaml- - kube-system
manifests/prometheus-serviceMonitorKubeControllerManager.yaml: selector:
manifests/prometheus-serviceMonitorKubeControllerManager.yaml- matchLabels:
manifests/prometheus-serviceMonitorKubeControllerManager.yaml- k8s-app: kube-controller-manager
--
manifests/prometheus-serviceMonitorKubelet.yaml- matchNames:
manifests/prometheus-serviceMonitorKubelet.yaml- - kube-system
manifests/prometheus-serviceMonitorKubelet.yaml: selector:
manifests/prometheus-serviceMonitorKubelet.yaml- matchLabels:
manifests/prometheus-serviceMonitorKubelet.yaml- k8s-app: kubelet
--
manifests/prometheus-serviceMonitorKubeScheduler.yaml- matchNames:
manifests/prometheus-serviceMonitorKubeScheduler.yaml- - kube-system
manifests/prometheus-serviceMonitorKubeScheduler.yaml: selector:
manifests/prometheus-serviceMonitorKubeScheduler.yaml- matchLabels:
manifests/prometheus-serviceMonitorKubeScheduler.yaml- k8s-app: kube-scheduler
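The grep output above shows each ServiceMonitor matching on a k8s-app label inside kube-system. The matching rule itself is simple and can be sketched offline: a ServiceMonitor selects exactly those Services whose labels contain every matchLabels entry. The label sets below are illustrative, not taken from a live cluster:

```shell
# Sketch of ServiceMonitor label matching: a Service is selected when its
# labels contain every key/value pair from the ServiceMonitor's matchLabels.
svc_labels="k8s-app=kube-scheduler component=control-plane"   # labels on a Service (example)
match_labels="k8s-app=kube-scheduler"                         # ServiceMonitor matchLabels (example)

selected=yes
for m in $match_labels; do
  case " $svc_labels " in
    *" $m "*) ;;        # this matchLabels entry is present on the Service
    *) selected=no ;;   # a missing entry means the Service is not selected
  esac
done
echo "selected=$selected"
```

This is why the fix below is simply to create Services in kube-system carrying the expected k8s-app labels.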
# Create prometheus-kubeSchedulerService.yaml
$ cd /etc/kubernetes/kube-prometheus/
$ cat >manifests/prometheus-kubeSchedulerService.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.61
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
EOF
kubectl apply -f manifests/prometheus-kubeSchedulerService.yaml
# Likewise, create prometheus-kubeControllerManagerService.yaml
cat >manifests/prometheus-kubeControllerManagerService.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.61
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
EOF
kubectl apply -f manifests/prometheus-kubeControllerManagerService.yaml
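After applying both manifests, it helps to confirm the new Services carry the labels the ServiceMonitors select on and that the static Endpoints exist; the kube-scheduler and kube-controller-manager targets in the Prometheus UI should then turn healthy within a scrape interval. A cluster-dependent sketch:

```shell
# Verify that the manually created Services match the ServiceMonitor labels
# and that their static Endpoints were created alongside them.
kubectl -n kube-system get svc,ep -l k8s-app=kube-scheduler
kubectl -n kube-system get svc,ep -l k8s-app=kube-controller-manager
```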
3. Access Alertmanager
Open the Alertmanager web UI: http://10.0.0.61:39093/
4. Access Grafana
1. Set the Grafana timezone to UTC:
2. Browse the bundled dashboards:
Many dashboards ship out of the box; you can also download more from the Grafana site at https://grafana.com/grafana/dashboards or write your own.