Kube-prometheus Monitoring

Introduction

Prometheus Operator is often described as the definitive monitoring solution for Kubernetes clusters, but the Prometheus Operator on its own no longer ships the full feature set; the complete solution is now kube-prometheus. The project lives at:

https://github.com/coreos/kube-prometheus

kube-prometheus is a complete monitoring stack: it uses Prometheus to collect cluster metrics and Grafana to display them, and includes the following components:

  • Prometheus Operator: makes it very simple to deploy Prometheus on a Kubernetes cluster, provides monitoring of the cluster itself, and handles configuration and management of Prometheus.
  • Highly available Prometheus: the monitoring server, run in a highly available setup.
  • Highly available Alertmanager: a highly available alerting tool that receives alerts sent by Prometheus; it supports a rich set of notification channels and makes deduplication, noise reduction, and grouping of alerts straightforward.
  • node-exporter: collects host-level metrics such as loadavg, filesystem, and meminfo, similar in scope to a traditional zabbix-agent.
  • Prometheus Adapter for Kubernetes Metrics APIs (k8s-prometheus-adapter): polls the Kubernetes API and turns structured Kubernetes information into metrics.
  • kube-state-metrics: collects data about resource objects inside the cluster, on which alerting rules are defined.
  • grafana: visualizes metric data at scale; one of the most popular time-series display tools for infrastructure and application analysis.

Because k8s-prometheus-adapter uses Prometheus to implement the metrics.k8s.io and custom.metrics.k8s.io APIs, there is no need to deploy metrics-server separately. (metrics-server discovers all nodes through kube-apiserver and then calls the kubelet APIs over HTTPS to obtain CPU, memory, and other resource usage for each node and Pod. Starting with Kubernetes 1.12 the installation scripts removed Heapster, and from 1.13 Heapster support was dropped entirely; Heapster is no longer maintained.)

1. Deployment

1.1 Download the source

cd /etc/kubernetes
git clone https://github.com/coreos/kube-prometheus.git

1.2 Install

[root@k8s-m01 kube-prometheus]# pwd
/etc/kubernetes/kube-prometheus
# Install the Prometheus Operator and its CRDs
[root@k8s-m01 kube-prometheus]# kubectl apply -f manifests/setup
# Install the remaining components (Prometheus, Alertmanager, exporters, adapter, Grafana)
[root@k8s-m01 kube-prometheus]# kubectl apply -f manifests/

1.3 Inspect the resources

[root@k8s-m01 kube-prometheus]# kubectl get pod,svc,ep -n monitoring
NAME                                       READY   STATUS              RESTARTS   AGE
pod/alertmanager-main-0                    0/2     ContainerCreating   0          7s
pod/alertmanager-main-1                    0/2     ContainerCreating   0          7s
pod/alertmanager-main-2                    0/2     ContainerCreating   0          7s
pod/grafana-5c55845445-wvtrj               0/1     Pending             0          5s
pod/kube-state-metrics-957fd6c75-whw8r     0/3     Pending             0          5s
pod/node-exporter-gqsrh                    0/2     ContainerCreating   0          5s
pod/node-exporter-qmv8h                    0/2     Pending             0          5s
pod/prometheus-adapter-5cdcdf9c8d-whln4    0/1     Pending             0          5s
pod/prometheus-k8s-0                       0/3     Pending             0          4s
pod/prometheus-k8s-1                       0/3     Pending             0          4s
pod/prometheus-operator-6f98f66b89-46fj6   2/2     Running             0          29s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-main       ClusterIP   10.109.157.168   <none>        9093/TCP                     7s
service/alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   7s
service/grafana                 ClusterIP   10.110.108.88    <none>        3000/TCP                     6s
service/kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            6s
service/node-exporter           ClusterIP   None             <none>        9100/TCP                     6s
service/prometheus-adapter      ClusterIP   10.102.225.171   <none>        443/TCP                      6s
service/prometheus-k8s          ClusterIP   10.100.29.234    <none>        9090/TCP                     5s
service/prometheus-operated     ClusterIP   None             <none>        9090/TCP                     6s
service/prometheus-operator     ClusterIP   None             <none>        8443/TCP                     30s

NAME                              ENDPOINTS         AGE
endpoints/alertmanager-main       <none>            7s
endpoints/alertmanager-operated   <none>            7s
endpoints/grafana                 <none>            6s
endpoints/kube-state-metrics      <none>            6s
endpoints/node-exporter                             6s
endpoints/prometheus-adapter      <none>            6s
endpoints/prometheus-k8s          <none>            5s
endpoints/prometheus-operated     <none>            6s
endpoints/prometheus-operator     10.244.2.3:8443   30s


CRDs created by kube-prometheus:

[root@k8s-m01 kube-prometheus]# kubectl get crd -o wide
NAME                                    CREATED AT
alertmanagers.monitoring.coreos.com     2020-05-24T10:11:01Z
podmonitors.monitoring.coreos.com       2020-05-24T10:11:01Z
prometheuses.monitoring.coreos.com      2020-05-24T10:11:01Z
prometheusrules.monitoring.coreos.com   2020-05-24T10:11:02Z
servicemonitors.monitoring.coreos.com   2020-05-24T10:11:02Z
thanosrulers.monitoring.coreos.com      2020-05-24T10:11:02Z

The prometheus custom resource defines how the Prometheus server should run:

[root@k8s-m01 kube-prometheus]# kubectl -n monitoring get prometheus,alertmanager
NAME                                   VERSION   REPLICAS   AGE
prometheus.monitoring.coreos.com/k8s   v2.17.2   2          16s

NAME                                      VERSION   REPLICAS   AGE
alertmanager.monitoring.coreos.com/main   v0.20.0   3          17s

Both prometheus and alertmanager are backed by StatefulSet controllers:

[root@k8s-m01 kube-prometheus]# kubectl get statefulset -o wide -n monitoring
NAME                READY   AGE   CONTAINERS                                                       IMAGES
alertmanager-main   0/3     22s   alertmanager,config-reloader                                     quay.io/prometheus/alertmanager:v0.20.0,jimmidyson/configmap-reload:v0.3.0
prometheus-k8s      0/2     21s   prometheus,prometheus-config-reloader,rules-configmap-reloader   quay.io/prometheus/prometheus:v2.17.2,quay.io/coreos/prometheus-config-reloader:v0.39.0,jimmidyson/configmap-reload:v0.3.0

Node and Pod resource usage can also be viewed in the Kubernetes dashboard:

https://10.0.0.61:6443/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

View node and Pod resource usage from the command line:

[root@k8s-m01 kube-prometheus]# kubectl top node
NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8s-m01   151m         7%     1516Mi          39%       
k8s-m02   136m         6%     1299Mi          33% 

[root@k8s-m01 kube-prometheus]# kubectl top pod
NAME             CPU(cores)   MEMORY(bytes)   
nginx-ds-9dfb7   0m           1Mi 

[root@k8s-m01 kube-prometheus]# kubectl top pod -n kube-system
NAME                              CPU(cores)   MEMORY(bytes)   
coredns-66bff467f8-gpz5f          2m           16Mi            
coredns-66bff467f8-pfzc6          2m           14Mi            
etcd-k8s-m01                      23m          99Mi            
kube-apiserver-k8s-m01            51m          488Mi           
kube-controller-manager-k8s-m01   9m           54Mi            
kube-flannel-ds-amd64-xgln9       1m           10Mi            
kube-proxy-nx2jf                  2m           16Mi            
kube-scheduler-k8s-m01            2m           21Mi  
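The plain-text output of kubectl top lends itself to post-processing with standard tools. A minimal sketch, reusing a few lines of the captured sample above as stand-in input (on a live cluster you would pipe kubectl top pod -n kube-system --no-headers in directly):

```shell
# Rank pods by memory: strip the Mi suffix, then sort numerically on
# the MEMORY column, highest first.
rank_by_mem() {
  sed 's/Mi$//' | sort -k3,3 -n -r
}

rank_by_mem <<'EOF'
coredns-66bff467f8-gpz5f          2m           16Mi
etcd-k8s-m01                      23m          99Mi
kube-apiserver-k8s-m01            51m          488Mi
kube-scheduler-k8s-m01            2m           21Mi
EOF
# kube-apiserver-k8s-m01 (488Mi) sorts to the top
```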

Cleanup

kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
# Force-delete a stuck pod
kubectl delete pod prometheus-k8s-1 -n monitoring --force --grace-period=0

Component summary:

  • metrics-server: an aggregator of cluster resource usage, collecting data for in-cluster consumers such as kubectl, the HPA, and the scheduler.
  • Prometheus Operator: a system monitoring and alerting toolkit, used to store monitoring data.
  • node-exporter: exposes key metric data for each node.
  • kube-state-metrics: collects data about resource objects inside the cluster, on which alerting rules are defined.
  • Prometheus: pulls metrics from the apiserver, scheduler, controller-manager, and kubelet components over HTTP.
  • Grafana: a platform for visualizing statistics and monitoring data.

2. Access

1. NodePort access

Kubernetes NodePort services default to the port range 30000-32767. Where that limit does not fit, the range can be customized as follows:

Edit the /etc/kubernetes/manifests/kube-apiserver.yaml configuration file and add --service-node-port-range=20000-50000:

vim /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 10.0.0.61:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.0.0.61
    - --service-node-port-range=20000-50000  # add this line
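The nodePort values chosen below (33000, 39090, 39093) must fall inside this extended range. A throwaway shell check, purely for illustration:

```shell
# Check that a candidate nodePort lies inside the apiserver's configured
# --service-node-port-range (20000-50000 after the change above).
in_nodeport_range() {
  port=$1
  [ "$port" -ge 20000 ] && [ "$port" -le 50000 ]
}

for p in 33000 39090 39093; do
  in_nodeport_range "$p" && echo "$p ok"
done
```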

Modify the grafana-service file:

cd /etc/kubernetes/kube-prometheus/
cat >manifests/grafana-service.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 33000
  selector:
    app: grafana
EOF
kubectl apply -f manifests/grafana-service.yaml

Modify the prometheus-service file:

cd /etc/kubernetes/kube-prometheus/
cat >manifests/prometheus-service.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 39090
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
EOF
kubectl apply -f manifests/prometheus-service.yaml

Modify the alertmanager-service file:

cd /etc/kubernetes/kube-prometheus/
cat >manifests/alertmanager-service.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 39093
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
EOF
kubectl apply -f manifests/alertmanager-service.yaml

2. The Prometheus web UI

Open the Prometheus web UI: http://10.0.0.61:39090/
Expand the Status menu and check Targets: two monitoring jobs have no corresponding targets, which comes down to their ServiceMonitor resources.
Cause:

A ServiceMonitor selects Services by label, and in the targeted namespace (kube-system) no Service carries the expected labels. kube-apiserver works because it is exposed through the default Service named kubernetes in the default namespace; the other control-plane components live in kube-system and need Services created for them separately.

Fix:

# Inspect the label selectors the ServiceMonitors use to pick Services
[root@k8s-m01 ~]# cd /etc/kubernetes/kube-prometheus/
[root@k8s-m01 kube-prometheus]# grep -2 selector manifests/prometheus-serviceMonitorKube*
manifests/prometheus-serviceMonitorKubeControllerManager.yaml-    matchNames:
manifests/prometheus-serviceMonitorKubeControllerManager.yaml-    - kube-system
manifests/prometheus-serviceMonitorKubeControllerManager.yaml:  selector:
manifests/prometheus-serviceMonitorKubeControllerManager.yaml-    matchLabels:
manifests/prometheus-serviceMonitorKubeControllerManager.yaml-      k8s-app: kube-controller-manager
--
manifests/prometheus-serviceMonitorKubelet.yaml-    matchNames:
manifests/prometheus-serviceMonitorKubelet.yaml-    - kube-system
manifests/prometheus-serviceMonitorKubelet.yaml:  selector:
manifests/prometheus-serviceMonitorKubelet.yaml-    matchLabels:
manifests/prometheus-serviceMonitorKubelet.yaml-      k8s-app: kubelet
--
manifests/prometheus-serviceMonitorKubeScheduler.yaml-    matchNames:
manifests/prometheus-serviceMonitorKubeScheduler.yaml-    - kube-system
manifests/prometheus-serviceMonitorKubeScheduler.yaml:  selector:
manifests/prometheus-serviceMonitorKubeScheduler.yaml-    matchLabels:
manifests/prometheus-serviceMonitorKubeScheduler.yaml-      k8s-app: kube-scheduler
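The pattern above is uniform: each ServiceMonitor restricts itself to the kube-system namespace and selects Services by a k8s-app label. As an offline illustration (reusing a few lines of the grep output above as sample input), the labels that need matching Services can be listed with:

```shell
# List the k8s-app label each ServiceMonitor expects a Service to carry.
required_labels() {
  grep -o 'k8s-app: [a-z-]*' | sort -u
}

required_labels <<'EOF'
manifests/prometheus-serviceMonitorKubeControllerManager.yaml-      k8s-app: kube-controller-manager
manifests/prometheus-serviceMonitorKubelet.yaml-      k8s-app: kubelet
manifests/prometheus-serviceMonitorKubeScheduler.yaml-      k8s-app: kube-scheduler
EOF
```

Each label printed must be matched by a Service in kube-system, which is exactly what the manifests below create for kube-scheduler and kube-controller-manager.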


# Create prometheus-kubeSchedulerService.yaml
$ cd /etc/kubernetes/kube-prometheus/
$ cat >manifests/prometheus-kubeSchedulerService.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.61
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
EOF
kubectl apply -f manifests/prometheus-kubeSchedulerService.yaml

# Likewise, create prometheus-kubeControllerManagerService.yaml
cat >manifests/prometheus-kubeControllerManagerService.yaml<<EOF  
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.61
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
EOF
kubectl apply -f manifests/prometheus-kubeControllerManagerService.yaml

3. Alertmanager

Open the Alertmanager web UI: http://10.0.0.61:39093/

4. Grafana

http://10.0.0.61:33000/

1. Set the Grafana timezone to UTC.


2. Browse the bundled dashboards:

Grafana ships with many dashboards out of the box; more can be downloaded from https://grafana.com/grafana/dashboards or written by hand.
