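In day-to-day operations we sometimes need to know how many resources each service in a running cluster actually consumes. For this we use Metrics-Server, the Kubernetes cluster resource metrics collector, which gathers the CPU and memory usage of Nodes and Pods. (It does not report disk or network usage.)
Figure: metrics-server architecture diagram.
I. Deploying metrics-server
1. Download the resource files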
$ git clone -b release-0.3 https://github.com/kubernetes-incubator/metrics-server.git
2. Edit the configuration file
$ cd metrics-server/deploy/1.8+/
$ vim metrics-server-deployment.yaml
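The file after modification (on Kubernetes 1.16 and later, where extensions/v1beta1 Deployments are no longer served, use apiVersion: apps/v1 instead):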
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.2
        # image: k8s.gcr.io/metrics-server-amd64:v0.3.2
        imagePullPolicy: IfNotPresent
        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
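Note: three things were changed: (1) the image, because k8s.gcr.io is blocked and cannot be pulled from mainland China; (2) the image pull policy; (3) the start command and its arguments:
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.2
        # image: k8s.gcr.io/metrics-server-amd64:v0.3.2
        imagePullPolicy: IfNotPresent
        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls
--kubelet-preferred-address-types=InternalIP makes metrics-server reach each kubelet through the node's InternalIP, and --kubelet-insecure-tls skips verification of the kubelet's serving certificate. The latter is convenient on a test cluster but is not recommended for production.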
When the changes are done, save the file and exit.
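Before applying, it can help to confirm that all three changes really made it into the file. The following is a minimal sketch, not part of the upstream project: check_manifest is a hypothetical helper name, and it only does a plain-text search for the expected strings, so it does not validate the YAML itself.

```shell
# check_manifest FILE
# Hypothetical helper: succeeds only if FILE contains the image, pull policy
# and both kubelet flags described in the note above. Plain-text search only.
check_manifest() {
  file="$1"
  rc=0
  for want in \
    'image: mirrorgooglecontainers/metrics-server-amd64:v0.3.2' \
    'imagePullPolicy: IfNotPresent' \
    '--kubelet-preferred-address-types=InternalIP' \
    '--kubelet-insecure-tls'
  do
    # -F: match a fixed string, -q: no output, --: the pattern may start with a dash
    if ! grep -F -q -- "$want" "$file"; then
      echo "missing: $want" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage, assuming the current directory is metrics-server/deploy/1.8+/:
#   check_manifest metrics-server-deployment.yaml && kubectl apply -f .
```

Any string that is not found is reported on stderr and the function returns a non-zero status, so it can gate the apply step.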
3. Apply all the configuration files to the cluster
$ kubectl apply -f .
After a minute or two (pulling the image and collecting the first metrics both take time), check the status of metrics-server:
[root@k8s-master 1.8+]# kubectl get po -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
calico-node-b78m4                    1/1     Running   0          176m
calico-node-r5mlj                    1/1     Running   0          3h6m
calico-node-z5tdh                    1/1     Running   0          176m
coredns-fb8b8dccf-6mgks              1/1     Running   0          3h21m
coredns-fb8b8dccf-cbtlx              1/1     Running   0          3h21m
etcd-k8s-master                      1/1     Running   0          3h20m
kube-apiserver-k8s-master            1/1     Running   0          3h20m
kube-controller-manager-k8s-master   1/1     Running   0          3h20m
kube-proxy-c9xd2                     1/1     Running   0          3h21m
kube-proxy-fp2r2                     1/1     Running   0          176m
kube-proxy-lrsw7                     1/1     Running   0          176m
kube-scheduler-k8s-master            1/1     Running   0          3h20m
metrics-server-7579f696d8-pgcc4      1/1     Running   0          99s
[root@k8s-master 1.8+]# kubectl top node
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master   179m         8%     1660Mi          43%
k8s-node1    81m          4%     908Mi           23%
k8s-node2    78m          3%     1036Mi          26%
metrics-server is now in the Running state and returns node metrics.
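Rather than waiting a fixed minute or two, the check can be repeated until it succeeds. The sketch below is one possible approach under stated assumptions: kubectl is on PATH and points at the target cluster, and the function name wait_for_metrics and its retry defaults are illustrative, not part of kubectl or metrics-server.

```shell
# wait_for_metrics [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: retries `kubectl top node` until it succeeds or the
# attempts run out. Defaults (30 tries, 5 seconds apart) are arbitrary.
wait_for_metrics() {
  attempts="${1:-30}"
  delay="${2:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # `kubectl top node` fails until the Metrics API has data to serve
    if kubectl top node >/dev/null 2>&1; then
      echo "metrics available after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "metrics still unavailable after $attempts attempt(s)" >&2
  return 1
}

# Usage:
#   wait_for_metrics && kubectl top node
```

If the function gives up, `kubectl logs -n kube-system` on the metrics-server Pod is usually the next place to look.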
II. Monitoring with metrics-server
1. All nodes
$ kubectl top node
2. A specific node
$ kubectl top node dev
3. Pods in a specific namespace
$ kubectl top po -n oas-dev
4. Pods in all namespaces
$ kubectl top po -A
5. Nodes by memory utilization (MEMORY%), descending
$ kubectl top node | sort -n -r -k 5
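6. Pods in all namespaces by memory usage, descending (with -A the first column is NAMESPACE, so memory is column 4)
$ kubectl top po -A | sort -n -r -k 4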
7. Pods in a specific namespace by memory usage, descending
$ kubectl top po -n oas-dev | sort -n -r -k 3
8. Pods in a specific namespace by CPU usage, descending
$ kubectl top po -n oas-dev | sort -n -r -k 2
9. Pods in a specific namespace by CPU usage, ascending
$ kubectl top po -n oas-dev | sort -n -k 2
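One side effect of piping straight into sort is that the header row is sorted along with the data, so with -r it ends up at the bottom. A small wrapper can keep it on top. This is a sketch only: sort_top is a hypothetical helper name, and it assumes every value in the chosen column uses the same unit (for example all in m or all in Mi). The sample rows are copied from the `kubectl top node` output earlier in this article so the block runs without a cluster.

```shell
# sort_top COLUMN
# Hypothetical helper: reads `kubectl top` output on stdin, prints the header
# line unchanged, then the remaining rows sorted on COLUMN, highest first.
sort_top() {
  col="$1"                            # 1-based column number to sort on
  IFS= read -r header || return 0     # consume only the header line
  printf '%s\n' "$header"
  sort -n -r -k "${col},${col}"       # numeric, descending, that column only
}

# Sample rows taken from the `kubectl top node` output shown above.
sample='NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master   179m         8%     1660Mi          43%
k8s-node1    81m          4%     908Mi           23%
k8s-node2    78m          3%     1036Mi          26%'

# Column 5 is MEMORY%.
printf '%s\n' "$sample" | sort_top 5
```

Against a live cluster the same helper would be used as `kubectl top node | sort_top 5` or `kubectl top po -n oas-dev | sort_top 3`.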
That concludes this introduction to metrics-server, the cluster resource monitoring tool.