Introduction
- The scheduler uses the Kubernetes watch mechanism to discover newly created Pods in the cluster that have not yet been assigned to a Node, and schedules each unscheduled Pod it finds onto a suitable Node.
- kube-scheduler is the default scheduler for Kubernetes clusters and is part of the cluster control plane. If you really want or need to, kube-scheduler is designed so that you can write your own scheduling component and swap it in.
- Factors taken into account in scheduling decisions include: individual and collective resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements, data locality, interference between workloads, and so on.
- Default policies: https://kubernetes.io/zh/docs/concepts/scheduling/kube-scheduler/
- Scheduling framework: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/scheduling-framework/
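If you do replace the scheduler, a Pod chooses which scheduler handles it through spec.schedulerName (it defaults to default-scheduler). A minimal sketch, assuming a custom scheduler has been deployed under the hypothetical name my-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled
spec:
  schedulerName: my-scheduler  # hypothetical custom scheduler; omit this field to use the default kube-scheduler
  containers:
  - name: nginx
    image: nginx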
Scheduling by nodeName
- nodeName is the simplest form of node selection constraint, but it is generally not recommended. If nodeName is set in the PodSpec, it takes precedence over every other node selection method.
- Some limitations of using nodeName to select a node:
  - If the named node does not exist, the Pod cannot run.
  - If the named node does not have the resources to accommodate the Pod, the Pod fails.
  - Node names in cloud environments are not always predictable or stable; they can change.
If any one of these conditions is violated, the Pod will not run.
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  # nodeName: server4 /comment out the node name for now and see where the Pod lands
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 13s 10.244.141.255 server3 <none> <none>
/ It ran on server3. When no node name is given, the default scheduler decides where the Pod runs; here it chose server3.
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server4 /specify the node name so the Pod runs on server4
[root@server2 scheduler]# kubectl delete pod nginx
pod "nginx" deleted
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 5s 10.244.22.48 server4 <none> <none>
Now it runs on server4.
Although convenient, nodeName is subject to all kinds of limitations. For example, our server4 host has only 1 CPU:
[root@server2 scheduler]# vim pod.yml
[root@server2 scheduler]# cat pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: 2 /requests a minimum of 2 CPUs
  nodeName: server4
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 0/1 OutOfcpu 0 2s
The Pod cannot run (status OutOfcpu). The same happens with memory.
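Before pinning a Pod to a node with nodeName, it helps to check what the node can actually offer. The Allocatable section of kubectl describe lists the cpu and memory the node can hand out, for example:

[root@server2 scheduler]# kubectl describe nodes server4 | grep -A 5 Allocatable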
Scheduling by nodeSelector
nodeSelector is the simplest recommended form of node selection constraint.
Add a label to the chosen node:
[root@server2 scheduler]# kubectl label nodes server4 disktype=ssd
node/server4 labeled
[root@server2 scheduler]# kubectl get nodes --show-labels
...
server4 Ready <none> 18d v1.18.3 disktype=ssd / the label is now present
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 6s 10.244.22.49 server4 <none> <none>
The Pod runs on server4.
When the label does not exist anywhere:
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: sata /this label does not exist on any node
[root@server2 scheduler]# kubectl delete pod nginx
pod "nginx" deleted
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 0/1 Pending 0 12s <none> <none> <none> <none>
[root@server2 scheduler]# kubectl describe pod nginx
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
Warning FailedScheduling <unknown> default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
No node matches the node selector.
So if the label is deleted by accident, the Pod can no longer run. How can we ensure the Pod still runs even when the label no longer exists? Node affinity, below, answers this.
Node affinity and anti-affinity
- Affinity and anti-affinity
  - nodeSelector provides a very simple way to constrain Pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express.
  - Rules can be "soft"/"preferences" rather than hard requirements, so the Pod is still scheduled even if the scheduler cannot satisfy them.
  - You can constrain against labels on the Pods already running on a node, rather than the node's own labels, to control which Pods may or may not be placed together.
Node affinity
- Node affinity
  - requiredDuringSchedulingIgnoredDuringExecution: must be satisfied.
  - preferredDuringSchedulingIgnoredDuringExecution: preferred; it is acceptable if it cannot be satisfied.
  - IgnoredDuringExecution means that if a Node's labels change while a Pod is running so that the affinity rule is no longer satisfied, the Pod keeps running where it is.
  - Reference: https://kubernetes.io/zh/docs/concepts/configuration/assign-pod-node/
Node affinity example:
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity: /affinity
    nodeAffinity: /node affinity
      requiredDuringSchedulingIgnoredDuringExecution: /must be satisfied
        nodeSelectorTerms: /node selector terms
        - matchExpressions: /match expressions
          - key: disktype /key
            operator: In /match rule: the value must be in the list below
            values: /values
            - ssd
nodeAffinity also supports several operators in its match rules:
In: the label's value is in the given list
NotIn: the label's value is not in the given list
Gt: the label's value is greater than the given value (not supported for pod affinity)
Lt: the label's value is less than the given value (not supported for pod affinity)
Exists: the given label exists
DoesNotExist: the given label does not exist
These can also be combined, as in the sketch below.
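Several expressions can sit under one matchExpressions block, and all of them must hold. A sketch (the cpus label is made up for illustration; Gt/Lt compare the label value as an integer):

        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: Exists
          - key: cpus          # hypothetical label holding the node's CPU count
            operator: Gt
            values:
            - "4"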
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/node-affinity created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity 1/1 Running 0 5s 10.244.22.50 server4 <none> <none>
It runs on server4, because server4 carries the label and the rule is required.
Node affinity example 2:
[root@server2 scheduler]# kubectl delete -f pod.yml
pod "node-affinity" deleted
[root@server2 scheduler]# vim pod.yml
[root@server2 scheduler]# cat pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: /must be satisfied
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn /the value must not be in the list below
            values:
            - server4 /i.e. never schedule onto server4
      preferredDuringSchedulingIgnoredDuringExecution: /preferred
      - weight: 1 /weight 1; among preferences, higher weight wins
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - sata /no node carries this label
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/node-affinity created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity 1/1 Running 0 7s 10.244.141.193 server3 <none> <none>
So it runs on server3. We first forbade server4, then expressed a preference for nodes labeled disktype: sata; that label exists nowhere, but since the rule is only preferred the Pod still runs, ending up on server3.
Pod affinity
- Pod affinity and anti-affinity
  - podAffinity controls which Pods a Pod may be deployed with in the same topology domain (a topology domain is defined via host labels; it can be a single host, or a set of hosts forming a cluster, zone, and so on).
  - podAntiAffinity controls which Pods a Pod must not share a topology domain with. Both deal with relationships between Pods inside the Kubernetes cluster.
  - Inter-Pod affinity and anti-affinity can be even more useful together with higher-level collections such as ReplicaSets, StatefulSets, and Deployments: you can easily configure a group of workloads that should live in the same defined topology (for example, the same node). A preferred-style sketch follows.
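For reference, the examples below all use the required form; a preferred (soft) pod affinity would look like the following sketch, assuming nodes carry the standard topology.kubernetes.io/zone label:

  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - nginx
          topologyKey: topology.kubernetes.io/zone  # co-locate per zone instead of per node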
Pod affinity example:
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx /assign a label; we will use it below
spec:
  containers:
  - name: nginx
    image: nginx
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 5s 10.244.141.195 server3 <none> <none>
This Pod runs on server3.
Now add a second Pod:
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx /pod label
spec:
  containers:
  - name: nginx
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAffinity: /pod affinity
      requiredDuringSchedulingIgnoredDuringExecution: /hard requirement
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx /run together with Pods labeled app: nginx
        topologyKey: kubernetes.io/hostname /defines the topology domain; hostname makes it node-level, but a wider domain also works
The nginx Pod just landed on server3, so pod affinity makes the mysql Pod land on server3 as well.
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx unchanged
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql 1/1 Running 0 7s 10.244.141.194 server3 <none> <none>
nginx 1/1 Running 0 11m 10.244.141.195 server3 <none> <none>
It ran on server3. The mysql Pod follows the app: nginx label: wherever that label runs, the Pod goes.
Now add a nodeName selector to force nginx onto server4:
[root@server2 scheduler]# vim pod.yml
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server4
Everything else stays unchanged.
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql 1/1 Running 0 6s 10.244.22.53 server4 <none> <none>
nginx 1/1 Running 0 6s 10.244.22.52 server4 <none> <none>
mysql follows nginx onto server4.
What if we change the operator to NotIn?
[root@server2 scheduler]# kubectl delete -f pod.yml
pod "nginx" deleted
pod "mysql" deleted
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: NotIn /changed to NotIn here
            values:
            - nginx /i.e. do not run together with Pods labeled app: nginx
        topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql 1/1 Running 0 7s 10.244.22.51 server4 <none> <none>
nginx 1/1 Running 0 7s 10.244.141.197 server3 <none> <none>
As expected, they now run on separate nodes.
Pod anti-affinity example:
[root@server2 scheduler]# vim pod.yml
[root@server2 scheduler]# cat pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAntiAffinity: /the only change is here: podAffinity becomes podAntiAffinity
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
With this, the two Pods repel each other and will never share a node.
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql 0/1 ContainerCreating 0 4s <none> server4 <none> <none>
nginx 0/1 ContainerCreating 0 4s <none> server3 <none> <none>
Scenario: in a Deployment with, say, three replicas that each bind host port 80, the replicas would conflict if placed on one node; anti-affinity solves exactly that.
Taints
- NodeAffinity is a property defined on a Pod that draws the Pod toward certain Nodes. Taints are the opposite: they let a Node refuse to run Pods, or even evict them.
- Taints are a property of a Node. Once a Node is tainted, Kubernetes will not schedule Pods onto it. To get around this, Kubernetes gives Pods a matching property, Tolerations: if a Pod tolerates a Node's taints, Kubernetes ignores those taints and can (but is not required to) schedule the Pod there.
[root@server2 scheduler]# kubectl describe nodes server2 |grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
[root@server2 scheduler]# kubectl describe nodes server3 |grep Taints
Taints: <none>
[root@server2 scheduler]# kubectl describe nodes server4 |grep Taints
Taints: <none>
Our master node carries a taint, and our Pods declare no toleration for it, so the master takes no part in scheduling. server3 and server4 are untainted and schedulable.
Remove the taint from the master node:
[root@server2 scheduler]# kubectl taint node server2 node-role.kubernetes.io/master:NoSchedule-
node/server2 untainted
[root@server2 scheduler]# kubectl describe nodes server2 |grep Taints
Taints: <none>
Now server2 can participate in scheduling as well.
[root@server2 scheduler]# vim deployment.yml
apiVersion: apps/v1
kind: Deployment /this time with a Deployment controller
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      affinity:
        podAntiAffinity: /pod anti-affinity policy
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-6498765b4b-2l6mc 1/1 Running 0 12s 10.244.22.56 server4 <none> <none>
deployment-v1-6498765b4b-mt5x2 1/1 Running 0 12s 10.244.179.81 server2 <none> <none>
deployment-v1-6498765b4b-scvzq 1/1 Running 0 12s 10.244.141.204 server3 <none> <none>
server2 joined the scheduling too. Because the rule is a hard requirement, scaling up to more replicas than nodes leaves the extra Pods Pending: anti-affinity forbids them from sharing a node with another app: myapp Pod.
[root@server2 scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
deployment-v1-6498765b4b-2l6mc 1/1 Running 0 43m
deployment-v1-6498765b4b-lvmr4 0/1 Pending 0 1s
deployment-v1-6498765b4b-mt5x2 1/1 Running 0 43m
deployment-v1-6498765b4b-qjtq2 0/1 Pending 0 1s
deployment-v1-6498765b4b-rn5vj 0/1 Pending 0 1s
deployment-v1-6498765b4b-scvzq 1/1 Running 0 43m
Put the taint back on the master node:
[root@server2 scheduler]# kubectl taint node server2 node-role.kubernetes.io/master:NoSchedule
node/server2 tainted
[root@server2 scheduler]# kubectl describe nodes server2 |grep Taint
Taints: node-role.kubernetes.io/master:NoSchedule
- The [effect] can be one of: [ NoSchedule | PreferNoSchedule | NoExecute ]
  - NoSchedule: Pods are not scheduled onto the tainted node.
  - PreferNoSchedule: a soft version of NoSchedule.
  - NoExecute: once the taint takes effect, Pods already running on the node that lack a matching toleration are evicted immediately.
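The general form of the command is kubectl taint nodes NODE key=value:[effect], and appending '-' removes the taint again, as we did above. For example, with a made-up key/value pair:

[root@server2 scheduler]# kubectl taint nodes server3 key1=v1:PreferNoSchedule
node/server3 tainted
[root@server2 scheduler]# kubectl taint nodes server3 key1=v1:PreferNoSchedule-
node/server3 untainted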
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-6498765b4b-9l5s7 0/1 Pending 0 13s <none> <none> <none> <none>
deployment-v1-6498765b4b-jdjvz 1/1 Running 0 13s 10.244.22.57 server4 <none> <none>
deployment-v1-6498765b4b-zzdlq 1/1 Running 0 13s 10.244.141.200 server3 <none> <none>
Running the Deployment again, two Pods run normally, on server3 and server4. server2 is tainted and the cluster has no spare nodes, so the third Pod stays Pending.
Adding a toleration for the master's taint
- The key, value, and effect defined under tolerations must stay consistent with the taint set on the node:
  - If operator is Exists, value can be omitted.
  - If operator is Equal, key and value must match the taint exactly.
  - If operator is not specified, it defaults to Equal.
- Two special cases (see the sketch after this list):
  - Omitting key and using Exists matches every key and value, i.e. tolerates all taints.
  - Omitting effect matches every effect.
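For comparison with the Exists form used below, an Equal toleration that matches a taint such as key1=v1:NoExecute (applied later in this section) would look like this sketch:

tolerations:
- key: "key1"
  operator: "Equal"   # key and value must both match the taint exactly
  value: "v1"
  effect: "NoExecute"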
[root@server2 scheduler]# vim deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      tolerations:
      - operator: "Exists" /tolerate whatever taint exists; no key given, used together with Exists
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-bb99b54bc-5ngtf 1/1 Running 0 9s 10.244.22.58 server4 <none> <none>
deployment-v1-bb99b54bc-l4sf5 1/1 Running 0 9s 10.244.179.82 server2 <none> <none>
deployment-v1-bb99b54bc-t7v9p 1/1 Running 0 9s 10.244.141.205 server3 <none> <none>
With the toleration in place, the master node is schedulable again: its taint is tolerated.
We just gave the master the NoSchedule taint effect; the two other effects can be applied as well.
All three Pods are currently running normally:
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-bb99b54bc-d2m7c 1/1 Running 0 7s 10.244.22.2 server4 <none> <none>
deployment-v1-bb99b54bc-jfwfp 1/1 Running 0 7s 10.244.179.84 server2 <none> <none>
deployment-v1-bb99b54bc-ntlsr 1/1 Running 0 7s 10.244.141.210 server3 <none> <none>
Add a NoExecute (eviction) taint. Note that the manifest still only tolerates the NoSchedule effect:
[root@server2 scheduler]# kubectl taint nodes server2 key1=v1:NoExecute
node/server2 tainted
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-bb99b54bc-2p2zz 0/1 Pending 0 1s <none> <none> <none> <none>
deployment-v1-bb99b54bc-d2m7c 1/1 Running 0 42s 10.244.22.2 server4 <none> <none>
deployment-v1-bb99b54bc-jfwfp 1/1 Terminating 0 42s 10.244.179.84 server2 <none> <none>
deployment-v1-bb99b54bc-ntlsr 1/1 Running 0 42s 10.244.141.210 server3 <none> <none>
We can see that the Pod that was on server2 has been evicted, and the Deployment controller automatically created a replacement, which stays Pending:
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-bb99b54bc-2p2zz 0/1 Pending 0 5s <none> <none> <none> <none>
deployment-v1-bb99b54bc-d2m7c 1/1 Running 0 46s 10.244.22.2 server4 <none> <none>
deployment-v1-bb99b54bc-ntlsr 1/1 Running 0 46s 10.244.141.210 server3 <none> <none>
We can fix that:
[root@server2 scheduler]# vim deployment.yml
tolerations:
- operator: "Exists"
  # effect: "NoSchedule" /commented out here, so all effects are matched
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 configured
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-68679b84bf-ddmlm 0/1 Pending 0 1s <none> <none> <none> <none>
deployment-v1-68679b84bf-qhd7s 1/1 Running 0 3s 10.244.179.85 server2 <none> <none>
deployment-v1-bb99b54bc-d2m7c 1/1 Running 0 11m 10.244.22.2 server4 <none> <none>
deployment-v1-bb99b54bc-ntlsr 1/1 Running 0 11m 10.244.141.210 server3 <none> <none>
A Pod runs on server2 again.
Other factors that affect Pod scheduling
Other commands that influence Pod scheduling are cordon, drain, and delete. After any of them, newly created Pods are no longer scheduled to the node, but the three differ in how drastic they are.
cordon - stop scheduling:
The gentlest of the three. It only marks the node SchedulingDisabled: new Pods are not scheduled to it, while existing Pods are untouched and keep serving traffic.
[root@server2 scheduler]# kubectl delete -f deployment.yml
deployment.apps "deployment-v1" deleted
[root@server2 scheduler]# kubectl cordon server3
node/server3 cordoned
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server3 Ready,SchedulingDisabled <none> 18d v1.18.3 /scheduling is now disabled
server4 Ready <none> 18d v1.18.3
We comment out the earlier toleration and anti-affinity to make it easier to start extra Pods; the blanket toleration would also cover the taint behind cordon and skew the experiment.
[root@server2 scheduler]# vim deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-7449b5b68f-cqmkq 0/1 ContainerCreating 0 4s <none> server4 <none> <none>
deployment-v1-7449b5b68f-m442w 1/1 Running 0 4s 10.244.22.3 server4 <none> <none>
deployment-v1-7449b5b68f-rtct6 0/1 ContainerCreating 0 4s <none> server4 <none> <none>
[root@server2 scheduler]# vim deployment.yml
spec:
  replicas: 6 /scale the replica count up to 6
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 configured
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-7449b5b68f-2smn9 0/1 ContainerCreating 0 2s <none> server4 <none> <none>
deployment-v1-7449b5b68f-9n2w8 0/1 ContainerCreating 0 2s <none> server4 <none> <none>
deployment-v1-7449b5b68f-cqmkq 1/1 Running 0 23s 10.244.22.4 server4 <none> <none>
deployment-v1-7449b5b68f-fgvdf 0/1 ContainerCreating 0 2s <none> server4 <none> <none>
deployment-v1-7449b5b68f-m442w 1/1 Running 0 23s 10.244.22.3 server4 <none> <none>
deployment-v1-7449b5b68f-rtct6 1/1 Running 0 23s 10.244.22.5 server4 <none> <none>
Since there is no toleration and server3 is SchedulingDisabled, every Pod now runs on server4.
drain - evict the node:
drain first evicts the Pods on the node so they are recreated on other nodes, then marks the node SchedulingDisabled.
A plain drain fails here, because every node runs Pods managed by a DaemonSet controller:
[root@server2 scheduler]# kubectl delete -f deployment.yml
deployment.apps "deployment-v1" deleted
[root@server2 scheduler]# kubectl uncordon server3 /resume scheduling
node/server3 uncordoned
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-7449b5b68f-7glbp 1/1 Running 0 6s 10.244.141.212 server3 <none> <none>
deployment-v1-7449b5b68f-7pt5j 1/1 Running 0 6s 10.244.141.214 server3 <none> <none>
deployment-v1-7449b5b68f-m292z 1/1 Running 0 6s 10.244.22.9 server4 <none> <none>
deployment-v1-7449b5b68f-qdb9x 1/1 Running 0 6s 10.244.141.213 server3 <none> <none>
Three of the Pods currently run on server3.
[root@server2 scheduler]# kubectl drain server3
node/server3 cordoned
error: unable to drain node "server3", aborting command...
There are pending nodes to be drained:
server3
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-smn7b, kube-system/kube-proxy-9nr5k
// drain complains about DaemonSet-managed Pods and tells us to add the --ignore-daemonsets flag
[root@server2 scheduler]# kubectl drain server3 --ignore-daemonsets
node/server3 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-smn7b, kube-system/kube-proxy-9nr5k
...
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server3 Ready,SchedulingDisabled <none> 18d v1.18.3 /disabled again
server4 Ready <none> 18d v1.18.3
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-7449b5b68f-68bld 1/1 Running 0 18s 10.244.22.10 server4 <none> <none>
deployment-v1-7449b5b68f-9t9v9 1/1 Running 0 18s 10.244.22.12 server4 <none> <none>
deployment-v1-7449b5b68f-m292z 1/1 Running 0 55s 10.244.22.9 server4 <none> <none>
deployment-v1-7449b5b68f-tjh4x 1/1 Running 0 18s 10.244.22.13 server4 <none> <none>
All of the Pods moved over to server4.
delete - delete the node
The most drastic of the three: first drain the node so its Pods are recreated elsewhere, then delete the node from the master. The master loses control of it; to restore scheduling you must log in to the node and restart the kubelet service so the node re-registers.
[root@server2 scheduler]# kubectl delete nodes server3
node "server3" deleted
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server4 Ready <none> 18d v1.18.3
//restart the service on server3
[root@server3 ~]# systemctl restart kubelet.service
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server3 NotReady <none> 0s v1.18.3 /not ready yet; still starting its pods
server4 Ready <none> 18d v1.18.3
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server3 Ready <none> 48s v1.18.3 /ready now
server4 Ready <none> 18d v1.18.3
If we want to detach this node from the cluster completely, we can run kubeadm reset on the node to wipe all of its state.
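For example, on the node itself (kubeadm asks for confirmation; output omitted):

[root@server3 ~]# kubeadm reset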