Introduction
- The scheduler uses the Kubernetes watch mechanism to discover pods that are newly created and not yet bound to a node, and schedules each such pod onto a suitable node to run (a small side sketch follows this list).
- kube-scheduler is the default scheduler for a Kubernetes cluster and is part of the control plane. It is designed so that, if you really want or need to, you can write your own scheduling component and replace the original kube-scheduler.
- Factors considered in a scheduling decision include: individual and collective resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements, data locality, inter-workload interference, and so on.
- Default policies: https://kubernetes.io/zh/docs/concepts/scheduling/kube-scheduler/
- Scheduling framework: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/scheduling-framework/
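As a side illustration of what the scheduler watches for, pods it has not yet bound can be listed with a field selector; this is a hedged sketch, not part of the walkthrough (an empty spec.nodeName is commonly used to match unscheduled pods):

# List pods in all namespaces that have not yet been bound to a node.
kubectl get pods --all-namespaces --field-selector spec.nodeName=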
Scheduling by nodeName
- nodeName is the simplest form of node selection constraint, but it is generally not recommended. If nodeName is set in the PodSpec, it takes precedence over every other node selection method.
- Some limitations of using nodeName to select nodes:
  - If the named node does not exist, the pod will not run.
  - If the named node does not have the resources to accommodate the pod, the pod fails.
  - Node names in cloud environments are not always predictable or stable; they can change.
Running into any of these three cases means the pod cannot run.
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
#  nodeName: server4    # comment out the node name for now and see where the pod runs
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 13s 10.244.141.255 server3 <none> <none>
It landed on server3. Without a node being specified, the pod stays on server3; this is the default Kubernetes behavior.
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server4    # name the node so the pod runs on server4
[root@server2 scheduler]# kubectl delete pod nginx
pod "nginx" deleted
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 5s 10.244.22.48 server4 <none> <none>
Now it runs on server4.
Convenient as nodeName is, it runs into all kinds of limits; for example, our server4 host has only 1 CPU:
[root@server2 scheduler]# vim pod.yml
[root@server2 scheduler]# cat pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: 2    # requests at least 2 CPUs, more than the node has
  nodeName: server4
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 0/1 OutOfcpu 0 2s
The pod cannot run. The same thing happens with memory.
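For the memory case, a minimal sketch (the 100Gi request is an assumption chosen to exceed what server4 can offer; because nodeName bypasses the scheduler, the kubelet itself rejects the pod, analogous to the OutOfcpu status above):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        memory: 100Gi    # assumed to be more memory than server4 has
  nodeName: server4      # scheduler bypassed, so the kubelet rejects the pod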
Scheduling by nodeSelector
nodeSelector is the simplest recommended form of node selection constraint.
Add a label to the chosen node:
[root@server2 scheduler]# kubectl label nodes server4 disktype=ssd
node/server4 labeled
[root@server2 scheduler]# kubectl get nodes --show-labels
...
server4 Ready <none> 18d v1.18.3 disktype=ssd    # the label appears
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 6s 10.244.22.49 server4 <none> <none>
It runs on server4.
When the label does not exist anywhere:
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: sata    # no node carries this label
[root@server2 scheduler]# kubectl delete pod nginx
pod "nginx" deleted
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 0/1 Pending 0 12s <none> <none> <none> <none>
[root@server2 scheduler]# kubectl describe pod nginx
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
Warning FailedScheduling <unknown> default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
No node matches the node selector.
So if the label is accidentally deleted, the pod cannot run. How can we make sure the pod keeps running even when the label is gone?
Node affinity and anti-affinity
- Affinity and anti-affinity
  - nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the kinds of constraints you can express.
  - Rules can be "soft"/"preferences" rather than hard requirements, so if the scheduler cannot satisfy a rule, the pod is still scheduled.
  - You can constrain against the labels of pods already running on a node, rather than the node's own labels, to control which pods can and cannot be placed together.
Node affinity
- Node affinity
  - requiredDuringSchedulingIgnoredDuringExecution: must be satisfied.
  - preferredDuringSchedulingIgnoredDuringExecution: preferred; it is fine if it cannot be satisfied.
  - IgnoredDuringExecution means that if a node's labels change while a pod is running, so that the affinity rule is no longer met, the pod keeps running.
  - Reference: https://kubernetes.io/zh/docs/concepts/configuration/assign-pod-node/
Node affinity example:
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:                # affinity
    nodeAffinity:          # node affinity
      requiredDuringSchedulingIgnoredDuringExecution:    # must be satisfied
        nodeSelectorTerms:           # node selector terms
        - matchExpressions:          # match expressions
          - key: disktype            # key
            operator: In             # matching rule: the value must be in the list below
            values:                  # values
            - ssd
nodeAffinity also supports several matching operators (a sketch combining two of them follows this example):
In:           the label's value is in the list
NotIn:        the label's value is not in the list
Gt:           the label's value is greater than the given value (not supported for pod affinity)
Lt:           the label's value is less than the given value (not supported for pod affinity)
Exists:       the label exists
DoesNotExist: the label does not exist
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/node-affinity created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity 1/1 Running 0 5s 10.244.22.50 server4 <none> <none>
It runs on server4, because server4 carries the label and the rule is a hard requirement.
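As promised above, a hedged sketch combining two of the other operators (the cpu-cores label is hypothetical, chosen just for illustration; Gt compares values as integers, so they are written as quoted strings):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype      # the label only has to exist; Exists takes no values list
          operator: Exists
        - key: cpu-cores     # hypothetical numeric node label
          operator: Gt
          values:
          - "4"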
Node affinity example 2:
[root@server2 scheduler]# kubectl delete -f pod.yml
pod "node-affinity" deleted
[root@server2 scheduler]# vim pod.yml
[root@server2 scheduler]# cat pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # must be satisfied
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn    # the value must not be in the list below
            values:
            - server4          # i.e. never schedule onto server4
      preferredDuringSchedulingIgnoredDuringExecution:   # preferred
      - weight: 1              # weight 1; higher weights take priority
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - sata             # no node carries this label
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/node-affinity created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity 1/1 Running 0 7s 10.244.141.193 server3 <none> <none>
It ended up on server3. We first ruled out server4, then preferred nodes labeled disktype: sata; since that label exists nowhere, the pod still landed on server3.
Pod affinity
- Pod affinity and anti-affinity
  - podAffinity answers the question of which pods a pod may be deployed with in the same topology domain (topology domains are built from node labels and can be a single node, or a group of nodes forming a cluster, zone, and so on).
  - podAntiAffinity answers the question of which pods a pod must not share a topology domain with. Both deal with pod-to-pod relationships inside the Kubernetes cluster.
  - Inter-pod affinity and anti-affinity may be even more useful when combined with higher-level collections such as ReplicaSets, StatefulSets, and Deployments. You can easily configure that a set of workloads should live in the same defined topology, e.g. the same node (a hedged sketch follows this list).
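For illustration, a hedged sketch of co-locating every replica of a Deployment with pods labeled app: redis (the web/redis names and labels are assumptions, not part of this walkthrough):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis
            topologyKey: kubernetes.io/hostname    # same node as a redis pod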
Pod affinity example:
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx    # set a label; we will use it below
spec:
  containers:
  - name: nginx
    image: nginx
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 5s 10.244.141.195 server3 <none> <none>
This pod runs on server3.
Now add another pod.
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx    # pod label
spec:
  containers:
  - name: nginx
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAffinity:    # pod affinity
      requiredDuringSchedulingIgnoredDuringExecution:    # hard requirement
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx    # run together with pods labeled app: nginx
        topologyKey: kubernetes.io/hostname    # the topology domain; hostname is node level, but it can also be cluster level
Since the nginx pod just landed on server3, pod affinity makes the mysql pod land on server3 as well.
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx unchanged
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql 1/1 Running 0 7s 10.244.141.194 server3 <none> <none>
nginx 1/1 Running 0 11m 10.244.141.195 server3 <none> <none>
It runs on server3. The mysql pod follows the app: nginx label; wherever that label is, that is where the pod goes.
Now add a nodeName field to pin nginx to server4:
[root@server2 scheduler]# vim pod.yml
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server4
The rest stays unchanged.
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql 1/1 Running 0 6s 10.244.22.53 server4 <none> <none>
nginx 1/1 Running 0 6s 10.244.22.52 server4 <none> <none>
mysql follows it onto server4.
What if we change In to NotIn?
[root@server2 scheduler]# kubectl delete -f pod.yml
pod "nginx" deleted
pod "mysql" deleted
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: NotIn    # changed to NotIn
            values:
            - nginx            # i.e. do not run together with pods labeled app: nginx
        topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql 1/1 Running 0 7s 10.244.22.51 server4 <none> <none>
nginx 1/1 Running 0 7s 10.244.141.197 server3 <none> <none>
Sure enough, they now run apart.
Pod anti-affinity example:
[root@server2 scheduler]# vim pod.yml
[root@server2 scheduler]# cat pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAntiAffinity:    # the only change: podAffinity becomes podAntiAffinity
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
With this, the two pods repel each other and will never end up on the same node.
[root@server2 scheduler]# kubectl apply -f pod.yml
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql 0/1 ContainerCreating 0 4s <none> server4 <none> <none>
nginx 0/1 ContainerCreating 0 4s <none> server3 <none> <none>
Scenario: under a Deployment with, say, three replicas that all bind host port 80, the replicas would conflict on a single node; anti-affinity solves exactly this problem (a sketch follows).
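A minimal sketch of that scenario (the web names are illustrative; hostPort is what makes two replicas collide on one node, and the anti-affinity block mirrors the examples above):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          hostPort: 80    # two pods binding host port 80 cannot share a node
      affinity:
        podAntiAffinity:  # so spread the replicas across nodes
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
            topologyKey: kubernetes.io/hostname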
Taints
- NodeAffinity is a property defined on a pod that lets the pod be scheduled onto a node of our choosing. Taints are the exact opposite: they let a node refuse to run pods, and even evict them.
- Taints are a property of a node. Once taints are set, Kubernetes will not schedule pods onto that node. To compensate, pods get a property called Tolerations: as long as a pod tolerates the taints on a node, Kubernetes ignores those taints and can (but is not required to) schedule the pod there.
[root@server2 scheduler]# kubectl describe nodes server2 |grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
[root@server2 scheduler]# kubectl describe nodes server3 |grep Taints
Taints: <none>
[root@server2 scheduler]# kubectl describe nodes server4 |grep Taints
Taints: <none>
Our master node carries a taint, and no pod tolerates it, so it takes no part in scheduling. server3 and server4 carry none, so they do.
Remove the taint from the master node:
[root@server2 scheduler]# kubectl taint node server2 node-role.kubernetes.io/master:NoSchedule-
node/server2 untainted
[root@server2 scheduler]# kubectl describe nodes server2 |grep Taints
Taints: <none>
Now server2 can take part in scheduling too.
[root@server2 scheduler]# vim deployment.yml
apiVersion: apps/v1
kind: Deployment    # add a Deployment controller
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      affinity:
        podAntiAffinity:    # pod anti-affinity policy
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-6498765b4b-2l6mc 1/1 Running 0 12s 10.244.22.56 server4 <none> <none>
deployment-v1-6498765b4b-mt5x2 1/1 Running 0 12s 10.244.179.81 server2 <none> <none>
deployment-v1-6498765b4b-scvzq 1/1 Running 0 12s 10.244.141.204 server3 <none> <none>
server2 gets scheduled onto as well. Because the rule is a hard requirement, creating more replicas than nodes leaves the extras waiting: the anti-affinity policy forbids them from running on a node that already has an app: myapp pod.
[root@server2 scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
deployment-v1-6498765b4b-2l6mc 1/1 Running 0 43m
deployment-v1-6498765b4b-lvmr4 0/1 Pending 0 1s
deployment-v1-6498765b4b-mt5x2 1/1 Running 0 43m
deployment-v1-6498765b4b-qjtq2 0/1 Pending 0 1s
deployment-v1-6498765b4b-rn5vj 0/1 Pending 0 1s
deployment-v1-6498765b4b-scvzq 1/1 Running 0 43m
Put the taint back on the master node:
[root@server2 scheduler]# kubectl taint node server2 node-role.kubernetes.io/master:NoSchedule
node/server2 tainted
[root@server2 scheduler]# kubectl describe nodes server2 |grep Taint
Taints: node-role.kubernetes.io/master:NoSchedule
- The [effect] can be one of: NoSchedule | PreferNoSchedule | NoExecute
  - NoSchedule: pods are not scheduled onto a node marked with this taint.
  - PreferNoSchedule: a soft version of NoSchedule.
  - NoExecute: once this taint takes effect, pods already running on the node that have no matching Toleration are evicted (a sketch follows this list).
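For NoExecute specifically there is also a tolerationSeconds field, which bounds how long a matching pod may keep running after the taint appears; a hedged sketch (key1=v1 mirrors the taint used later in this section):

tolerations:
- key: "key1"
  operator: "Equal"
  value: "v1"
  effect: "NoExecute"
  tolerationSeconds: 3600    # evicted 3600s after the taint is added; omit to stay indefinitely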
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-6498765b4b-9l5s7 0/1 Pending 0 13s <none> <none> <none> <none>
deployment-v1-6498765b4b-jdjvz 1/1 Running 0 13s 10.244.22.57 server4 <none> <none>
deployment-v1-6498765b4b-zzdlq 1/1 Running 0 13s 10.244.141.200 server3 <none> <none>
Running it again, we find two pods running normally, on server3 and server4. server2 is tainted and there is no other node left in the cluster, so one pod stays Pending.
Adding a toleration for the master node
- The key, value, and effect defined in tolerations must match the taint set on the node:
  - If operator is Exists, value can be omitted.
  - If operator is Equal, key and value must match exactly (a sketch of this form follows the list).
  - If operator is not specified, it defaults to Equal.
- Two special cases:
  - Omitting key and using Exists matches every key and value, i.e. tolerates all taints.
  - Omitting effect matches every effect.
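As a counterpart to the Exists form used below, a minimal sketch of the Equal form (key1=v1:NoSchedule is a hypothetical taint; key, value, and effect must all match it):

tolerations:
- key: "key1"
  operator: "Equal"    # key, value, and effect must match the node's taint exactly
  value: "v1"
  effect: "NoSchedule"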
[root@server2 scheduler]# vim deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      tolerations:
      - operator: "Exists"    # no key given, so with Exists this tolerates any taint
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-bb99b54bc-5ngtf 1/1 Running 0 9s 10.244.22.58 server4 <none> <none>
deployment-v1-bb99b54bc-l4sf5 1/1 Running 0 9s 10.244.179.82 server2 <none> <none>
deployment-v1-bb99b54bc-t7v9p 1/1 Running 0 9s 10.244.141.205 server3 <none> <none>
With the toleration added, the master node participates in scheduling again; its taint is tolerated.
We just gave the master the NoSchedule effect; the other two effects can be used as well.
Right now the three replicas are running normally, one per node:
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-bb99b54bc-d2m7c 1/1 Running 0 7s 10.244.22.2 server4 <none> <none>
deployment-v1-bb99b54bc-jfwfp 1/1 Running 0 7s 10.244.179.84 server2 <none> <none>
deployment-v1-bb99b54bc-ntlsr 1/1 Running 0 7s 10.244.141.210 server3 <none> <none>
Add a NoExecute (eviction) taint, while the manifest still only tolerates the NoSchedule effect:
[root@server2 scheduler]# kubectl taint nodes server2 key1=v1:NoExecute
node/server2 tainted
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-bb99b54bc-2p2zz 0/1 Pending 0 1s <none> <none> <none> <none>
deployment-v1-bb99b54bc-d2m7c 1/1 Running 0 42s 10.244.22.2 server4 <none> <none>
deployment-v1-bb99b54bc-jfwfp 1/1 Terminating 0 42s 10.244.179.84 server2 <none> <none>
deployment-v1-bb99b54bc-ntlsr 1/1 Running 0 42s 10.244.141.210 server3 <none> <none>
The pod that was on server2 has been evicted; the Deployment controller automatically created a replacement, which sits in Pending.
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-bb99b54bc-2p2zz 0/1 Pending 0 5s <none> <none> <none> <none>
deployment-v1-bb99b54bc-d2m7c 1/1 Running 0 46s 10.244.22.2 server4 <none> <none>
deployment-v1-bb99b54bc-ntlsr 1/1 Running 0 46s 10.244.141.210 server3 <none> <none>
We can fix that:
[root@server2 scheduler]# vim deployment.yml
      tolerations:
      - operator: "Exists"
#        effect: "NoSchedule"    # comment this out to match every effect
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 configured
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-68679b84bf-ddmlm 0/1 Pending 0 1s <none> <none> <none> <none>
deployment-v1-68679b84bf-qhd7s 1/1 Running 0 3s 10.244.179.85 server2 <none> <none>
deployment-v1-bb99b54bc-d2m7c 1/1 Running 0 11m 10.244.22.2 server4 <none> <none>
deployment-v1-bb99b54bc-ntlsr 1/1 Running 0 11m 10.244.141.210 server3 <none> <none>
A pod is running on server2 again.
Other factors that influence pod scheduling
Other commands that influence pod scheduling are cordon, drain, and delete. After any of them, newly created pods are no longer scheduled onto the node, but the commands differ in how forceful they are.
cordon (stop scheduling):
The gentlest of the three. It only marks the node SchedulingDisabled: new pods are not scheduled onto it, while existing pods on the node are unaffected and keep serving traffic.
[root@server2 scheduler]# kubectl delete -f deployment.yml
deployment.apps "deployment-v1" deleted
[root@server2 scheduler]# kubectl cordon server3
node/server3 cordoned
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server3 Ready,SchedulingDisabled <none> 18d v1.18.3    # the scheduling-disabled mark appears
server4 Ready <none> 18d v1.18.3
Comment out the earlier toleration and anti-affinity so we can start fresh pods; the blanket toleration would also cover the cordon and skew the experiment.
[root@server2 scheduler]# vim deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-7449b5b68f-cqmkq 0/1 ContainerCreating 0 4s <none> server4 <none> <none>
deployment-v1-7449b5b68f-m442w 1/1 Running 0 4s 10.244.22.3 server4 <none> <none>
deployment-v1-7449b5b68f-rtct6 0/1 ContainerCreating 0 4s <none> server4 <none> <none>
[root@server2 scheduler]# vim deployment.yml
spec:
  replicas: 6    # raise the replica count to 6
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 configured
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-7449b5b68f-2smn9 0/1 ContainerCreating 0 2s <none> server4 <none> <none>
deployment-v1-7449b5b68f-9n2w8 0/1 ContainerCreating 0 2s <none> server4 <none> <none>
deployment-v1-7449b5b68f-cqmkq 1/1 Running 0 23s 10.244.22.4 server4 <none> <none>
deployment-v1-7449b5b68f-fgvdf 0/1 ContainerCreating 0 2s <none> server4 <none> <none>
deployment-v1-7449b5b68f-m442w 1/1 Running 0 23s 10.244.22.3 server4 <none> <none>
deployment-v1-7449b5b68f-rtct6 1/1 Running 0 23s 10.244.22.5 server4 <none> <none>
Because there is no toleration now and server3 is SchedulingDisabled, all the pods run on server4.
drain (evict the node):
This first evicts the pods on the node and recreates them on other nodes, then marks the node SchedulingDisabled.
It refuses to run as-is, because every node carries pods managed by a DaemonSet controller:
[root@server2 scheduler]# kubectl delete -f deployment.yml
deployment.apps "deployment-v1" deleted
[root@server2 scheduler]# kubectl uncordon server3    # restore scheduling
node/server3 uncordoned
[root@server2 scheduler]# kubectl apply -f deployment.yml
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-7449b5b68f-7glbp 1/1 Running 0 6s 10.244.141.212 server3 <none> <none>
deployment-v1-7449b5b68f-7pt5j 1/1 Running 0 6s 10.244.141.214 server3 <none> <none>
deployment-v1-7449b5b68f-m292z 1/1 Running 0 6s 10.244.22.9 server4 <none> <none>
deployment-v1-7449b5b68f-qdb9x 1/1 Running 0 6s 10.244.141.213 server3 <none> <none>
Right now three of the pods run on server3.
[root@server2 scheduler]# kubectl drain server3
node/server3 cordoned
error: unable to drain node "server3", aborting command...
There are pending nodes to be drained:
server3
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-smn7b, kube-system/kube-proxy-9nr5k
# It complains about DaemonSet-managed pods and asks us to add the --ignore-daemonsets flag
[root@server2 scheduler]# kubectl drain server3 --ignore-daemonsets
node/server3 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-smn7b, kube-system/kube-proxy-9nr5k
...
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server3 Ready,SchedulingDisabled <none> 18d v1.18.3    # now disabled
server4 Ready <none> 18d v1.18.3
[root@server2 scheduler]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-v1-7449b5b68f-68bld 1/1 Running 0 18s 10.244.22.10 server4 <none> <none>
deployment-v1-7449b5b68f-9t9v9 1/1 Running 0 18s 10.244.22.12 server4 <none> <none>
deployment-v1-7449b5b68f-m292z 1/1 Running 0 55s 10.244.22.9 server4 <none> <none>
deployment-v1-7449b5b68f-tjh4x 1/1 Running 0 18s 10.244.22.13 server4 <none> <none>
All the pods have moved to server4.
delete (remove the node):
The most drastic of the three. First drain evicts the node's pods and recreates them on other nodes, then delete removes the node from the master, which loses control over it. To restore scheduling, you must log in to the node and restart the kubelet service.
[root@server2 scheduler]# kubectl delete nodes server3
node "server3" deleted
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server4 Ready <none> 18d v1.18.3
# Restart the service on server3
[root@server3 ~]# systemctl restart kubelet.service
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server3 NotReady <none> 0s v1.18.3    # not ready yet; still starting pods
server4 Ready <none> 18d v1.18.3
[root@server2 scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server2 Ready master 18d v1.18.3
server3 Ready <none> 48s v1.18.3    # ready
server4 Ready <none> 18d v1.18.3
If we want to detach this node from the cluster entirely, we can run kubeadm reset on the node to wipe all of its state (a sketch follows).
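A hedged sketch of that clean-up, plus how the node could rejoin afterwards (the join command must come from the master; the token it prints is cluster-specific):

# On server3: wipe the node's cluster state.
kubeadm reset

# On the master: print a fresh join command for server3 to run.
kubeadm token create --print-join-command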