Kubernetes -- several ways to control node scheduling

Introduction

  • The scheduler uses the Kubernetes watch mechanism to discover Pods that have been newly created but not yet bound to a Node. It then schedules each unscheduled Pod it finds onto a suitable Node.
  • kube-scheduler is the default scheduler of a Kubernetes cluster and is part of the control plane. It is designed so that, if you really want to or need to, you can write your own scheduling component and replace the original kube-scheduler (a minimal Pod sketch for that case follows this list).
  • Factors considered in a scheduling decision include: individual and collective resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements, data locality, interference between workloads, and so on.
  • Default policies: https://kubernetes.io/zh/docs/concepts/scheduling/kube-scheduler/
  • Scheduling framework: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/scheduling-framework/
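If a custom scheduler is deployed alongside kube-scheduler, a Pod chooses it through the spec.schedulerName field. A minimal hedged sketch, assuming a custom scheduler registered under the hypothetical name my-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-custom-sched
spec:
  schedulerName: my-scheduler      # hypothetical scheduler name; omit this field to use "default-scheduler"
  containers:
  - name: nginx
    image: nginx

If no component is actually watching for that schedulerName, the Pod simply stays Pending.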

Scheduling with nodeName

  • nodeName is the simplest method of node selection constraint, but it is generally not recommended. If nodeName is specified in the PodSpec, it takes precedence over the other node selection methods.
  • Some limitations of using nodeName to select a node:
    • If the named node does not exist, the Pod cannot be scheduled.
    • If the named node does not have the resources to accommodate the Pod, the Pod fails.
    • Node names in cloud environments are not always predictable or stable; they can change.

If any one of these conditions is not met, the Pod will not run.

[root@server2 scheduler]# vim pod.yml 

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
#  nodeName: server4			/leave the node name commented out for now and see where the Pod lands
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created

[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          13s   10.244.141.255   server3   <none>           <none>
/ It is running on server3: when no node name is specified, the default Kubernetes scheduler picks a node for the Pod, and here it chose server3.
[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server4			/specify the node name so that the Pod runs on server4

[root@server2 scheduler]# kubectl delete pod nginx 
pod "nginx" deleted
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          5s    10.244.22.48   server4   <none>           <none>
Now it is running on server4.

Although convenient, nodeName is subject to various constraints; for example, our server4 host has only 1 CPU:

[root@server2 scheduler]# vim pod.yml 
[root@server2 scheduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: 2			/request at least 2 CPUs
  nodeName: server4

[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod
NAME    READY   STATUS     RESTARTS   AGE
nginx   0/1     OutOfcpu   0          2s

The Pod cannot run; the same thing happens with memory.
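A comparable sketch with a memory request instead, assuming server4 has less than 8Gi of allocatable memory; because nodeName bypasses the scheduler, the kubelet itself rejects the Pod (an OutOfmemory-style status would be expected):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        memory: 8Gi      # assumed to be more memory than server4 can offer
  nodeName: server4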

Scheduling with nodeSelector

nodeSelector is the simplest recommended form of node selection constraint.

Add a label to the chosen node:

[root@server2 scheduler]# kubectl label nodes server4 disktype=ssd
node/server4 labeled
[root@server2 scheduler]# kubectl get nodes --show-labels 
...
server4   Ready    <none>   18d   v1.18.3   disktype=ssd		/the label now appears

[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          6s    10.244.22.49   server4   <none>           <none>

It runs on server4.

When the label does not exist:

[root@server2 scheduler]# vim pod.yml 

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: sata			/this label does not exist on any node
[root@server2 scheduler]# kubectl delete pod nginx 
pod "nginx" deleted
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx   0/1     Pending   0          12s   <none>   <none>   <none>           <none>
[root@server2 scheduler]# kubectl describe pod nginx 

Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.

No node matches the node selector.

This means that if the label is accidentally deleted, the Pod can no longer run. So how do we guarantee that the Pod still runs normally even when the label no longer exists?
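For reference, a node label is removed with a trailing dash after the key; a hedged sketch, not run in the session above, of how the disktype label could disappear by accident:

kubectl label nodes server4 disktype-      # the trailing "-" deletes the disktype label from server4

The "preferred" form of node affinity introduced in the next section is one way to still get the Pod scheduled when such a label is missing.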

Node affinity and anti-affinity

  • Affinity and anti-affinity
    • nodeSelector provides a very simple way to constrain Pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express.
    • Rules can be "soft"/"preferred" rather than hard requirements, so the Pod is still scheduled even if the scheduler cannot satisfy them.
    • You can constrain placement against labels on Pods already running on a node, rather than labels on the node itself, which lets you control which Pods may or may not be co-located.

Node affinity

  • Node affinity
    • requiredDuringSchedulingIgnoredDuringExecution: must be satisfied

    • preferredDuringSchedulingIgnoredDuringExecution: preferred, meaning it is acceptable if it cannot be satisfied

    • IgnoredDuringExecution means that if the Node's labels change while the Pod is running so that the
      affinity rule is no longer satisfied, the Pod keeps running on that Node.

    • Reference: https://kubernetes.io/zh/docs/concepts/configuration/assign-pod-node/

Node affinity example:

[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:			/affinity
    nodeAffinity:			/node affinity
      requiredDuringSchedulingIgnoredDuringExecution:			/must be satisfied
        nodeSelectorTerms:		/node selector terms
        - matchExpressions:			/match expressions
          - key: disktype			/key
            operator: In			/match operator: the value must be in the list below
            values:			/values
            - ssd

nodeAffinity also supports several other match operators (a sketch of the less common ones follows the example result below):

In: the label's value is in the list
NotIn: the label's value is not in the list
Gt: the label's value is greater than the given value (not supported for Pod affinity)
Lt: the label's value is less than the given value (not supported for Pod affinity)
Exists: the label exists
DoesNotExist: the label does not exist
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/node-affinity created
[root@server2 scheduler]# kubectl get pod -owide
NAME            READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
node-affinity   1/1     Running   0          5s    10.244.22.50   server4   <none>           <none>

It runs on server4, because server4 has this label and the rule is required.
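A hedged sketch of the less common operators from the list above; Exists only requires the key to be present, and the Gt expression assumes a hypothetical numeric node label named cpu-count:

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: Exists      # matches any node that has a disktype label, whatever its value
          - key: cpu-count        # hypothetical label; Gt/Lt compare the value as an integer
            operator: Gt
            values:
            - "1"

Both expressions sit in the same matchExpressions list, so a node must satisfy both of them.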

Node affinity example 2:

[root@server2 scheduler]# kubectl delete -f pod.yml 
pod "node-affinity" deleted
[root@server2 scheduler]# vim pod.yml 
[root@server2 scheduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:			/must be satisfied
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn			/the value must not be in the list below
            values:
            - server4				/i.e. the Pod must not be scheduled onto server4
      preferredDuringSchedulingIgnoredDuringExecution:		/preferred
      - weight: 1		/weight 1; terms with a larger weight take priority
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - sata			/no node carries this label
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/node-affinity created
[root@server2 scheduler]# kubectl get pod -owide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
node-affinity   1/1     Running   0          7s    10.244.141.193   server3   <none>           <none>

The Pod runs on server3. We first required that it must not run on server4, and then expressed a preference for nodes labeled disktype: sata. That label exists on no node, so the preference could not be satisfied, and the Pod was still scheduled, ending up on server3.

Pod affinity

  • Pod affinity and anti-affinity
    • podAffinity decides which Pods a Pod may be placed together with in the same topology domain (a topology domain is defined by node labels and can be a single node, or a group of nodes such as a cluster or a zone).
    • podAntiAffinity decides which Pods a Pod must not be placed together with in the same topology domain. Both describe relationships between Pods inside the Kubernetes cluster.
    • Inter-Pod affinity and anti-affinity can be even more useful when combined with higher-level collections such as ReplicaSets, StatefulSets and Deployments: a set of workloads can easily be configured to live in the same defined topology, for example on the same node.

Pod affinity example:

[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx			/assign a label; it will be used later
spec:
  containers:
  - name: nginx
    image: nginx
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          5s    10.244.141.195   server3   <none>           <none>
This Pod is running on server3.

Now add another Pod:

[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx				/Pod label
spec:
  containers:
  - name: nginx
    image: nginx

---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAffinity:			/Pod affinity
      requiredDuringSchedulingIgnoredDuringExecution:			/hard requirement
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx				/run together with Pods that carry the label app: nginx
        topologyKey: kubernetes.io/hostname			/defines the scheduling domain; hostname means node level, but it can also be a wider domain such as a zone or the whole cluster

The nginx Pod just landed on server3, so because of the affinity rule the mysql Pod will also run on server3.

[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx unchanged
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
mysql   1/1     Running   0          7s    10.244.141.194   server3   <none>           <none>
nginx   1/1     Running   0          11m   10.244.141.195   server3   <none>           <none>

It runs on server3. The mysql Pod clearly follows the app: nginx label: wherever a Pod with that label runs, the mysql Pod runs too.
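topologyKey controls how wide "together" is. As a hedged sketch, using the well-known zone label instead of the hostname label, co-location only means "in the same zone", so the two Pods could still land on different nodes. This assumes the nodes carry the standard topology.kubernetes.io/zone label, which the lab nodes above may not have:

  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: topology.kubernetes.io/zone      # same zone is enough; any node in that zone qualifies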
Now add a nodeName field so that nginx runs on server4:

[root@server2 scheduler]# vim pod.yml 
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server4
Everything else stays the same.
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
mysql   1/1     Running   0          6s    10.244.22.53   server4   <none>           <none>
nginx   1/1     Running   0          6s    10.244.22.52   server4   <none>           <none>

mysql follows and runs on server4 as well.

What if we change the operator to NotIn?

[root@server2 scheduler]# kubectl delete -f pod.yml 
pod "nginx" deleted
pod "mysql" deleted
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx

---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: NotIn				/changed to NotIn here
            values:
            - nginx		/i.e. do not run together with Pods that carry the label app: nginx
        topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
mysql   1/1     Running   0          7s    10.244.22.51     server4   <none>           <none>
nginx   1/1     Running   0          7s    10.244.141.197   server3   <none>           <none>

Sure enough, the two Pods now run on different nodes.
Pod anti-affinity example:

[root@server2 scheduler]# vim pod.yml 
[root@server2 scheduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx

---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAntiAffinity:			/the only change is here: podAffinity becomes podAntiAffinity
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname

With this, the two Pods repel each other and will never end up on the same node.

[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS              RESTARTS   AGE   IP       NODE      NOMINATED NODE   READINESS GATES
mysql   0/1     ContainerCreating   0          4s    <none>   server4   <none>           <none>
nginx   0/1     ContainerCreating   0          4s    <none>   server3   <none>           <none>

Scenario: in a Deployment with, say, three replicas that each need host port 80, the replicas would conflict if placed on the same node; anti-affinity solves exactly this problem.

Taints

  • NodeAffinity is a property defined on a Pod that makes the Pod schedule onto the Node we want. Taints are the opposite: they let a Node refuse to run Pods, or even evict Pods that are already running.

  • Taints are a property of a Node. Once taints are set, Kubernetes will not schedule Pods onto that Node. Pods therefore have a matching property, tolerations: if a Pod tolerates the taints on a Node, Kubernetes ignores those taints and can (but does not have to) schedule the Pod onto that Node.
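The general command form is kubectl taint nodes <node> key[=value]:<effect>; a hedged sketch, not run in this session, with placeholder key and value:

kubectl taint nodes server3 example-key=example-value:NoSchedule      # add a taint
kubectl taint nodes server3 example-key=example-value:NoSchedule-     # the trailing dash removes the same taint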

[root@server2 scheduler]# kubectl describe nodes server2 |grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@server2 scheduler]# kubectl describe nodes server3 |grep Taints
Taints:             <none>
[root@server2 scheduler]# kubectl describe nodes server4 |grep Taints
Taints:             <none>

Our master node carries a taint, and the Pods have no matching toleration, so it takes no part in scheduling; server3 and server4 have no taints, so they can be scheduled onto.

Remove the taint from the master node

[root@server2 scheduler]# kubectl taint node server2 node-role.kubernetes.io/master:NoSchedule-
node/server2 untainted
[root@server2 scheduler]# kubectl describe nodes server2 |grep Taints
Taints:             <none>

After that, server2 can take part in scheduling.

[root@server2 scheduler]# vim deployment.yml 
apiVersion: apps/v1
kind: Deployment			/add a Deployment controller
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      affinity:
        podAntiAffinity:				/Pod anti-affinity policy
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
 
[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-6498765b4b-2l6mc   1/1     Running   0          12s   10.244.22.56     server4   <none>           <none>
deployment-v1-6498765b4b-mt5x2   1/1     Running   0          12s   10.244.179.81    server2   <none>           <none>
deployment-v1-6498765b4b-scvzq   1/1     Running   0          12s   10.244.141.204   server3   <none>           <none>

server2 takes part in scheduling as well. Because the anti-affinity rule is of the required type, any replicas created beyond the number of nodes stay in a Pending state: the rule forbids them from running on a node that already hosts a Pod with the app: myapp label.

[root@server2 scheduler]# kubectl get pod
NAME                             READY   STATUS    RESTARTS   AGE
deployment-v1-6498765b4b-2l6mc   1/1     Running   0          43m
deployment-v1-6498765b4b-lvmr4   0/1     Pending   0          1s
deployment-v1-6498765b4b-mt5x2   1/1     Running   0          43m
deployment-v1-6498765b4b-qjtq2   0/1     Pending   0          1s
deployment-v1-6498765b4b-rn5vj   0/1     Pending   0          1s
deployment-v1-6498765b4b-scvzq   1/1     Running   0          43m

Add the taint back to the master node

[root@server2 scheduler]# kubectl taint node server2 node-role.kubernetes.io/master:NoSchedule
node/server2 tainted

[root@server2 scheduler]# kubectl describe nodes server2 |grep Taint
Taints:             node-role.kubernetes.io/master:NoSchedule
  • The [effect] can be one of: [ NoSchedule | PreferNoSchedule | NoExecute ]
    • NoSchedule: Pods will not be scheduled onto the tainted node.
    • PreferNoSchedule: the soft version of NoSchedule (a short sketch follows the output below).
    • NoExecute: once the taint takes effect, Pods already running on the node that have no matching toleration are evicted immediately.
[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 created

[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-6498765b4b-9l5s7   0/1     Pending   0          13s   <none>           <none>    <none>           <none>
deployment-v1-6498765b4b-jdjvz   1/1     Running   0          13s   10.244.22.57     server4   <none>           <none>
deployment-v1-6498765b4b-zzdlq   1/1     Running   0          13s   10.244.141.200   server3   <none>           <none>

Running the apply again, two Pods run normally, on server3 and server4 respectively. Because server2 carries the taint and there are no other nodes left in the cluster, the remaining Pod stays in a Pending state.
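For the PreferNoSchedule effect mentioned above, a hedged sketch that was not run in this session: the scheduler tries to avoid the node, but will still place a Pod there if no other node fits (key and value are placeholders):

kubectl taint nodes server4 example=true:PreferNoSchedule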
Add a toleration for the master node's taint

  • The key, value and effect defined under tolerations must stay consistent with the taint set on the node (an Equal-based sketch follows the example below):
    • If the operator is Exists, the value can be omitted.
    • If the operator is Equal, the toleration's value must equal the taint's value.
    • If the operator is not specified, it defaults to Equal.
  • There are two special cases:
    • A toleration with no key, combined with Exists, matches every key and value and therefore tolerates all taints.
    • A toleration with no effect matches all effects.
[root@server2 scheduler]# vim deployment.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      tolerations:
      - operator: "Exists"		/tolerate whatever taint exists; no key is specified, which works together with Exists
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-bb99b54bc-5ngtf   1/1     Running   0          9s    10.244.22.58     server4   <none>           <none>
deployment-v1-bb99b54bc-l4sf5   1/1     Running   0          9s    10.244.179.82    server2   <none>           <none>
deployment-v1-bb99b54bc-t7v9p   1/1     Running   0          9s    10.244.141.205   server3   <none>           <none>

After adding the toleration, the master node can be scheduled onto again; the taint is tolerated.
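The manifest above uses the Exists operator. As a hedged sketch, a toleration with the default Equal operator that would match the key1=v1:NoExecute taint added a bit further below:

      tolerations:
      - key: "key1"
        operator: "Equal"      # key, value and effect must all match the taint exactly
        value: "v1"
        effect: "NoExecute"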

We just gave the master node a taint with the NoSchedule effect; the other two effects can be used as well.

Currently the three Pods are running normally, one per node:
[root@server2 scheduler]# kubectl get pod -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-bb99b54bc-d2m7c   1/1     Running   0          7s    10.244.22.2      server4   <none>           <none>
deployment-v1-bb99b54bc-jfwfp   1/1     Running   0          7s    10.244.179.84    server2   <none>           <none>
deployment-v1-bb99b54bc-ntlsr   1/1     Running   0          7s    10.244.141.210   server3   <none>           <none>

Add a taint with the NoExecute (eviction) effect; at this point our manifest still only tolerates the NoSchedule effect:
[root@server2 scheduler]# kubectl taint nodes server2 key1=v1:NoExecute
node/server2 tainted
[root@server2 scheduler]# kubectl get pod -owide
NAME                            READY   STATUS        RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-bb99b54bc-2p2zz   0/1     Pending       0          1s    <none>           <none>    <none>           <none>
deployment-v1-bb99b54bc-d2m7c   1/1     Running       0          42s   10.244.22.2      server4   <none>           <none>
deployment-v1-bb99b54bc-jfwfp   1/1     Terminating   0          42s   10.244.179.84    server2   <none>           <none>
deployment-v1-bb99b54bc-ntlsr   1/1     Running       0          42s   10.244.141.210   server3   <none>           <none>
We can see that the Pod that was on server2 has been evicted, and the Deployment controller automatically created a replacement Pod, which is stuck in Pending.

[root@server2 scheduler]# kubectl get pod -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-bb99b54bc-2p2zz   0/1     Pending   0          5s    <none>           <none>    <none>           <none>
deployment-v1-bb99b54bc-d2m7c   1/1     Running   0          46s   10.244.22.2      server4   <none>           <none>
deployment-v1-bb99b54bc-ntlsr   1/1     Running   0          46s   10.244.141.210   server3   <none>           <none>

To tolerate this effect as well, we can:

[root@server2 scheduler]# vim deployment.yml 
      tolerations:
      - operator: "Exists"
      #  effect: "NoSchedule"		/comment out this line so the toleration matches every effect

[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 configured
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-68679b84bf-ddmlm   0/1     Pending   0          1s    <none>           <none>    <none>           <none>
deployment-v1-68679b84bf-qhd7s   1/1     Running   0          3s    10.244.179.85    server2   <none>           <none>
deployment-v1-bb99b54bc-d2m7c    1/1     Running   0          11m   10.244.22.2      server4   <none>           <none>
deployment-v1-bb99b54bc-ntlsr    1/1     Running   0          11m   10.244.141.210   server3   <none>           <none>

The Pod now runs on server2 again.
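For NoExecute taints there is also a tolerationSeconds field. A hedged sketch (values are only an illustration): the Pod is allowed to stay on the tainted node for a limited time before being evicted:

      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "v1"
        effect: "NoExecute"
        tolerationSeconds: 3600      # the Pod may keep running for 3600s after the taint appears, then it is evicted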

Other factors that affect Pod scheduling

Other commands that affect Pod scheduling are cordon, drain and delete. With all of them, Pods created afterwards will no longer be scheduled onto the node, but they differ in how drastic they are.

  • cordon, stop scheduling:

The gentlest option: it only marks the node SchedulingDisabled. Newly created Pods will not be scheduled onto the node, while Pods already on it are unaffected and keep serving traffic normally.

[root@server2 scheduler]# kubectl delete -f deployment.yml 
deployment.apps "deployment-v1" deleted

[root@server2 scheduler]# kubectl cordon server3
node/server3 cordoned
[root@server2 scheduler]# kubectl get nodes
NAME      STATUS                     ROLES    AGE   VERSION
server2   Ready                      master   18d   v1.18.3
server3   Ready,SchedulingDisabled   <none>   18d   v1.18.3		/the SchedulingDisabled mark appears
server4   Ready                      <none>   18d   v1.18.3

We comment out the toleration and the anti-affinity used earlier so that it is easier to start extra Pods; the blanket toleration would also cover the taint that cordon applies and would skew this experiment.

[root@server2 scheduler]# vim deployment.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1

[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS              RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-cqmkq   0/1     ContainerCreating   0          4s    <none>        server4   <none>           <none>
deployment-v1-7449b5b68f-m442w   1/1     Running             0          4s    10.244.22.3   server4   <none>           <none>
deployment-v1-7449b5b68f-rtct6   0/1     ContainerCreating   0          4s    <none>        server4   <none>           <none>
[root@server2 scheduler]# vim deployment.yml 
spec:
  replicas: 6			/increase the replica count to 6
[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 configured
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS              RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-2smn9   0/1     ContainerCreating   0          2s    <none>        server4   <none>           <none>
deployment-v1-7449b5b68f-9n2w8   0/1     ContainerCreating   0          2s    <none>        server4   <none>           <none>
deployment-v1-7449b5b68f-cqmkq   1/1     Running             0          23s   10.244.22.4   server4   <none>           <none>
deployment-v1-7449b5b68f-fgvdf   0/1     ContainerCreating   0          2s    <none>        server4   <none>           <none>
deployment-v1-7449b5b68f-m442w   1/1     Running             0          23s   10.244.22.3   server4   <none>           <none>
deployment-v1-7449b5b68f-rtct6   1/1     Running             0          23s   10.244.22.5   server4   <none>           <none>

Now, because the Pods have no toleration (the taints on server2 keep them off the master) and server3 is in the SchedulingDisabled state, all of the Pods run on server4.

  • drain, evict the node:
    This first evicts the Pods on the node so that they are recreated on other nodes, and then marks the node SchedulingDisabled.
    On its own it refuses to run here, because every node carries Pods managed by a DaemonSet controller:
[root@server2 scheduler]# kubectl delete -f deployment.yml 
deployment.apps "deployment-v1" deleted
[root@server2 scheduler]# kubectl uncordon server3			/restore scheduling
node/server3 uncordoned
[root@server2 scheduler]# kubectl apply -f  deployment.yml 
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-7glbp   1/1     Running   0          6s    10.244.141.212   server3   <none>           <none>
deployment-v1-7449b5b68f-7pt5j   1/1     Running   0          6s    10.244.141.214   server3   <none>           <none>
deployment-v1-7449b5b68f-m292z   1/1     Running   0          6s    10.244.22.9      server4   <none>           <none>
deployment-v1-7449b5b68f-qdb9x   1/1     Running   0          6s    10.244.141.213   server3   <none>           <none>
Three of the Pods are currently running on server3.

[root@server2 scheduler]# kubectl drain server3
node/server3 cordoned
error: unable to drain node "server3", aborting command...

There are pending nodes to be drained:
 server3
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-smn7b, kube-system/kube-proxy-9nr5k	
//It complains about DaemonSet-managed Pods and tells us to add the --ignore-daemonsets flag
[root@server2 scheduler]# kubectl drain server3 --ignore-daemonsets
node/server3 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-smn7b, kube-system/kube-proxy-9nr5k
...

[root@server2 scheduler]# kubectl get nodes
NAME      STATUS                     ROLES    AGE   VERSION
server2   Ready                      master   18d   v1.18.3		
server3   Ready,SchedulingDisabled   <none>   18d   v1.18.3		/in the SchedulingDisabled state
server4   Ready                      <none>   18d   v1.18.3
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-68bld   1/1     Running   0          18s   10.244.22.10   server4   <none>           <none>
deployment-v1-7449b5b68f-9t9v9   1/1     Running   0          18s   10.244.22.12   server4   <none>           <none>
deployment-v1-7449b5b68f-m292z   1/1     Running   0          55s   10.244.22.9    server4   <none>           <none>
deployment-v1-7449b5b68f-tjh4x   1/1     Running   0          18s   10.244.22.13   server4   <none>           <none>

All of the Pods have moved to server4.
  • delete, remove the node

The most drastic option: first drain the node so that its Pods are recreated on other nodes, then delete the node from the master. The master loses control of the node; to bring it back into scheduling you have to log in to the node and restart the kubelet service.

[root@server2 scheduler]# kubectl delete nodes server3
node "server3" deleted
[root@server2 scheduler]# kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
server2   Ready    master   18d   v1.18.3
server4   Ready    <none>   18d   v1.18.3

//restart the service on server3
[root@server3 ~]# systemctl restart kubelet.service 

[root@server2 scheduler]# kubectl get nodes
NAME      STATUS     ROLES    AGE   VERSION
server2   Ready      master   18d   v1.18.3
server3   NotReady   <none>   0s    v1.18.3			/not ready yet; it still has some Pods to start
server4   Ready      <none>   18d   v1.18.3

[root@server2 scheduler]# kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
server2   Ready    master   18d   v1.18.3
server3   Ready    <none>   48s   v1.18.3		/ready now
server4   Ready    <none>   18d   v1.18.3

If we want to separate the node from the cluster completely, we can run kubeadm reset on the node to wipe all of its cluster state.
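A hedged sketch of the full removal and optional re-join flow, assuming a kubeadm-built cluster like this one; none of these commands were run in the session above:

# on the control plane: evict workloads and remove the node object
kubectl drain server3 --ignore-daemonsets
kubectl delete node server3

# on server3 itself: wipe the local kubeadm state
kubeadm reset

# to re-join later, print a fresh join command on the control plane and run it on server3
kubeadm token create --print-join-command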
