k8s basic concepts: configuring scheduling policy with Taints and Tolerations

2018/4/12

Taints and tolerations are defined so that a node can repel pods.

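  • A typical example showing how taints and tolerations relate
    • Cluster state before the test
    • Deploying the app whoami-t1
    • Testing a taint
    • Test result
    • Testing a toleration
    • Test result
    • How do you remove a specific taint?
  • Taints and Tolerations in detail
    • Concepts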

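A typical example showing how taints and tolerations relate

Cluster state before the test

When deploying a cluster, you have most likely noticed that nodes given the master role never get workloads scheduled onto them. Why is that? The node descriptions below show the reason: every master carries the taint node-role.kubernetes.io/master:NoSchedule.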

[root@tvm-02 whoami]# kubectl get nodes
NAME     STATUS    ROLES     AGE       VERSION
tvm-01   Ready     master    8d        v1.9.0
tvm-02   Ready     master    8d        v1.9.0
tvm-03   Ready     master    8d        v1.9.0
tvm-04   Ready     <none>    8d        v1.9.0
[root@tvm-02 whoami]# kubectl describe nodes tvm-01 |grep -E '(Roles|Taints)'
Roles:              master
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@tvm-02 whoami]# kubectl describe nodes tvm-02 |grep -E '(Roles|Taints)'
Roles:              master
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@tvm-02 whoami]# kubectl describe nodes tvm-03 |grep -E '(Roles|Taints)'
Roles:              master
Taints:             node-role.kubernetes.io/master:NoSchedule

Deploying the app whoami-t1

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: whoami-t1
  labels:
    app: whoami
spec:
  replicas: 3
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: opera443399/whoami:0.9
          ports:
            - containerPort: 80

After deployment, all pods turn out to be scheduled onto the worker node:


[root@tvm-02 whoami]# kubectl apply -f app-t1.yaml
deployment "whoami-t1" created
[root@tvm-02 whoami]# kubectl get ds,deploy,svc,pods --all-namespaces -o wide -l app=whoami
NAMESPACE   NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS   IMAGES                   SELECTOR
default     deploy/whoami-t1   3         3         3            3           46s       whoami       opera443399/whoami:0.9   app=whoami

NAMESPACE   NAME                            READY     STATUS    RESTARTS   AGE       IP             NODE
default     po/whoami-t1-6cf9cd6bf4-62bhc   1/1       Running   0          46s       172.30.105.1   tvm-04
default     po/whoami-t1-6cf9cd6bf4-dss72   1/1       Running   0          46s       172.30.105.2   tvm-04
default     po/whoami-t1-6cf9cd6bf4-zvpsk   1/1       Running   0          46s       172.30.105.0   tvm-04


Testing a taint

Add a taint to tvm-04 to change the scheduling policy:

```bash
[root@tvm-02 whoami]# kubectl taint nodes tvm-04 node-role.kubernetes.io/master=:NoSchedule
node "tvm-04" tainted
##### As expected
[root@tvm-02 whoami]# kubectl describe nodes tvm-04 |grep -E '(Roles|Taints)'
Roles:              <none>
Taints:             node-role.kubernetes.io/master:NoSchedule
```

The taint command above means:
add a taint to node tvm-04 (think of it as a blemish on the node), where the taint's
key is node-role.kubernetes.io/master
value is empty
taint effect is NoSchedule
This means that no pod can be scheduled onto this node unless the pod's manifest carries a matching toleration (that is, the pod is declared to tolerate this blemish).
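To make the three parts of the taint argument concrete, here is a minimal sketch that splits a key=value:effect string the way the explanation above reads it. The Taint tuple and the parse_taint helper are hypothetical names used only for illustration; they are not part of kubectl.

```python
from typing import NamedTuple


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


def parse_taint(spec: str) -> Taint:
    """Split a 'key=value:effect' taint argument into its three parts.

    Hypothetical helper for illustration; it is not part of kubectl.
    """
    # The effect follows the last ':'; the rest is 'key' or 'key=value'.
    key_value, sep, effect = spec.rpartition(":")
    if not sep or not effect:
        raise ValueError(f"missing effect in taint spec: {spec!r}")
    # The value is optional and may be empty, as in 'key=:NoSchedule'.
    key, _, value = key_value.partition("=")
    return Taint(key=key, value=value, effect=effect)


taint = parse_taint("node-role.kubernetes.io/master=:NoSchedule")
print(taint.key, repr(taint.value), taint.effect)
```

The empty string between "=" and ":" is what the article calls an empty value.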

Test result

We find that the previously deployed deploy/whoami-t1 has not been evicted:

[root@tvm-02 whoami]# kubectl get ds,deploy,svc,pods --all-namespaces -o wide -l app=whoami
NAMESPACE   NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS   IMAGES                   SELECTOR
default     deploy/whoami-t1   3         3         3            3           17m       whoami       opera443399/whoami:0.9   app=whoami

NAMESPACE   NAME                            READY     STATUS    RESTARTS   AGE       IP             NODE
default     po/whoami-t1-6cf9cd6bf4-62bhc   1/1       Running   0          17m       172.30.105.1   tvm-04
default     po/whoami-t1-6cf9cd6bf4-dss72   1/1       Running   0          17m       172.30.105.2   tvm-04
default     po/whoami-t1-6cf9cd6bf4-zvpsk   1/1       Running   0          17m       172.30.105.0   tvm-04

Next, let's try deploying another app, whoami-t2:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: whoami-t2
  labels:
    app: whoami
spec:
  replicas: 3
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: opera443399/whoami:0.9
          ports:
            - containerPort: 80

The output below shows that the policy has taken effect; the existing deploy is simply not affected (not forcibly evicted) by default.

##### Deploy
[root@tvm-02 whoami]# kubectl apply -f app-t2.yaml
deployment "whoami-t2" created
[root@tvm-02 whoami]# kubectl get ds,deploy,svc,pods --all-namespaces -o wide -l app=whoami
NAMESPACE   NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS   IMAGES                   SELECTOR
default     deploy/whoami-t1   3         3         3            3           20m       whoami       opera443399/whoami:0.9   app=whoami
default     deploy/whoami-t2   3         3         3            0           38s       whoami       opera443399/whoami:0.9   app=whoami

NAMESPACE   NAME                            READY     STATUS    RESTARTS   AGE       IP             NODE
default     po/whoami-t1-6cf9cd6bf4-62bhc   1/1       Running   0          20m       172.30.105.1   tvm-04
default     po/whoami-t1-6cf9cd6bf4-dss72   1/1       Running   0          20m       172.30.105.2   tvm-04
default     po/whoami-t1-6cf9cd6bf4-zvpsk   1/1       Running   0          20m       172.30.105.0   tvm-04
default     po/whoami-t2-6cf9cd6bf4-5f9wl   0/1       Pending   0          38s       <none>         <none>
default     po/whoami-t2-6cf9cd6bf4-8l59z   0/1       Pending   0          38s       <none>         <none>
default     po/whoami-t2-6cf9cd6bf4-lqpzp   0/1       Pending   0          38s       <none>         <none>

[root@tvm-02 whoami]# kubectl describe deploy/whoami-t2
Name:                   whoami-t2
(omitted)
Annotations:            deployment.kubernetes.io/revision=1
Replicas:               3 desired | 3 updated | 3 total | 0 available | 3 unavailable
(omitted)
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
(omitted)
[root@tvm-02 whoami]# kubectl describe po/whoami-t2-6cf9cd6bf4-5f9wl
Name:           whoami-t2-6cf9cd6bf4-5f9wl
(omitted)
Status:         Pending
IP:
Controlled By:  ReplicaSet/whoami-t2-6cf9cd6bf4
(omitted)
Conditions:
  Type           Status
  PodScheduled   False
(omitted)
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  27s (x14 over 3m)  default-scheduler  0/4 nodes are available: 4 PodToleratesNodeTaints.
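
The scheduler message "0/4 nodes are available" follows directly from the taints on each node. Below is a rough sketch under a deliberately simplified model: the pod has no toleration for the master taint (as the Tolerations field in the describe output shows), and any NoSchedule or NoExecute taint rules a node out. The cluster dictionary and the feasible_nodes helper are hypothetical and only mirror the node state shown earlier; the real scheduler checks much more than this.

```python
# Node name -> list of (key, effect) taints, mirroring the cluster after
# tvm-04 was tainted. This is illustrative data, not API output.
MASTER_TAINT = ("node-role.kubernetes.io/master", "NoSchedule")

cluster = {
    "tvm-01": [MASTER_TAINT],
    "tvm-02": [MASTER_TAINT],
    "tvm-03": [MASTER_TAINT],
    "tvm-04": [MASTER_TAINT],
}


def feasible_nodes(nodes: dict) -> list:
    """Return the nodes a pod without matching tolerations could land on.

    Simplified: any NoSchedule or NoExecute taint rules the node out.
    """
    blocking = {"NoSchedule", "NoExecute"}
    return [
        name
        for name, taints in nodes.items()
        if not any(effect in blocking for _key, effect in taints)
    ]


print(f"{len(feasible_nodes(cluster))}/{len(cluster)} nodes are available")

# Before tvm-04 was tainted, it was the only feasible node.
before = {**cluster, "tvm-04": []}
print(feasible_nodes(before))
```

This also explains why whoami-t1 originally ended up entirely on tvm-04.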

Testing a toleration

Add a toleration to the manifest so that whoami-t2 can be scheduled onto the master nodes:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: whoami-t2
  labels:
    app: whoami
spec:
  replicas: 3
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: opera443399/whoami:0.9
          ports:
            - containerPort: 80
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
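
Note where the toleration sits: it belongs to the pod template (spec.template.spec.tolerations), at the same level as containers, not inside a container. As a small sketch of adding it programmatically, assume the manifest has already been loaded into a plain Python dict. The add_toleration helper is hypothetical, the manifest is trimmed to the relevant fields, and reading or writing YAML is left out.

```python
import copy


def add_toleration(deployment: dict, toleration: dict) -> dict:
    """Return a copy of a Deployment manifest with one more pod toleration."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]
    # tolerations is a list at the pod-spec level; create it if absent.
    tolerations = pod_spec.setdefault("tolerations", [])
    if toleration not in tolerations:
        tolerations.append(toleration)
    return patched


# Trimmed manifest: only the fields needed to show the nesting.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "whoami-t2"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "whoami", "image": "opera443399/whoami:0.9"}
                ]
            }
        }
    },
}

master_toleration = {
    "key": "node-role.kubernetes.io/master",
    "operator": "Equal",
    "value": "",
    "effect": "NoSchedule",
}

patched = add_toleration(deployment, master_toleration)
print(patched["spec"]["template"]["spec"]["tolerations"])
```

Working on a deep copy keeps the original manifest untouched, which makes it easy to compare the two pod templates.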
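
Test result

The output below shows that the nodes that were previously unusable become usable after the change, and the pods are deployed successfully.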

##### Update
[root@tvm-02 whoami]# kubectl apply -f app-t2.yaml
deployment "whoami-t2" configured
##### Check the status twice in a row
[root@tvm-02 whoami]# kubectl describe deploy/whoami-t2
Name:                   whoami-t2
(omitted)
Annotations:            deployment.kubernetes.io/revision=2
Replicas:               3 desired | 3 updated | 4 total | 2 available | 2 unavailable
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  whoami-t2-6cf9cd6bf4 (1/1 replicas created)
NewReplicaSet:   whoami-t2-647c9cb7c5 (3/3 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  39m   deployment-controller  Scaled up replica set whoami-t2-6cf9cd6bf4 to 3
  Normal  ScalingReplicaSet  14s   deployment-controller  Scaled up replica set whoami-t2-647c9cb7c5 to 1
  Normal  ScalingReplicaSet  12s   deployment-controller  Scaled down replica set whoami-t2-6cf9cd6bf4 to 2
  Normal  ScalingReplicaSet  12s   deployment-controller  Scaled up replica set whoami-t2-647c9cb7c5 to 2
  Normal  ScalingReplicaSet  6s    deployment-controller  Scaled down replica set whoami-t2-6cf9cd6bf4 to 1
  Normal  ScalingReplicaSet  6s    deployment-controller  Scaled up replica set whoami-t2-647c9cb7c5 to 3
[root@tvm-02 whoami]# kubectl describe deploy/whoami-t2
Name:                   whoami-t2
(omitted)
Annotations:            deployment.kubernetes.io/revision=2
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
(omitted)
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   whoami-t2-647c9cb7c5 (3/3 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  39m   deployment-controller  Scaled up replica set whoami-t2-6cf9cd6bf4 to 3
  Normal  ScalingReplicaSet  28s   deployment-controller  Scaled up replica set whoami-t2-647c9cb7c5 to 1
  Normal  ScalingReplicaSet  26s   deployment-controller  Scaled down replica set whoami-t2-6cf9cd6bf4 to 2
  Normal  ScalingReplicaSet  26s   deployment-controller  Scaled up replica set whoami-t2-647c9cb7c5 to 2
  Normal  ScalingReplicaSet  20s   deployment-controller  Scaled down replica set whoami-t2-6cf9cd6bf4 to 1
  Normal  ScalingReplicaSet  20s   deployment-controller  Scaled up replica set whoami-t2-647c9cb7c5 to 3
  Normal  ScalingReplicaSet  12s   deployment-controller  Scaled down replica set whoami-t2-6cf9cd6bf4 to 0

[root@tvm-02 whoami]# kubectl get ds,deploy,svc,pods --all-namespaces -o wide -l app=whoami
NAMESPACE   NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS   IMAGES                   SELECTOR
default     deploy/whoami-t1   3         3         3            3           1h        whoami       opera443399/whoami:0.9   app=whoami
default     deploy/whoami-t2   3         3         3            3           45m       whoami       opera443399/whoami:0.9   app=whoami

NAMESPACE   NAME                            READY     STATUS    RESTARTS   AGE       IP               NODE
default     po/whoami-t1-6cf9cd6bf4-62bhc   1/1       Running   0          1h        172.30.105.1     tvm-04
default     po/whoami-t1-6cf9cd6bf4-dss72   1/1       Running   0          1h        172.30.105.2     tvm-04
default     po/whoami-t1-6cf9cd6bf4-zvpsk   1/1       Running   0          1h        172.30.105.0     tvm-04
default     po/whoami-t2-647c9cb7c5-9b5b6   1/1       Running   0          6m        172.30.105.3     tvm-04
default     po/whoami-t2-647c9cb7c5-kmj6k   1/1       Running   0          6m        172.30.235.129   tvm-01
default     po/whoami-t2-647c9cb7c5-p5gwm   1/1       Running   0          5m        172.30.60.193    tvm-03

How do you remove a specific taint?

[root@tvm-02 whoami]# kubectl taint nodes tvm-04 node-role.kubernetes.io/master:NoSchedule-
node "tvm-04" untainted
##### As expected
[root@tvm-02 whoami]# kubectl describe nodes tvm-04 |grep -E '(Roles|Taints)'
Roles:              <none>
Taints:             <none>
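
The trailing "-" in key:effect- is what turns the command into a removal. As a rough model of what happens to the node's taint list, here is a sketch in which remove_taint and the tuple representation are hypothetical and used for illustration only; nothing here calls kubectl.

```python
def remove_taint(taints, spec):
    """Apply a 'key:effect-' removal spec to a list of (key, value, effect) tuples.

    Hypothetical model of the node's taint list; it does not call kubectl.
    """
    if not spec.endswith("-"):
        raise ValueError("a removal spec must end with '-'")
    key, sep, effect = spec[:-1].rpartition(":")
    if not sep:
        raise ValueError("expected the form 'key:effect-'")
    # Keep every taint unless both its key and its effect match the spec.
    return [t for t in taints if not (t[0] == key and t[2] == effect)]


node_taints = [("node-role.kubernetes.io/master", "", "NoSchedule")]
node_taints = remove_taint(node_taints, "node-role.kubernetes.io/master:NoSchedule-")
print(node_taints or "<none>")
```

Only the key and the effect take part in the match, which is why the removal command does not need to repeat the value.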

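Taints and Tolerations in detail

Taints are the opposite of node affinity: they allow a node to repel a certain class of pods.
Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes.
A node can have multiple taints.
Tolerations are applied to pods and allow them to be scheduled onto nodes with matching taints.

Concepts

To add a taint to a node: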

kubectl taint nodes tvm-04 demo.test.com/app=whoami:NoSchedule

This places a taint on node tvm-04, where:
key is demo.test.com/app
value is whoami
taint effect is NoSchedule

To remove the taint:

kubectl taint nodes tvm-04 demo.test.com/app:NoSchedule-

Then define a toleration in the PodSpec so that the pod can be scheduled onto tvm-04. Either of the following 2 forms works:

tolerations:
- key: "demo.test.com/app"
  operator: "Equal"
  value: "whoami"
  effect: "NoSchedule"

tolerations:
- key: "demo.test.com/app"
  operator: "Exists"
  effect: "NoSchedule"

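A toleration matches a taint if their keys and effects are the same, and:

  • the operator is Exists (in which case no value needs to be specified), or
  • the operator is Equal and the values are also the same

Note 1: operator defaults to Equal if it is not specified.

Note 2: pay attention to the following 2 special cases that use Exists: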

  • An empty key with operator Exists matches all keys, values and effects, which means the pod tolerates every taint:

    tolerations:
    - operator: "Exists"

  • An empty effect matches all effects for the key demo.test.com/app:

    tolerations:
    - key: "demo.test.com/app"
      operator: "Exists"
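
The matching rules and the two Exists special cases can be written down as one small predicate. This is a simplified sketch of the rules as described in this section, not the scheduler's actual code; the Taint and Toleration classes and the tolerates function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + Exists matches every key
    operator: str = "Equal"  # Equal is the default when omitted
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint under the rules above."""
    # An empty effect on the toleration matches any effect.
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # Empty key with Exists tolerates everything; otherwise keys must agree.
        return toleration.key == "" or toleration.key == taint.key
    if toleration.operator == "Equal":
        return toleration.key == taint.key and toleration.value == taint.value
    raise ValueError(f"unknown operator: {toleration.operator!r}")


taint = Taint("demo.test.com/app", "whoami", "NoSchedule")
checks = {
    "Equal, same value": Toleration("demo.test.com/app", "Equal", "whoami", "NoSchedule"),
    "Exists, same key": Toleration("demo.test.com/app", "Exists", effect="NoSchedule"),
    "Exists, empty key": Toleration(operator="Exists"),
    "Exists, empty effect": Toleration("demo.test.com/app", "Exists"),
    "Equal, other value": Toleration("demo.test.com/app", "Equal", "other", "NoSchedule"),
}
for name, toleration in checks.items():
    print(name, tolerates(toleration, taint))
```

Only the last case fails to match, because with Equal the value has to agree as well.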

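The effect used above is NoSchedule, but other effects are available as well, for example:

  • PreferNoSchedule: a soft version of NoSchedule. The scheduler tries to avoid placing a pod that does not tolerate the taint on the node, but this is not strictly enforced.
  • NoExecute: explained below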

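You can put multiple taints on the same node and multiple tolerations on the same pod. k8s processes taints and tolerations like a filter:

  • start with all of the node's taints
  • ignore the taints that are matched by a toleration in the pod
  • the remaining un-ignored taints apply their effects to the pod

In particular:

  • if at least one un-ignored taint has effect NoSchedule, k8s will not schedule the pod onto that node
  • if that is not the case, but at least one un-ignored taint has effect PreferNoSchedule, k8s will try not to schedule the pod onto that node
  • if at least one un-ignored taint has effect NoExecute, k8s immediately evicts the pod from the node (if it is already running there), or does not schedule the pod onto the node (if it is not yet running there)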

For example, suppose the node and the pod are defined as follows:

kubectl taint nodes tvm-04 key1=value1:NoSchedule
kubectl taint nodes tvm-04 key1=value1:NoExecute
kubectl taint nodes tvm-04 key2=value2:NoSchedule

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"

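In this scenario:

  • the pod will not be scheduled onto the node, because no toleration matches the third taint
  • if the pod is already running on the node, it is not evicted, because the third taint is the only one it does not tolerate and its effect is NoSchedule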
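
The filter and the example above can be replayed in a few lines. This is a sketch under a simplifying assumption: a toleration is written as a tuple of the same shape as a taint and matches only on exact (key, value, effect) equality, which corresponds to operator Equal with an explicit effect. The names untolerated and decide are hypothetical.

```python
from typing import NamedTuple


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


def untolerated(taints, tolerations):
    """Filter step: drop every taint that one of the tolerations matches.

    Simplification: a toleration is a Taint-shaped tuple and matches only on
    exact (key, value, effect) equality (operator Equal, explicit effect).
    """
    return [t for t in taints if t not in tolerations]


def decide(taints, tolerations):
    """Summarise what the remaining taints mean for the pod on this node."""
    effects = {t.effect for t in untolerated(taints, tolerations)}
    return {
        # An untolerated NoSchedule or NoExecute taint blocks new placement.
        "schedulable": not (effects & {"NoSchedule", "NoExecute"}),
        # An untolerated PreferNoSchedule taint makes the node a last resort.
        "avoid_if_possible": "PreferNoSchedule" in effects,
        # Only an untolerated NoExecute taint evicts an already running pod.
        "evict_if_running": "NoExecute" in effects,
    }


node_taints = [
    Taint("key1", "value1", "NoSchedule"),
    Taint("key1", "value1", "NoExecute"),
    Taint("key2", "value2", "NoSchedule"),
]
pod_tolerations = [
    Taint("key1", "value1", "NoSchedule"),
    Taint("key1", "value1", "NoExecute"),
]

print(untolerated(node_taints, pod_tolerations))
print(decide(node_taints, pod_tolerations))
```

Only key2=value2:NoSchedule survives the filter, so the pod cannot be placed on the node but would keep running if it were already there.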

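Normally, when a taint with effect NoExecute is present on a node, any pod that does not tolerate it is evicted immediately, and a pod that does tolerate it stays on the node. However, a NoExecute toleration can set the optional tolerationSeconds field, which lets the pod stay on the node for a limited period before it is evicted, for example: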

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600

In other words, the pod is evicted after 3600 seconds. However, if the taint is removed before that point in time, the pod is not evicted.
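
The timing rule for tolerationSeconds can be sketched as a small calculation. The eviction_time function is a hypothetical helper and the timestamps are arbitrary values chosen for illustration; the sketch only models the rule stated above, not the controller that enforces it.

```python
from datetime import datetime, timedelta
from typing import Optional


def eviction_time(
    taint_added: datetime,
    toleration_seconds: Optional[int],
    taint_removed: Optional[datetime] = None,
) -> Optional[datetime]:
    """Return when a tolerating pod would be evicted, or None if it never is.

    toleration_seconds=None models a NoExecute toleration without the field,
    which tolerates the taint indefinitely.
    """
    if toleration_seconds is None:
        return None
    deadline = taint_added + timedelta(seconds=toleration_seconds)
    # Removing the taint before the deadline cancels the pending eviction.
    if taint_removed is not None and taint_removed < deadline:
        return None
    return deadline


added = datetime(2018, 4, 12, 10, 0, 0)  # arbitrary example timestamp
print(eviction_time(added, 3600))
print(eviction_time(added, 3600, taint_removed=added + timedelta(minutes=30)))
```

The first call yields a deadline one hour after the taint was added; the second yields None because the taint disappeared after 30 minutes.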
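Note 3: regarding eviction, if the pod has nowhere else to be scheduled, it is not evicted either (this is the result of my own experiment; please verify it yourself).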

ZYXW. References

  1. Taints and Tolerations