k8s Basic Concepts - Configuring Scheduling Policies (Taints and Tolerations)
2018/4/12
Define Taints and Tolerations so that a node can repel pods.
- A typical example of how taints and tolerations relate
  - Cluster state before the test
  - Deploying the app whoami-t1
  - Testing a taint
  - Test result
  - Testing a toleration
  - Test result
  - How do we remove a specific taint?
- A closer look at the details of Taints and Tolerations
  - Concepts
#### A typical example of how taints and tolerations relate
##### Cluster state before the test
When deploying the cluster, you have probably noticed that nodes with the master role never have workloads scheduled onto them. Why is that?
```bash
[root@tvm-02 whoami]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
tvm-01 Ready master 8d v1.9.0
tvm-02 Ready master 8d v1.9.0
tvm-03 Ready master 8d v1.9.0
tvm-04 Ready <none> 8d v1.9.0
[root@tvm-02 whoami]# kubectl describe nodes tvm-01 |grep -E '(Roles|Taints)'
Roles: master
Taints: node-role.kubernetes.io/master:NoSchedule
[root@tvm-02 whoami]# kubectl describe nodes tvm-02 |grep -E '(Roles|Taints)'
Roles: master
Taints: node-role.kubernetes.io/master:NoSchedule
[root@tvm-02 whoami]# kubectl describe nodes tvm-03 |grep -E '(Roles|Taints)'
Roles: master
Taints: node-role.kubernetes.io/master:NoSchedule
```
##### Deploying the app whoami-t1
```yaml
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: whoami-t1
  labels:
    app: whoami
spec:
  replicas: 3
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
      - name: whoami
        image: opera443399/whoami:0.9
        ports:
        - containerPort: 80
```
After deployment, we can see that all the pods are scheduled onto the worker node:
```bash
[root@tvm-02 whoami]# kubectl apply -f app-t1.yaml
deployment "whoami-t1" created
[root@tvm-02 whoami]# kubectl get ds,deploy,svc,pods --all-namespaces -o wide -l app=whoami
NAMESPACE   NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                   SELECTOR
default     deploy/whoami-t1   3         3         3            3           46s   whoami       opera443399/whoami:0.9   app=whoami

NAMESPACE   NAME                            READY   STATUS    RESTARTS   AGE   IP             NODE
default     po/whoami-t1-6cf9cd6bf4-62bhc   1/1     Running   0          46s   172.30.105.1   tvm-04
default     po/whoami-t1-6cf9cd6bf4-dss72   1/1     Running   0          46s   172.30.105.2   tvm-04
default     po/whoami-t1-6cf9cd6bf4-zvpsk   1/1     Running   0          46s   172.30.105.0   tvm-04
```
##### Testing a taint
Give `tvm-04` a taint to adjust the scheduling policy:
```bash
[root@tvm-02 whoami]# kubectl taint nodes tvm-04 node-role.kubernetes.io/master=:NoSchedule
node "tvm-04" tainted
##### as expected
[root@tvm-02 whoami]# kubectl describe nodes tvm-04 |grep -E '(Roles|Taints)'
Roles: <none>
Taints: node-role.kubernetes.io/master:NoSchedule
```
The taint command above means: node `tvm-04` is given a taint (think of it as a blemish on the node), where the taint's `key` is `node-role.kubernetes.io/master`, its `value` is empty, and its taint `effect` is `NoSchedule`. This means that no pod can be scheduled onto this node unless the pod's spec contains a matching `toleration` (i.e., the pod is declared to tolerate this taint).
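For reference, a toleration matching this exact taint looks like the stanza below; it is the same block we will add to `whoami-t2` later in this post (the empty `value` mirrors the empty value of the taint):

```yaml
tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Equal"
  value: ""
  effect: "NoSchedule"
```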
##### Test result
We can see that the previously deployed `deploy/whoami-t1` has not been evicted:
```bash
[root@tvm-02 whoami]# kubectl get ds,deploy,svc,pods --all-namespaces -o wide -l app=whoami
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
default deploy/whoami-t1 3 3 3 3 17m whoami opera443399/whoami:0.9 app=whoami
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default po/whoami-t1-6cf9cd6bf4-62bhc 1/1 Running 0 17m 172.30.105.1 tvm-04
default po/whoami-t1-6cf9cd6bf4-dss72 1/1 Running 0 17m 172.30.105.2 tvm-04
default po/whoami-t1-6cf9cd6bf4-zvpsk 1/1 Running 0 17m 172.30.105.0 tvm-04
```
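The existing pods stay where they are because `NoSchedule` only influences scheduling decisions; it never touches pods that are already running. Purely as a hypothetical aside (this was not executed in this test), a `NoExecute` taint on the same key would also evict running pods that lack a matching toleration:

```bash
# Hypothetical: NoExecute evicts running pods that have no matching
# toleration, in addition to blocking new pods from being scheduled.
kubectl taint nodes tvm-04 node-role.kubernetes.io/master=:NoExecute

# Undo it with the trailing '-'
kubectl taint nodes tvm-04 node-role.kubernetes.io/master:NoExecute-
```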
Next, we try deploying another app, `whoami-t2`:
```yaml
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: whoami-t2
  labels:
    app: whoami
spec:
  replicas: 3
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
      - name: whoami
        image: opera443399/whoami:0.9
        ports:
        - containerPort: 80
```
The steps below show that the policy has taken effect; only the pre-existing deployment is, by default, left alone (it is not forcibly evicted):
```bash
##### deploy
[root@tvm-02 whoami]# kubectl apply -f app-t2.yaml
deployment "whoami-t2" created
[root@tvm-02 whoami]# kubectl get ds,deploy,svc,pods --all-namespaces -o wide -l app=whoami
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
default deploy/whoami-t1 3 3 3 3 20m whoami opera443399/whoami:0.9 app=whoami
default deploy/whoami-t2 3 3 3 0 38s whoami opera443399/whoami:0.9 app=whoami
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default po/whoami-t1-6cf9cd6bf4-62bhc 1/1 Running 0 20m 172.30.105.1 tvm-04
default po/whoami-t1-6cf9cd6bf4-dss72 1/1 Running 0 20m 172.30.105.2 tvm-04
default po/whoami-t1-6cf9cd6bf4-zvpsk 1/1 Running 0 20m 172.30.105.0 tvm-04
default po/whoami-t2-6cf9cd6bf4-5f9wl 0/1 Pending 0 38s <none> <none>
default po/whoami-t2-6cf9cd6bf4-8l59z 0/1 Pending 0 38s <none> <none>
default po/whoami-t2-6cf9cd6bf4-lqpzp 0/1 Pending 0 38s <none> <none>
[root@tvm-02 whoami]# kubectl describe deploy/whoami-t2
Name: whoami-t2
(omitted)
Annotations: deployment.kubernetes.io/revision=1
Replicas: 3 desired | 3 updated | 3 total | 0 available | 3 unavailable
(omitted)
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing True ReplicaSetUpdated
(omitted)
[root@tvm-02 whoami]# kubectl describe po/whoami-t2-6cf9cd6bf4-5f9wl
Name: whoami-t2-6cf9cd6bf4-5f9wl
(omitted)
Status: Pending
IP:
Controlled By: ReplicaSet/whoami-t2-6cf9cd6bf4
(omitted)
Conditions:
Type Status
PodScheduled False
(omitted)
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 27s (x14 over 3m) default-scheduler 0/4 nodes are available: 4 PodToleratesNodeTaints.
```
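To see at a glance why none of the four nodes can accept these pods, you can list the taints on every node in one go. A quick check in the same style as the `describe` commands used above (a sketch; its output was not captured in the original run):

```bash
# Print the Name/Roles/Taints lines for every node
kubectl describe nodes | grep -E '^(Name|Roles|Taints):'
```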
##### Testing a toleration
Add a `toleration` to the spec so that `whoami-t2` can be scheduled onto the master nodes:
```yaml
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: whoami-t2
  labels:
    app: whoami
spec:
  replicas: 3
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
      - name: whoami
        image: opera443399/whoami:0.9
        ports:
        - containerPort: 80
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
```
##### Test result
The steps below show that nodes which previously could not accept these pods now can, and the pods deploy successfully:
```bash
##### update
[root@tvm-02 whoami]# kubectl apply -f app-t2.yaml
deployment "whoami-t2" configured
##### check the status twice in a row
[root@tvm-02 whoami]# kubectl describe deploy/whoami-t2
Name: whoami-t2
(omitted)
Annotations: deployment.kubernetes.io/revision=2
Replicas: 3 desired | 3 updated | 4 total | 2 available | 2 unavailable
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing True ReplicaSetUpdated
OldReplicaSets: whoami-t2-6cf9cd6bf4 (1/1 replicas created)
NewReplicaSet: whoami-t2-647c9cb7c5 (3/3 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 39m deployment-controller Scaled up replica set whoami-t2-6cf9cd6bf4 to 3
Normal ScalingReplicaSet 14s deployment-controller Scaled up replica set whoami-t2-647c9cb7c5 to 1
Normal ScalingReplicaSet 12s deployment-controller Scaled down replica set whoami-t2-6cf9cd6bf4 to 2
Normal ScalingReplicaSet 12s deployment-controller Scaled up replica set whoami-t2-647c9cb7c5 to 2
Normal ScalingReplicaSet 6s deployment-controller Scaled down replica set whoami-t2-6cf9cd6bf4 to 1
Normal ScalingReplicaSet 6s deployment-controller Scaled up replica set whoami-t2-647c9cb7c5 to 3
[root@tvm-02 whoami]# kubectl describe deploy/whoami-t2
Name: whoami-t2
(omitted)
Annotations: deployment.kubernetes.io/revision=2
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
(omitted)
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: whoami-t2-647c9cb7c5 (3/3 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 39m deployment-controller Scaled up replica set whoami-t2-6cf9cd6bf4 to 3
Normal ScalingReplicaSet 28s deployment-controller Scaled up replica set whoami-t2-647c9cb7c5 to 1
Normal ScalingReplicaSet 26s deployment-controller Scaled down replica set whoami-t2-6cf9cd6bf4 to 2
Normal ScalingReplicaSet 26s deployment-controller Scaled up replica set whoami-t2-647c9cb7c5 to 2
Normal ScalingReplicaSet 20s deployment-controller Scaled down replica set whoami-t2-6cf9cd6bf4 to 1
Normal ScalingReplicaSet 20s deployment-controller Scaled up replica set whoami-t2-647c9cb7c5 to 3
Normal ScalingReplicaSet 12s deployment-controller Scaled down replica set whoami-t2-6cf9cd6bf4 to 0
[root@tvm-02 whoami]# kubectl get ds,deploy,svc,pods --all-namespaces -o wide -l app=whoami
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
default deploy/whoami-t1 3 3 3 3 1h whoami opera443399/whoami:0.9 app=whoami
default deploy/whoami-t2 3 3 3 3 45m whoami opera443399/whoami:0.9 app=whoami
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default po/whoami-t1-6cf9cd6bf4-62bhc 1/1 Running 0 1h 172.30.105.1 tvm-04
default po/whoami-t1-6cf9cd6bf4-dss72 1/1 Running 0 1h 172.30.105.2 tvm-04
default po/whoami-t1-6cf9cd6bf4-zvpsk 1/1 Running 0 1h 172.30.105.0 tvm-04
default po/whoami-t2-647c9cb7c5-9b5b6 1/1 Running 0 6m 172.30.105.3 tvm-04
default po/whoami-t2-647c9cb7c5-kmj6k 1/1 Running 0 6m 172.30.235.129 tvm-01
default po/whoami-t2-647c9cb7c5-p5gwm 1/1 Running 0 5m 172.30.60.193 tvm-03
```
##### How do we remove a specific taint?
```bash
[root@tvm-02 whoami]# kubectl taint nodes tvm-04 node-role.kubernetes.io/master:NoSchedule-
node "tvm-04" untainted
##### as expected
[root@tvm-02 whoami]# kubectl describe nodes tvm-04 |grep -E '(Roles|Taints)'
Roles: <none>
Taints: <none>
```
#### A closer look at the details of Taints and Tolerations
- Taints are the opposite of node affinity: a taint lets a node repel a certain class of pods.
- Taints and tolerations work together to keep pods off nodes that are not appropriate for them.
- A single node can carry multiple taints.
- Tolerations are applied to pods, allowing (but not requiring) those pods to be scheduled onto nodes with matching taints.
##### Concepts
An example of adding a taint to a node:
```bash
kubectl taint nodes tvm-04 demo.test.com/app=whoami:NoSchedule
```
This configures a taint on node `tvm-04`, where the `key` is `demo.test.com/app`, the `value` is `whoami`, and the taint `effect` is `NoSchedule`.
To remove the taint:
```bash
kubectl taint nodes tvm-04 demo.test.com/app:NoSchedule-
```
Then define a `toleration` in the `PodSpec` so that the pod can be scheduled onto `tvm-04`. There are two ways to write it:
```yaml
tolerations:
- key: "demo.test.com/app"
  operator: "Equal"
  value: "whoami"
  effect: "NoSchedule"
```
```yaml
tolerations:
- key: "demo.test.com/app"
  operator: "Exists"
  effect: "NoSchedule"
```
A `taint` and a `toleration` match when their `key`s and `effect`s are identical, and:

- the `operator` is `Exists` (in which case no `value` should be specified), or
- the `operator` is `Equal` and the `value`s are also equal.
Note 1: `operator` defaults to `Equal` if it is not specified; see the sketch right after this note.
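As a minimal illustration of that default (not taken from the original post; it reuses the demo key and value from above), the following toleration behaves exactly like the `Equal` variant shown earlier:

```yaml
tolerations:
- key: "demo.test.com/app"    # operator omitted, so it defaults to "Equal"
  value: "whoami"
  effect: "NoSchedule"
```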
Note 2: pay attention to the following two special cases of `Exists`:
- When the `key` is empty and the `operator` is `Exists`, the toleration matches every `key`, `value` and `effect`, which means it tolerates every `taint`:
```yaml
tolerations:
- operator: "Exists"
```
- When the `effect` is empty, the toleration matches all `effect`s for the key `demo.test.com/app`:
```yaml
tolerations:
- key: "demo.test.com/app"
  operator: "Exists"
```
The `effect` used above is `NoSchedule`. Other effects are also available (a short sketch follows this list):

- `PreferNoSchedule`: a non-mandatory variant; the scheduler tries to avoid placing pods that do not tolerate the taint onto the node, but this is not guaranteed.
- `NoExecute`: explained later.
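Purely as an illustration of the softer effect (this was not run in the walkthrough above), a `PreferNoSchedule` taint on the demo key could be applied and removed like this:

```bash
# Ask the scheduler to avoid, but not forbid, placing
# non-tolerating pods on tvm-04
kubectl taint nodes tvm-04 demo.test.com/app=whoami:PreferNoSchedule

# Remove it again
kubectl taint nodes tvm-04 demo.test.com/app:PreferNoSchedule-
```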
A single node can carry multiple `taints`, and a single pod can carry multiple `tolerations`; k8s processes taints and tolerations like a filter:

- take all the `taints` on the node,
- ignore those `taints` that are matched by a `toleration` on the pod,
- the remaining, un-ignored `taints` have their `effect` on the pod.

In particular:

- if at least one un-ignored `taint` has the effect `NoSchedule`, k8s will not schedule the pod onto this node;
- otherwise, if at least one un-ignored `taint` has the effect `PreferNoSchedule`, k8s will try not to schedule the pod onto this node;
- if at least one un-ignored `taint` has the effect `NoExecute`, k8s will immediately evict the pod from the node (if it is already running there), or will not schedule it onto the node (if it is not running there yet).
For example, given the following node taints and pod tolerations:
```bash
kubectl taint nodes tvm-04 key1=value1:NoSchedule
kubectl taint nodes tvm-04 key1=value1:NoExecute
kubectl taint nodes tvm-04 key2=value2:NoSchedule
```
```yaml
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
```
In this scenario:

- the pod will not be scheduled onto the node, because the third taint (`key2=value2:NoSchedule`) is not tolerated;
- if the pod is already running on the node, it will not be evicted, since the only untolerated taint has the effect `NoSchedule`, which does not evict running pods.
Normally, a pod that does not tolerate a taint with the effect `NoExecute` is evicted immediately. However, the optional field `tolerationSeconds` delays the eviction by the given number of seconds, for example:
```yaml
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600
```
That is, the pod will be evicted after 3600 seconds. But if the matching taint is removed before that deadline, the pod will not be evicted (see the removal sketch below).
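Removing that taint within the 3600-second window cancels the pending eviction; the command (not part of the original example) follows the same trailing-dash pattern used earlier:

```bash
# Remove the NoExecute taint; pods waiting out tolerationSeconds
# on this node are no longer scheduled for eviction.
kubectl taint nodes tvm-04 key1:NoExecute-
```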
Note 3: regarding eviction: in my own experiments, a pod that had nowhere else to be scheduled was not evicted either (please verify this yourself).