Advanced Scheduling in Kubernetes

The advanced scheduling mechanisms fall into the following two categories:

Node selectors: nodeSelector, nodeName

Node affinity: nodeAffinity

Scheduler logic

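1 Node selectors

nodeSelector, nodeName, nodeAffinity

To place a pod on one specific node, set nodeName to that node's name; the pod can then only run on that node.

If a whole class of nodes qualifies, use nodeSelector: label those nodes, then match the node labels in the pod spec. This narrows the set of candidate nodes considerably.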

nodeSelector

Example: schedule a pod onto a node labeled disktype=ssd

[root@master k8s]# mkdir schedule

[root@master k8s]# cd schedule/

[root@master schedule]# ll

total 0

[root@master schedule]# cp ../pod-demo.yaml .

[root@master schedule]#

apiVersion: v1

kind: Pod

metadata:

  name: pod-demo

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

    - name: myapp

      image: ikubernetes/myapp:v1

      imagePullPolicy: IfNotPresent

  nodeSelector:            # evaluated by the MatchNodeSelector predicate: is there a node labeled disktype=ssd?

    disktype: ssd

Check the node labels

[root@master schedule]# kubectl get nodes --show-labels

NAME                  STATUS    ROLES     AGE       VERSION   LABELS

master.test.k8s.com   Ready     master    2d        v1.11.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master.test.k8s.com,node-role.kubernetes.io/master=

node1.test.k8s.com    Ready     <none>    2d        v1.11.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node1.test.k8s.com

node2.test.k8s.com    Ready     <none>    2d        v1.11.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node2.test.k8s.com

[root@master schedule]#

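If one of the nodes carries the label, the pod is guaranteed to be created on that node.

If no node carries the label, the pod stays in the Pending state.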

[root@master schedule]# kubectl get pods  -o wide

NAME       READY     STATUS    RESTARTS   AGE       IP        NODE      NOMINATED NODE

pod-demo   0/1       Pending   0          47s       <none>    <none>    <none>

[root@master schedule]#

Scheduling cannot succeed: nodeSelector is a hard constraint, so its conditions must be met.

kubectl describe shows the reason

Events:

  Type     Reason            Age                From               Message

  ----     ------            ----               ----               -------

  Warning  FailedScheduling  18s (x25 over 1m)  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.

The pod stays Pending until a node is given the label

[root@master schedule]# kubectl label nodes node1.test.k8s.com disktype=ssd

node/node1.test.k8s.com labeled

Check again

[root@master schedule]# kubectl get pods -o wide

NAME       READY     STATUS    RESTARTS   AGE       IP             NODE                 NOMINATED NODE

pod-demo   1/1       Running   0          4m        10.244.1.153   node1.test.k8s.com   <none>

[root@master schedule]#
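
The matching rule behind nodeSelector can be sketched in a few lines of Python. This is a simplified model for illustration, not the scheduler's actual code; the function names `matches_node_selector` and `feasible_nodes` are made up for this sketch, and the node labels are abridged from the output above.

```python
def matches_node_selector(node_labels, node_selector):
    """True only if every key/value pair of the selector is present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def feasible_nodes(nodes, node_selector):
    """Names of all nodes whose labels satisfy the selector."""
    return [name for name, labels in nodes.items()
            if matches_node_selector(labels, node_selector)]


# Abridged labels from the `kubectl get nodes --show-labels` output above.
nodes = {
    "master.test.k8s.com": {"kubernetes.io/hostname": "master.test.k8s.com"},
    "node1.test.k8s.com": {"kubernetes.io/hostname": "node1.test.k8s.com"},
    "node2.test.k8s.com": {"kubernetes.io/hostname": "node2.test.k8s.com"},
}
selector = {"disktype": "ssd"}

before = feasible_nodes(nodes, selector)         # no node is labeled yet
nodes["node1.test.k8s.com"]["disktype"] = "ssd"  # stands in for `kubectl label nodes ...`
after = feasible_nodes(nodes, selector)

print("before labeling:", before)
print("after labeling: ", after)
```

An empty list of feasible nodes corresponds to the Pending pod; once node1 is labeled, it becomes the only candidate.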

NodeAffinity

Node affinity is similar to nodeSelector: it can restrict pods to particular nodes, and it can also merely prefer particular nodes.

Usage

[root@master schedule]# kubectl explain pods.spec.affinity

KIND:     Pod

VERSION:  v1

RESOURCE: affinity <Object>

[root@master schedule]# kubectl explain pods.spec.affinity.nodeAffinity | grep '<'

RESOURCE: nodeAffinity <Object>

   preferredDuringSchedulingIgnoredDuringExecution  <[]Object>        # its value is a list of objects

   requiredDuringSchedulingIgnoredDuringExecution   <Object>

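Node affinity types

requiredDuringSchedulingIgnoredDuringExecution: hard affinity, the conditions must be met

preferredDuringSchedulingIgnoredDuringExecution: soft affinity, the scheduler tries to meet the conditions and otherwise runs the pod on another node

Defining a hard affinity with requiredDuringSchedulingIgnoredDuringExecution

The match is on a zone label: the pod is created on a node only if that node carries a matching label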

apiVersion: v1

kind: Pod

metadata:

  name: pod-demo

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

    - name: myapp

      image: ikubernetes/myapp:v1

      imagePullPolicy: IfNotPresent

  affinity:

    nodeAffinity:

      requiredDuringSchedulingIgnoredDuringExecution:

        nodeSelectorTerms:

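        - matchExpressions:      # nodeSelectorTerms is a list, so each term starts with "-"

          - key: zone            # the pod is created only on a node whose zone label has one of the values below

            operator: In

            values: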

              - foo

              - bar

The pod stays Pending: this is a hard affinity, and no node currently carries this label.
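
How the required rule is evaluated can be sketched as follows. This is an illustrative model under stated assumptions, not the scheduler's implementation: the function names are invented here, only the In, NotIn, Exists and DoesNotExist operators are modeled, and the node labels are examples. Expressions inside one term are combined with AND, while the terms in nodeSelectorTerms are combined with OR.

```python
def expression_matches(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not modeled in this sketch: {op}")


def term_matches(labels, term):
    """All expressions inside one term must hold (logical AND)."""
    expressions = term.get("matchExpressions", [])
    return bool(expressions) and all(expression_matches(labels, e) for e in expressions)


def required_affinity_matches(labels, node_selector_terms):
    """A node qualifies if any one term matches (logical OR)."""
    return any(term_matches(labels, t) for t in node_selector_terms)


terms = [{"matchExpressions": [
    {"key": "zone", "operator": "In", "values": ["foo", "bar"]},
]}]

unlabeled = {"kubernetes.io/hostname": "node1.test.k8s.com"}
labeled = dict(unlabeled, zone="foo")  # as if the node had been labeled zone=foo

print(required_affinity_matches(unlabeled, terms))
print(required_affinity_matches(labeled, terms))
```

With no zone label on any node, no node passes the check, which is why the pod remains Pending.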

Node soft affinity

[root@master schedule]# kubectl explain pods.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution

KIND:     Pod

VERSION:  v1

The fields are

   preference   <Object> -required-

     A node selector term, associated with the corresponding weight.

   weight   <integer> -required-            # each entry pairs a weight with an object (preference) that defines which nodes are favored

     Weight associated with matching the corresponding nodeSelectorTerm, in the

     range 1-100.

[root@master schedule]# cat preferred-pod.yaml

apiVersion: v1

kind: Pod

metadata:

  name: pod-demo

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

    - name: myapp

      image: ikubernetes/myapp:v1

      imagePullPolicy: IfNotPresent

  affinity:

    nodeAffinity:

      preferredDuringSchedulingIgnoredDuringExecution:

      - preference:

          matchExpressions:

          - key: zone

            operator: In

            values:

            - foo

            - bar

        weight: 60

No node matches the label, but the pod still runs normally

[root@master schedule]# kubectl get pods

NAME       READY     STATUS    RESTARTS   AGE

pod-demo   1/1       Running   0          1m

[root@master schedule]#
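
The effect of the weight can be illustrated with a small scoring sketch. It assumes a simplified model in which a node's affinity score is the sum of the weights of the preferred terms it satisfies; the real scheduler combines this with other priority functions. The function names and the third node, `node3.example`, are hypothetical, and only the In operator is modeled.

```python
def preference_matches(labels, preference):
    """Check a preference against node labels (only the In operator is modeled)."""
    return all(labels.get(e["key"]) in e["values"]
               for e in preference["matchExpressions"])


def affinity_score(labels, preferred_terms):
    """Sum the weights of all preferred terms the node satisfies."""
    return sum(t["weight"] for t in preferred_terms
               if preference_matches(labels, t["preference"]))


preferred = [{
    "weight": 60,
    "preference": {"matchExpressions": [
        {"key": "zone", "operator": "In", "values": ["foo", "bar"]},
    ]},
}]

nodes = {
    "node1.test.k8s.com": {"disktype": "ssd"},
    "node2.test.k8s.com": {},
    "node3.example": {"zone": "foo"},  # hypothetical node, not part of the cluster above
}

scores = {name: affinity_score(labels, preferred) for name, labels in nodes.items()}
print(scores)
```

A score of zero does not exclude a node. It only means the node gains no bonus from this rule, which is why the pod above still runs although no node is labeled with zone.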

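Pod affinity

Unlike node affinity, pod affinity does not name nodes directly: it expresses where a pod should run relative to other pods.

If the node name defines the location, every node is obviously different, so each node is its own distinct location.

The alternative is to let a label define the location: nodes with the same label value count as the same location. Only then can the scheduler decide which nodes satisfy the affinity and the other scheduling constraints.

Pods also have hard and soft affinity, as shown below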

[root@master schedule]# kubectl explain pods.spec.affinity.podAffinity

preferredDuringSchedulingIgnoredDuringExecution

requiredDuringSchedulingIgnoredDuringExecution

[root@master schedule]# kubectl explain pods.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution

  podAffinityTerm    <Object> -required-

  weight    <integer> -required-

[root@master schedule]# kubectl explain pods.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution

  labelSelector

  namespaces

  topologyKey

Define the pods

The first resource

apiVersion: v1

kind: Pod

metadata:

  name: pod-first

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

    - name: myapp

      image: ikubernetes/myapp:v1

Define both resources in one file

[root@master schedule]# cat pod-first.yaml

apiVersion: v1

kind: Pod

metadata:

  name: pod-first

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

    - name: myapp

      image: ikubernetes/myapp:v1

---

apiVersion: v1

kind: Pod

metadata:

  name: pod-second

  labels:

    app: db

    tier: db

spec:

  containers:

    - name: busybox

      image: busybox:latest

      imagePullPolicy: IfNotPresent

      command: ["/bin/sh","-c","sleep 360000"]

Every node automatically gets a label, kubernetes.io/hostname, whose value is the node's hostname

[root@master schedule]# kubectl get nodes --show-labels

NAME                  STATUS    ROLES     AGE       VERSION   LABELS

master.test.k8s.com   Ready     master    3d        v1.11.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master.test.k8s.com,node-role.kubernetes.io/master=

node1.test.k8s.com    Ready     <none>    3d        v1.11.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/hostname=node1.test.k8s.com

node2.test.k8s.com    Ready     <none>    3d        v1.11.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node2.test.k8s.com

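Next, define the affinity

topologyKey: kubernetes.io/hostname means that nodes with the same hostname value count as the same location. Every node has a different hostname, so each node is its own location: the label key is fixed, while its value varies from node to node.

As follows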

[root@master schedule]# cat pod-first.yaml

apiVersion: v1

kind: Pod

metadata:

  name: pod-first

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

    - name: myapp

      image: ikubernetes/myapp:v1

      imagePullPolicy: IfNotPresent

---

apiVersion: v1

kind: Pod

metadata:

  name: pod-second

  labels:

    app: backend

    tier: db

spec:

  containers:

    - name: busybox

      image: busybox:latest

      imagePullPolicy: IfNotPresent

      command: ["/bin/sh","-c","sleep 360000"]

  affinity:

    podAffinity:

      requiredDuringSchedulingIgnoredDuringExecution:          # hard pod affinity

      - labelSelector:

          matchExpressions:                                    # selects the target pods by their labels

          - {key: app, operator: In, values: ["myapp"]}        # must share a location with a pod labeled app=myapp

        topologyKey: kubernetes.io/hostname

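By default the scheduler balances load: the two nodes are equivalent, so its priority functions spread pods evenly and favor the node with the least resource usage. Here the hard affinity overrides that and places pod-second on the same node as pod-first.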

[root@master schedule]# kubectl get pods -o wide

NAME        READY    STATUS    RESTARTS  AGE      IP            NODE                NOMINATED NODE

pod-first    1/1      Running  0          2m        10.244.2.56  node2.test.k8s.com  <none>

pod-second  1/1      Running  0          35s      10.244.2.57  node2.test.k8s.com  <none>

[root@master schedule]#

[root@master schedule]# kubectl describe pod  pod-second

The events show how the pod was scheduled

  ----    ------    ----  ----                        -------

  Normal  Scheduled  3m    default-scheduler            Successfully assigned default/pod-second to node2.test.k8s.com        # states explicitly that the pod was scheduled to node2

  Normal  Pulled    3m    kubelet, node2.test.k8s.com  Container image "busybox:latest" already present on machine

  Normal  Created    3m    kubelet, node2.test.k8s.com  Created container

  Normal  Started    3m    kubelet, node2.test.k8s.com  Started container

With soft affinity the pod might be scheduled to a different node, because the rule is not strictly enforced
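
The role of topologyKey in hard pod affinity can be sketched as below. This is a simplified model, not the scheduler's code: the function name is invented, only the In operator is modeled, and pod-first is assumed to run on node2 as in the output above. A node is eligible when its topologyKey value equals that of a node already running a matching pod.

```python
def affinity_feasible_nodes(nodes, running_pods, label_key, label_values, topology_key):
    """Nodes that share a topology domain with at least one matching pod."""
    domains = {
        nodes[pod["node"]].get(topology_key)
        for pod in running_pods
        if pod["labels"].get(label_key) in label_values
    }
    domains.discard(None)  # a node without the topology label forms no domain
    return sorted(name for name, labels in nodes.items()
                  if labels.get(topology_key) in domains)


nodes = {
    "node1.test.k8s.com": {"kubernetes.io/hostname": "node1.test.k8s.com"},
    "node2.test.k8s.com": {"kubernetes.io/hostname": "node2.test.k8s.com"},
}
pod_first = {
    "name": "pod-first",
    "node": "node2.test.k8s.com",
    "labels": {"app": "myapp", "tier": "frontend"},
}

candidates = affinity_feasible_nodes(
    nodes, [pod_first], "app", ["myapp"], "kubernetes.io/hostname")
print(candidates)
```

Because every hostname is unique, the only location that contains a pod labeled app=myapp is node2 itself, so pod-second has exactly one candidate.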

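Anti-affinity

Anti-affinity inverts the rule: the two pods must not share the same topologyKey value, so they can never land in the same location.

Change the manifest as follows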

  affinity:

    podAntiAffinity:        # changed to anti-affinity

      requiredDuringSchedulingIgnoredDuringExecution:

      - labelSelector:

          matchExpressions:

          - {key: app, operator: In, values: ["myapp"]}

        topologyKey: kubernetes.io/hostname

[root@master schedule]# kubectl apply -f pod-first.yaml

[root@master schedule]# kubectl get pods  -o wide

NAME        READY    STATUS    RESTARTS  AGE      IP            NODE                NOMINATED NODE

pod-first    1/1      Running  0          13s      10.244.1.161  node1.test.k8s.com  <none>

pod-second  1/1      Running  0          13s      10.244.2.58    node2.test.k8s.com  <none>

Likewise, whichever node pod-first runs on, pod-second must not run on that node.

Label the nodes

[root@master schedule]# kubectl label nodes node1.test.k8s.com zone=foo

node/node1.test.k8s.com labeled

[root@master schedule]# kubectl label nodes node2.test.k8s.com zone=foo

node/node2.test.k8s.com labeled

[root@master schedule]#

Change the topologyKey

Edit the manifest again

  affinity:

    podAntiAffinity:

      requiredDuringSchedulingIgnoredDuringExecution:

      - labelSelector:

          matchExpressions:

          - {key: app, operator: In, values: ["myapp"]}

        topologyKey: zone            # node label key that defines the location to keep away from

Recreate the pods

[root@master schedule]# kubectl get pods -o wide

NAME         READY     STATUS    RESTARTS   AGE       IP            NODE                 NOMINATED NODE

pod-first    1/1       Running   0          5s        10.244.2.59   node2.test.k8s.com   <none>

pod-second   0/1       Pending   0          5s        <none>        <none>               <none>

[root@master schedule]#

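pod-second is Pending. When it is scheduled, its anti-affinity rule is evaluated with topologyKey: zone. Both worker nodes carry zone=foo, so they form a single location, and pod-first already runs in it. Because of the anti-affinity, pod-second may not run on any node in that location, which leaves it no eligible node.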
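
The difference between the two topology keys can be reproduced with a short sketch. It is a simplified model under stated assumptions: the function name is invented, only the In operator is modeled, nodes without the topology label are simply left out, and the master is omitted because its taint keeps ordinary pods away. pod-first is assumed to run on node2, as in the last output.

```python
def anti_affinity_feasible_nodes(nodes, running_pods, label_key, label_values, topology_key):
    """Nodes whose topology domain contains no matching pod."""
    occupied = {
        nodes[pod["node"]].get(topology_key)
        for pod in running_pods
        if pod["labels"].get(label_key) in label_values
    }
    # Simplification: nodes that lack the topology label are not considered at all.
    return sorted(name for name, labels in nodes.items()
                  if topology_key in labels and labels[topology_key] not in occupied)


nodes = {
    "node1.test.k8s.com": {"kubernetes.io/hostname": "node1.test.k8s.com", "zone": "foo"},
    "node2.test.k8s.com": {"kubernetes.io/hostname": "node2.test.k8s.com", "zone": "foo"},
}
pod_first = {
    "name": "pod-first",
    "node": "node2.test.k8s.com",
    "labels": {"app": "myapp", "tier": "frontend"},
}

by_hostname = anti_affinity_feasible_nodes(
    nodes, [pod_first], "app", ["myapp"], "kubernetes.io/hostname")
by_zone = anti_affinity_feasible_nodes(
    nodes, [pod_first], "app", ["myapp"], "zone")

print("topologyKey hostname:", by_hostname)
print("topologyKey zone:    ", by_zone)
```

With the hostname key the other worker node remains available; with the zone key both nodes belong to the occupied location, so the candidate list is empty and the pod stays Pending.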

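Taints and tolerations

With affinity the pod makes the choice and the node is chosen passively. Taints give the node a say: the node decides which pods may be scheduled onto it.

Defining taints

Taints are defined in node.spec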

[root@master schedule]# kubectl explain node.spec.taints

View the node's full definition

[root@master schedule]# kubectl get nodes node1.test.k8s.com -o yaml

Find the spec section

spec:

  podCIDR: 10.244.1.0/24

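taints is a list of objects that defines the node's taints

Defining a taint:

Key parameter:

effect is required. It defines what happens to a pod that does not tolerate the taint.

The possible values are listed below:

effect defines how pods are repelled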

[root@master schedule]# kubectl explain node.spec.taints.effect

KIND:     Node

VERSION:  v1

FIELD:    effect <string>

DESCRIPTION:

     Required. The effect of the taint on pods that do not tolerate the taint.

     Valid effects are NoSchedule, PreferNoSchedule and NoExecute.

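NoSchedule	Affects scheduling only; pods already running on the node are not affected
PreferNoSchedule	Soft version of NoSchedule: the scheduler tries to avoid the node, but may still place the pod there if no other node is available
NoExecute	Affects both scheduling and running pods; pods that do not tolerate the taint are evicted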

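If a node has taints, whether a pod can be scheduled onto it is decided by first checking each taint against the pod's tolerations.

Taints matched by a toleration are ignored. For the remaining, untolerated taints the effect decides the outcome: NoSchedule keeps the pod off the node, and NoExecute additionally evicts the pod if it is already running there.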
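
The check described above can be sketched as follows. This is a simplified model of the matching rules, not Kubernetes source code; the function names and the verdict strings ("allow", "avoid", "reject", "evict") are invented for this illustration. A toleration matches a taint when its effect is empty or equal, its key is empty or equal, and its operator is either Exists or Equal with the same value.

```python
def tolerates(toleration, taint):
    """True if this single toleration matches this single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key (used with operator Exists) matches every key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def placement_verdict(taints, tolerations):
    """Decide what the untolerated taints mean for the pod."""
    untolerated = [t for t in taints
                   if not any(tolerates(tol, t) for tol in tolerations)]
    effects = {t["effect"] for t in untolerated}
    if "NoExecute" in effects:
        return "evict"   # not scheduled, and evicted if already running
    if "NoSchedule" in effects:
        return "reject"  # not scheduled, running pods stay
    if "PreferNoSchedule" in effects:
        return "avoid"   # allowed, but the node is disfavored
    return "allow"


gpu_taint = {"key": "hardware", "value": "gpu", "effect": "NoSchedule"}  # hypothetical taint
match_all_noexecute = {"operator": "Exists", "effect": "NoExecute"}

print(placement_verdict([gpu_taint], []))
print(placement_verdict([gpu_taint], [{"key": "hardware", "operator": "Exists"}]))
print(tolerates(match_all_noexecute, {"key": "any", "value": "x", "effect": "NoExecute"}))
```

A toleration with an empty key and operator Exists, such as `match_all_noexecute` here, matches every taint of the given effect regardless of key and value.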

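View the taints on a node

[root@master schedule]# kubectl describe node master.test.k8s.com | grep -i taints

Taints:             node-role.kubernetes.io/master:NoSchedule        # a pod that does not tolerate this taint cannot be scheduled here

The master carries a NoSchedule taint, which is why ordinary pods are not scheduled onto the master.

However many pods were created so far, none was ever scheduled onto the master, because none of them defines a toleration for this taint.

For example, look at the kube-apiserver pod on the master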

[root@master schedule]# kubectl describe pod  kube-apiserver-master.test.k8s.com  -n kube-system

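The output contains the following. Tolerations lists the pod's tolerations; :NoExecute has an empty key, so it tolerates every taint whose effect is NoExecute

Tolerations:      :NoExecute        # empty key: all NoExecute taints are tolerated, so they never evict this pod

Look at a kube-proxy pod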

[root@master schedule]# kubectl describe pod  kube-proxy-lncxb  -n kube-system

Its tolerations are more explicit

Tolerations:

                 CriticalAddonsOnly            # add-ons

                 node.kubernetes.io/disk-pressure:NoSchedule

                 node.kubernetes.io/memory-pressure:NoSchedule

                 node.kubernetes.io/not-ready:NoExecute

                 node.kubernetes.io/unreachable:NoExecute

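All of these are taken into account when tolerations are checked

Add a taint to a node

[root@master ~]# kubectl taint node node1.test.k8s.com  node-type=production:NoSchedule    # pods that do not tolerate node-type=production cannot be scheduled here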

node/node1.test.k8s.com tainted

[root@master ~]# kubectl describe nodes node1.test.k8s.com | grep -i taint

Taints:             node-type=production:NoSchedule

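node1 now carries a taint, so pods without a matching toleration will no longer be scheduled onto node1

Define the following

The 3 pods in this manifest have no tolerations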

apiVersion: apps/v1

kind: Deployment

metadata:

  name: myapp-deploy

  namespace: default

spec:

  replicas: 3

  selector:

    matchLabels:

      app: myapp

      release: cancary

  template:

    metadata:

      labels:

        app: myapp

        release: cancary

    spec:

      containers:

      - name: myapp

        image: ikubernetes/myapp:v2

        ports:

        - name: http

          containerPort: 80

So they all land on node2: they define no tolerations and therefore cannot tolerate node1's taint

[root@master daemonset]# kubectl apply -f deploy.yaml

[root@master daemonset]# kubectl get pods -o wide

NAME                          READY    STATUS    RESTARTS  AGE      IP            NODE                NOMINATED NODE

myapp-deploy-86c975f8b-7x6m7  1/1      Running  0          4s        10.244.2.62  node2.test.k8s.com  <none>

myapp-deploy-86c975f8b-bk9c7  1/1      Running  0          4s        10.244.2.61  node2.test.k8s.com  <none>

myapp-deploy-86c975f8b-rpd84  1/1      Running  0          4s        10.244.2.60  node2.test.k8s.com  <none>

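Now add a taint to node2 as well and observe the effect. The taint is again node-type=production, this time with the effect NoExecute

As shown below, the running pods are evicted and their replacements all stay Pending, because no node is left whose taints they tolerate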

[root@master daemonset]# kubectl taint node node2.test.k8s.com  node-type=production:NoExecute

node/node2.test.k8s.com tainted

[root@master daemonset]# kubectl get pods -o wide

NAME                          READY    STATUS    RESTARTS  AGE      IP        NODE      NOMINATED NODE

myapp-deploy-86c975f8b-4sd6c  0/1      Pending  0          11s      <none>    <none>    <none>

myapp-deploy-86c975f8b-nf985  0/1      Pending  0          11s      <none>    <none>    <none>

myapp-deploy-86c975f8b-vx2h2  0/1      Pending  0          11s      <none>    <none>    <none>

[root@master daemonset]#

Adding tolerations to the pod

The pod only needs to declare which taints it tolerates; each toleration is one element of a list

[root@master daemonset]# kubectl explain pods.spec.tolerations

KIND:     Pod

VERSION:  v1

RESOURCE: tolerations <[]Object>

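tolerationSeconds is the toleration period: when a NoExecute taint would evict the pod, the pod stays on the node for this many seconds before it is evicted. By default the field is unset, which means the taint is tolerated forever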

   tolerationSeconds    <integer>

     TolerationSeconds represents the period of time the toleration (which must

     be of effect NoExecute, otherwise this field is ignored) tolerates the

     taint. By default, it is not set, which means tolerate the taint forever

     (do not evict). Zero and negative values will be treated as 0 (evict

     immediately) by the system.

The operator parameter

  operator    <string>

    Operator represents a key's relationship to the value. Valid operators are

    Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for

    value, so that a pod can tolerate all taints of a particular category.

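Exists: matches when a taint with the given key exists, whatever its value

Equal: the taint's value must match the toleration's value exactly

Defining the toleration

The toleration targets the node-type taint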

[root@master schedule]# cat deploy.yaml

apiVersion: apps/v1

kind: Deployment

metadata:

  name: myapp-deploy

  namespace: default

spec:

  replicas: 3

  selector:

    matchLabels:

      app: myapp

      release: cancary

  template:

    metadata:

      labels:

        app: myapp

        release: cancary

    spec:

      containers:

      - name: myapp

        image: ikubernetes/myapp:v2

        ports:

        - name: http

          containerPort: 80

      tolerations:

      - key: "node-type"

        operator: "Equal"

        value: "production"

        effect: "NoExecute"

        tolerationSeconds: 300

[root@master schedule]# kubectl get pods -o wide

NAME                            READY     STATUS    RESTARTS   AGE       IP            NODE                 NOMINATED NODE

myapp-deploy-595c744cf7-6cll6   1/1       Running   0          16s       10.244.2.65   node2.test.k8s.com   <none>

myapp-deploy-595c744cf7-fwgqr   1/1       Running   0          16s       10.244.2.63   node2.test.k8s.com   <none>

myapp-deploy-595c744cf7-hhdfq   1/1       Running   0          16s       10.244.2.64   node2.test.k8s.com   <none>
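
Why all three pods end up on node2 can be checked with a short sketch. It models only the matching rules, under the assumption that the three nodes carry exactly the taints shown in this walkthrough; the function names are invented for the illustration and this is not the scheduler's code.

```python
def tolerates(toleration, taint):
    """True if this single toleration matches this single taint."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def schedulable(taints, tolerations):
    """A node is usable if no untolerated NoSchedule or NoExecute taint remains."""
    return not any(
        taint["effect"] in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
    )


# Taints as set earlier in this walkthrough.
node_taints = {
    "master.test.k8s.com": [
        {"key": "node-role.kubernetes.io/master", "value": "", "effect": "NoSchedule"}],
    "node1.test.k8s.com": [
        {"key": "node-type", "value": "production", "effect": "NoSchedule"}],
    "node2.test.k8s.com": [
        {"key": "node-type", "value": "production", "effect": "NoExecute"}],
}

# The toleration from deploy.yaml (tolerationSeconds is not modeled here).
pod_tolerations = [
    {"key": "node-type", "operator": "Equal", "value": "production", "effect": "NoExecute"},
]

usable = [name for name, taints in node_taints.items()
          if schedulable(taints, pod_tolerations)]
print(usable)
```

The toleration names the effect NoExecute, so it matches the taint on node2 but not the NoSchedule taint on node1, even though key and value are identical. Tolerating node1 as well would need a second toleration with effect NoSchedule, or one that leaves the effect empty.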
