Kubernetes -- Several Ways to Control Pod Scheduling onto Nodes

Introduction

  • The scheduler uses Kubernetes' watch mechanism to discover newly created Pods that have not yet been bound to a Node. It then schedules every unscheduled Pod it finds onto a suitable Node.
  • kube-scheduler is the default scheduler of a Kubernetes cluster and is part of the cluster control plane. If you really want or need to, kube-scheduler is designed so that you can write your own scheduling component and swap it in (a minimal schedulerName sketch follows this list).
  • Factors taken into account in scheduling decisions include: individual and collective resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements, data locality, interference between workloads, and so on.
  • The default policies are documented at: https://kubernetes.io/zh/docs/concepts/scheduling/kube-scheduler/
  • Scheduling framework: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/scheduling-framework/
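
As a hedged aside to the point about replacing kube-scheduler: a Pod chooses which scheduler handles it through spec.schedulerName (the default value is "default-scheduler"). The name my-scheduler below is hypothetical and would have to match whatever custom scheduler you actually deploy.

apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled
spec:
  schedulerName: my-scheduler   # hypothetical custom scheduler; omit this field to use the default scheduler
  containers:
  - name: nginx
    image: nginx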

Scheduling with nodeName

  • nodeName is the simplest form of node selection constraint, but it is generally not recommended. If nodeName is set in the PodSpec, it takes precedence over the other node selection methods.
  • Some limitations of selecting nodes with nodeName:
    • If the named node does not exist.
    • If the named node does not have the resources to accommodate the pod, scheduling the pod fails.
    • Node names in cloud environments are not always predictable or stable; they can change.

If any of these three situations applies, the pod will not run.

[root@server2 scheduler]# vim pod.yml 

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
#  nodeName: server4			/ leave nodeName commented out for now and see where the pod lands
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created

[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          13s   10.244.141.255   server3   <none>           <none>
/ It landed on server3. When no node name is specified, the default scheduler decides where the pod runs; this is the normal Kubernetes behaviour.
[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server4			/ specify the node name so the pod runs on server4

[root@server2 scheduler]# kubectl delete pod nginx 
pod "nginx" deleted
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          5s    10.244.22.48   server4   <none>           <none>
Now it runs on server4.

Although this is convenient, it is subject to all kinds of limitations. For example, our server4 host only has 1 CPU:

[root@server2 scheduler]# vim pod.yml 
[root@server2 scheduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: 2			/ requests a minimum of 2 CPUs
  nodeName: server4

[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod
NAME    READY   STATUS     RESTARTS   AGE
nginx   0/1     OutOfcpu   0          2s

The pod cannot run. The same thing happens with memory requests.
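
To see why the request cannot be satisfied, it helps to inspect the node's capacity and allocatable resources (a hedged aside; the grep pattern below simply trims the describe output). On this lab the cpu value under Allocatable for server4 should show roughly 1, which is why a request for 2 CPUs can never fit.

[root@server2 scheduler]# kubectl describe node server4 | grep -A 6 Allocatable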

Scheduling with nodeSelector

nodeSelector is the simplest recommended form of node selection constraint.

Add a label to the chosen node:

[root@server2 scheduler]# kubectl label nodes server4 disktype=ssd
node/server4 labeled
[root@server2 scheduler]# kubectl get nodes --show-labels 
...
server4   Ready    <none>   18d   v1.18.3   disktype=ssd		/ the label now appears
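
For later reference (this matters for the "label accidentally deleted" situation discussed further below), a node label is removed again by appending a minus sign to the key:

[root@server2 scheduler]# kubectl label nodes server4 disktype-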

[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          6s    10.244.22.49   server4   <none>           <none>

It runs on server4.

When the label does not exist anywhere:

[root@server2 scheduler]# vim pod.yml 

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: sata			/ no node carries this label
[root@server2 scheduler]# kubectl delete pod nginx 
pod "nginx" deleted
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx   0/1     Pending   0          12s   <none>   <none>   <none>           <none>
[root@server2 scheduler]# kubectl describe pod nginx 

Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.

No node matches the node selector.

So if the label is accidentally deleted, the pod can no longer run. How can we make sure the pod still runs normally even when the label no longer exists?

Node Affinity and Anti-Affinity

  • Affinity and anti-affinity
    • nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express.
    • You can mark a rule as "soft"/"preferred" rather than a hard requirement, so if the scheduler cannot satisfy it, the pod is still scheduled.
    • You can constrain against the labels of pods already running on a node, rather than the node's own labels, which lets you control which pods may or may not be placed together.

Node affinity

  • Node affinity
    • requiredDuringSchedulingIgnoredDuringExecution: must be satisfied.

    • preferredDuringSchedulingIgnoredDuringExecution: preferred; it is fine if it cannot be satisfied.

    • IgnoredDuringExecution means that if a node's labels change while the Pod is running and the affinity rule is no longer satisfied, the Pod keeps running on that node.

    • Reference: https://kubernetes.io/zh/docs/concepts/configuration/assign-pod-node/

Node affinity example:

[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:			/ affinity
    nodeAffinity:			/ node affinity
      requiredDuringSchedulingIgnoredDuringExecution:			/ must be satisfied
        nodeSelectorTerms:		/ node selector terms
        - matchExpressions:			/ match expressions
          - key: disktype			/ key
            operator: In			/ match operator; the value must be in the list below
            values:			/ values
            - ssd

nodeAffinity also supports several other match operators (an Exists sketch follows the result below):

In: the label's value is in the given list
NotIn: the label's value is not in the given list
Gt: the label's value is greater than the given value (not supported for pod affinity)
Lt: the label's value is less than the given value (not supported for pod affinity)
Exists: the given label exists
DoesNotExist: the given label does not exist
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/node-affinity created
[root@server2 scheduler]# kubectl get pod -owide
NAME            READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
node-affinity   1/1     Running   0          5s    10.244.22.50   server4   <none>           <none>

It runs on server4, because server4 carries this label and we made the rule a hard requirement.
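
Only the In and NotIn operators are demonstrated in this post. As a hedged sketch, the Exists operator from the list above would look like the fragment below: it matches any node that has a disktype label at all, whatever its value, so no values list is given.

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: Exists			/ any node carrying a disktype label, regardless of its value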

Node affinity example 2:

[root@server2 scheduler]# kubectl delete -f pod.yml 
pod "node-affinity" deleted
[root@server2 scheduler]# vim pod.yml 
[root@server2 scheduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:			/ hard requirement
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn			/ the value must not be in the list below
            values:	
            - server4				/ i.e. the pod must not be scheduled onto server4
      preferredDuringSchedulingIgnoredDuringExecution:		/ soft preference
      - weight: 1		/ weight is 1; higher weights take priority
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - sata			/ no node carries this label
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/node-affinity created
[root@server2 scheduler]# kubectl get pod -owide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
node-affinity   1/1     Running   0          7s    10.244.141.193   server3   <none>           <none>

It ends up on server3. We first required that the pod must not run on server4, then expressed a preference for nodes labelled disktype: sata. That label does not exist on any node, but since it is only a preference the pod is still scheduled, and it lands on server3.

Pod affinity

  • Pod affinity and anti-affinity
    • podAffinity decides which pods a POD may be co-located with in the same topology domain (topology domains are defined by node labels; a domain can be a single host or a group of hosts such as a cluster or zone).
    • podAntiAffinity decides which pods a POD must not be co-located with in the same topology domain. Both deal with the relationships between PODs inside the Kubernetes cluster.
    • Inter-pod affinity and anti-affinity can be even more useful when combined with higher-level collections such as ReplicaSets, StatefulSets and Deployments: you can easily configure a group of workloads that should live in the same defined topology (for example, the same node).

Pod affinity example:

[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx			/ give the pod a label; we will use it shortly
spec:
  containers:
  - name: nginx
    image: nginx
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          5s    10.244.141.195   server3   <none>           <none>
This pod runs on server3.

Now add another pod.

[root@server2 scheduler]# vim pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx				/ pod label
spec:
  containers:
  - name: nginx
    image: nginx

---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAffinity:			/ pod affinity
      requiredDuringSchedulingIgnoredDuringExecution:			/ hard requirement
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx				/ run together with pods labelled app: nginx
        topologyKey: kubernetes.io/hostname			/ defines the topology domain; hostname is node level, but a wider cluster/zone-level domain is possible too (a sketch follows after this example)

The nginx pod above was scheduled onto server3, so because of the affinity rule the mysql pod will also end up on server3.

[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx unchanged
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
mysql   1/1     Running   0          7s    10.244.141.194   server3   <none>           <none>
nginx   1/1     Running   0          11m   10.244.141.195   server3   <none>           <none>

It runs on server3. The mysql pod follows the app: nginx label: wherever that label is, that is where the pod goes.
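
The topologyKey used above is kubernetes.io/hostname, so "together" means "on the same node". As a hedged sketch, if the nodes carried zone labels (for example topology.kubernetes.io/zone, which this bare-metal lab does not set), the same rule could be relaxed so that the two pods only need to share a zone rather than a node:

  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: topology.kubernetes.io/zone			/ co-locate per zone instead of per node; assumes the nodes carry this label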
Now let's add a nodeName field to force nginx onto server4:

[root@server2 scheduler]# vim pod.yml 
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server4
Everything else stays the same.
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
mysql   1/1     Running   0          6s    10.244.22.53   server4   <none>           <none>
nginx   1/1     Running   0          6s    10.244.22.52   server4   <none>           <none>

mysql follows it and runs on server4 as well.

What if we change the operator to NotIn?

[root@server2 scheduler]# kubectl delete -f pod.yml 
pod "nginx" deleted
pod "mysql" deleted
[root@server2 scheduler]# vim pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx

---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: NotIn				/ changed to NotIn here
            values:
            - nginx		/ i.e. do not run on the same node as pods labelled app: nginx
        topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
mysql   1/1     Running   0          7s    10.244.22.51     server4   <none>           <none>
nginx   1/1     Running   0          7s    10.244.141.197   server3   <none>           <none>

As expected, they now run on different nodes.

Pod anti-affinity example:

[root@server2 scheduler]# vim pod.yml 
[root@server2 scheduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx

---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: "MYSQL_ROOT_PASSWORD"
      value: "westos"
  affinity:
    podAntiAffinity:			/ the only change is here: podAffinity becomes podAntiAffinity
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname

With this, the two pods repel each other and will never end up on the same node.

[root@server2 scheduler]# kubectl apply -f pod.yml 
pod/nginx created
pod/mysql created
[root@server2 scheduler]# kubectl get pod -owide
NAME    READY   STATUS              RESTARTS   AGE   IP       NODE      NOMINATED NODE   READINESS GATES
mysql   0/1     ContainerCreating   0          4s    <none>   server4   <none>           <none>
nginx   0/1     ContainerCreating   0          4s    <none>   server3   <none>           <none>

Scenario: with a Deployment of, say, three replicas that all claim port 80 on the host, putting them on one node would obviously conflict; anti-affinity solves exactly this problem.

Taints

  • NodeAffinity is a property defined on the Pod that lets us schedule the Pod onto a Node of our choosing. Taints are the opposite: they let a Node refuse to run a Pod, or even evict Pods.

  • Taints are a property of the Node. Once taints are set, Kubernetes will not schedule Pods onto that Node. Kubernetes therefore also gives Pods a property called Tolerations: as long as a Pod tolerates the taints on a Node, Kubernetes ignores those taints and can (but is not required to) schedule the Pod onto that Node.

[root@server2 scheduler]# kubectl describe nodes server2 |grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@server2 scheduler]# kubectl describe nodes server3 |grep Taints
Taints:             <none>
[root@server2 scheduler]# kubectl describe nodes server4 |grep Taints
Taints:             <none>

Our master node carries a taint, and no toleration is set, so it takes no part in scheduling. server3 and server4 have no taints and can be scheduled onto.

Remove the taint from the master node

[root@server2 scheduler]# kubectl taint node server2 node-role.kubernetes.io/master:NoSchedule-
node/server2 untainted
[root@server2 scheduler]# kubectl describe nodes server2 |grep Taints
Taints:             <none>

After that, server2 can take part in scheduling.

[root@server2 scheduler]# vim deployment.yml 
apiVersion: apps/v1
kind: Deployment			/ use a Deployment controller this time
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      affinity:
        podAntiAffinity:				/ pod anti-affinity policy
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
 
[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-6498765b4b-2l6mc   1/1     Running   0          12s   10.244.22.56     server4   <none>           <none>
deployment-v1-6498765b4b-mt5x2   1/1     Running   0          12s   10.244.179.81    server2   <none>           <none>
deployment-v1-6498765b4b-scvzq   1/1     Running   0          12s   10.244.141.204   server3   <none>           <none>

server2 now takes part in scheduling as well. Because the anti-affinity rule is a hard requirement, extra replicas end up waiting in Pending when we scale up: the rule forbids them from sharing a node with any pod carrying the myapp label.

[root@server2 scheduler]# kubectl get pod
NAME                             READY   STATUS    RESTARTS   AGE
deployment-v1-6498765b4b-2l6mc   1/1     Running   0          43m
deployment-v1-6498765b4b-lvmr4   0/1     Pending   0          1s
deployment-v1-6498765b4b-mt5x2   1/1     Running   0          43m
deployment-v1-6498765b4b-qjtq2   0/1     Pending   0          1s
deployment-v1-6498765b4b-rn5vj   0/1     Pending   0          1s
deployment-v1-6498765b4b-scvzq   1/1     Running   0          43m

Add the taint back to the master node

[root@server2 scheduler]# kubectl taint node server2 node-role.kubernetes.io/master:NoSchedule
node/server2 tainted

[root@server2 scheduler]# kubectl describe nodes server2 |grep Taint
Taints:             node-role.kubernetes.io/master:NoSchedule
  • The [effect] part can be one of: [ NoSchedule | PreferNoSchedule | NoExecute ] (the general command syntax is sketched below)
    • NoSchedule: PODs will not be scheduled onto the tainted node.
    • PreferNoSchedule: the soft version of NoSchedule.
    • NoExecute: once this taint takes effect, PODs already running on the node that have no matching Toleration are evicted.
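The general command form, summarised from the commands used in this post, is kubectl taint nodes <node> <key>[=<value>]:<effect>; appending a minus sign removes the taint again. The key/value pair below is purely hypothetical and is not part of this lab:

kubectl taint nodes server4 example-key=example-value:PreferNoSchedule		/ hypothetical taint
kubectl taint nodes server4 example-key=example-value:PreferNoSchedule-	/ remove it again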
[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 created

[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-6498765b4b-9l5s7   0/1     Pending   0          13s   <none>           <none>    <none>           <none>
deployment-v1-6498765b4b-jdjvz   1/1     Running   0          13s   10.244.22.57     server4   <none>           <none>
deployment-v1-6498765b4b-zzdlq   1/1     Running   0          13s   10.244.141.200   server3   <none>           <none>

Running it again, two pods are Running, one on server3 and one on server4. Because server2 carries a taint and there are no other nodes in the cluster, the remaining pod stays in Pending.

Add a toleration for the master node's taint

  • The key, value and effect defined in tolerations must match the taint set on the node (an Equal-based sketch follows this list):
    • If the operator is Exists, value can be omitted.
    • If the operator is Equal, key and value must match the taint exactly.
    • If operator is not specified, it defaults to Equal.
  • There are also two special cases:
    • Omitting key together with operator Exists matches every key and value, i.e. tolerates all taints.
    • Omitting effect matches every effect.
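
A hedged sketch of the Equal form, written to match the key1=v1:NoExecute taint that gets added later in this post (the Exists form used in the actual deployment.yml follows next):

      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "v1"
        effect: "NoExecute"		/ key, value and effect all have to match the taint exactly
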
[root@server2 scheduler]# vim deployment.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      tolerations:
      - operator: "Exists"		/ no key is given, so together with Exists this matches every key, i.e. any taint is tolerated
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-bb99b54bc-5ngtf   1/1     Running   0          9s    10.244.22.58     server4   <none>           <none>
deployment-v1-bb99b54bc-l4sf5   1/1     Running   0          9s    10.244.179.82    server2   <none>           <none>
deployment-v1-bb99b54bc-t7v9p   1/1     Running   0          9s    10.244.141.205   server3   <none>           <none>

With the toleration added, the master node can be scheduled onto again; the taint is tolerated.

We just gave the master a taint with the NoSchedule effect; the other two effects can be applied as well.

At the moment the three replicas are running normally, one on each node:
[root@server2 scheduler]# kubectl get pod -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-bb99b54bc-d2m7c   1/1     Running   0          7s    10.244.22.2      server4   <none>           <none>
deployment-v1-bb99b54bc-jfwfp   1/1     Running   0          7s    10.244.179.84    server2   <none>           <none>
deployment-v1-bb99b54bc-ntlsr   1/1     Running   0          7s    10.244.141.210   server3   <none>           <none>

Now add a taint with the NoExecute (evict) effect, while our manifest still only tolerates the NoSchedule effect:
[root@server2 scheduler]# kubectl taint nodes server2 key1=v1:NoExecute
node/server2 tainted
[root@server2 scheduler]# kubectl get pod -owide
NAME                            READY   STATUS        RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-bb99b54bc-2p2zz   0/1     Pending       0          1s    <none>           <none>    <none>           <none>
deployment-v1-bb99b54bc-d2m7c   1/1     Running       0          42s   10.244.22.2      server4   <none>           <none>
deployment-v1-bb99b54bc-jfwfp   1/1     Terminating   0          42s   10.244.179.84    server2   <none>           <none>
deployment-v1-bb99b54bc-ntlsr   1/1     Running       0          42s   10.244.141.210   server3   <none>           <none>
We can see that the pod that was on server2 is being evicted, and the Deployment controller automatically brings up a replacement pod, which sits in Pending.

[root@server2 scheduler]# kubectl get pod -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-bb99b54bc-2p2zz   0/1     Pending   0          5s    <none>           <none>    <none>           <none>
deployment-v1-bb99b54bc-d2m7c   1/1     Running   0          46s   10.244.22.2      server4   <none>           <none>
deployment-v1-bb99b54bc-ntlsr   1/1     Running   0          46s   10.244.141.210   server3   <none>           <none>

We can fix this:

[root@server2 scheduler]# vim deployment.yml 
      tolerations:
      - operator: "Exists"
      #  effect: "NoSchedule"		/ comment this out so the toleration matches every effect

[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 configured
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-68679b84bf-ddmlm   0/1     Pending   0          1s    <none>           <none>    <none>           <none>
deployment-v1-68679b84bf-qhd7s   1/1     Running   0          3s    10.244.179.85    server2   <none>           <none>
deployment-v1-bb99b54bc-d2m7c    1/1     Running   0          11m   10.244.22.2      server4   <none>           <none>
deployment-v1-bb99b54bc-ntlsr    1/1     Running   0          11m   10.244.141.210   server3   <none>           <none>

A pod now runs on server2 again.

Other things that affect pod scheduling

Other commands that affect Pod scheduling are cordon, drain and delete. After any of them, newly created pods will no longer be scheduled onto the node, but the three differ in how drastic they are.

  • cordon, stop scheduling:

This has the smallest impact: it only marks the node as SchedulingDisabled. New pods will not be scheduled onto the node, but pods already on it are unaffected and keep serving traffic as normal.

[root@server2 scheduler]# kubectl delete -f deployment.yml 
deployment.apps "deployment-v1" deleted

[root@server2 scheduler]# kubectl cordon server3
node/server3 cordoned
[root@server2 scheduler]# kubectl get nodes
NAME      STATUS                     ROLES    AGE   VERSION
server2   Ready                      master   18d   v1.18.3
server3   Ready,SchedulingDisabled   <none>   18d   v1.18.3		/ the scheduling-disabled flag appears
server4   Ready                      <none>   18d   v1.18.3

We comment out the toleration and the anti-affinity from before, to make it easier to bring up more pods; the blanket toleration would also cover the cordon and interfere with the experiment.

[root@server2 scheduler]# vim deployment.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1

[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS              RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-cqmkq   0/1     ContainerCreating   0          4s    <none>        server4   <none>           <none>
deployment-v1-7449b5b68f-m442w   1/1     Running             0          4s    10.244.22.3   server4   <none>           <none>
deployment-v1-7449b5b68f-rtct6   0/1     ContainerCreating   0          4s    <none>        server4   <none>           <none>
[root@server2 scheduler]# vim deployment.yml 
spec:
  replicas: 6			/ increase the replica count to 6
[root@server2 scheduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 configured
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS              RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-2smn9   0/1     ContainerCreating   0          2s    <none>        server4   <none>           <none>
deployment-v1-7449b5b68f-9n2w8   0/1     ContainerCreating   0          2s    <none>        server4   <none>           <none>
deployment-v1-7449b5b68f-cqmkq   1/1     Running             0          23s   10.244.22.4   server4   <none>           <none>
deployment-v1-7449b5b68f-fgvdf   0/1     ContainerCreating   0          2s    <none>        server4   <none>           <none>
deployment-v1-7449b5b68f-m442w   1/1     Running             0          23s   10.244.22.3   server4   <none>           <none>
deployment-v1-7449b5b68f-rtct6   1/1     Running             0          23s   10.244.22.5   server4   <none>           <none>

Now, because there is no toleration and server3 is in the SchedulingDisabled state, all of the pods run on server4.

  • drain, evict the node:
    This first evicts the pods on the node so that they are recreated on other nodes, and then marks the node as SchedulingDisabled.
    It cannot simply be applied as-is, because every node also runs pods controlled by a DaemonSet controller:
[root@server2 scheduler]# kubectl delete -f deployment.yml 
deployment.apps "deployment-v1" deleted
[root@server2 scheduler]# kubectl uncordon server3			/ resume scheduling
node/server3 uncordoned
[root@server2 scheduler]# kubectl apply -f  deployment.yml 
deployment.apps/deployment-v1 created
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-7glbp   1/1     Running   0          6s    10.244.141.212   server3   <none>           <none>
deployment-v1-7449b5b68f-7pt5j   1/1     Running   0          6s    10.244.141.214   server3   <none>           <none>
deployment-v1-7449b5b68f-m292z   1/1     Running   0          6s    10.244.22.9      server4   <none>           <none>
deployment-v1-7449b5b68f-qdb9x   1/1     Running   0          6s    10.244.141.213   server3   <none>           <none>
At the moment three of the pods are running on server3.

[root@server2 scheduler]# kubectl drain server3
node/server3 cordoned
error: unable to drain node "server3", aborting command...

There are pending nodes to be drained:
 server3
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-smn7b, kube-system/kube-proxy-9nr5k	
// it reports the DaemonSet-managed pods and tells us to add the --ignore-daemonsets flag
[root@server2 scheduler]# kubectl drain server3 --ignore-daemonsets
node/server3 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-smn7b, kube-system/kube-proxy-9nr5k
...

[root@server2 scheduler]# kubectl get nodes
NAME      STATUS                     ROLES    AGE   VERSION
server2   Ready                      master   18d   v1.18.3		
server3   Ready,SchedulingDisabled   <none>   18d   v1.18.3		/ in the disabled state
server4   Ready                      <none>   18d   v1.18.3
[root@server2 scheduler]# kubectl get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-68bld   1/1     Running   0          18s   10.244.22.10   server4   <none>           <none>
deployment-v1-7449b5b68f-9t9v9   1/1     Running   0          18s   10.244.22.12   server4   <none>           <none>
deployment-v1-7449b5b68f-m292z   1/1     Running   0          55s   10.244.22.9    server4   <none>           <none>
deployment-v1-7449b5b68f-tjh4x   1/1     Running   0          18s   10.244.22.13   server4   <none>           <none>

All the pods have moved over to server4.
  • delete, remove the node

The most drastic of the three: first drain the pods off the node so they are recreated elsewhere, then delete the node from the master. The master loses control of it; to bring it back into scheduling you have to log into the node and restart the kubelet service.

[root@server2 scheduler]# kubectl delete nodes server3
node "server3" deleted
[root@server2 scheduler]# kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
server2   Ready    master   18d   v1.18.3
server4   Ready    <none>   18d   v1.18.3

// restart the service on server3
[root@server3 ~]# systemctl restart kubelet.service 

[root@server2 scheduler]# kubectl get nodes
NAME      STATUS     ROLES    AGE   VERSION
server2   Ready      master   18d   v1.18.3
server3   NotReady   <none>   0s    v1.18.3			/ not ready yet, still starting its pods
server4   Ready      <none>   18d   v1.18.3

[root@server2 scheduler]# kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
server2   Ready    master   18d   v1.18.3
server3   Ready    <none>   48s   v1.18.3		/ ready now
server4   Ready    <none>   18d   v1.18.3

If we want to detach this node from the cluster for good, we can run kubeadm reset on the node to wipe all of its state.
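
A minimal sketch of the full removal, combining the commands shown above with the destructive kubeadm reset (which wipes the node's local cluster state):

[root@server2 scheduler]# kubectl drain server3 --ignore-daemonsets
[root@server2 scheduler]# kubectl delete nodes server3
[root@server3 ~]# kubeadm reset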
