Kubernetes Management Experience

Cluster management commands

# Control-plane component status (componentstatuses is deprecated since v1.19)
kubectl get cs

# List nodes
kubectl get nodes

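# Get the Ingress "pdd" in namespace java
kubectl get ing pdd -n java
# Taint a node so that pods without a matching toleration are not scheduled on it
kubectl taint nodes node1 key=value:NoSchedule
# Dump cluster state for debugging
kubectl cluster-info dump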

# Sort services / nodes by creation time
kubectl get svc --sort-by=.metadata.creationTimestamp
kubectl get no --sort-by=.metadata.creationTimestamp
# List the pods running on a given node
kubectl get po --field-selector spec.nodeName=xxxx
# Events for Service objects, oldest first
kubectl get events --field-selector involvedObject.kind=Service --sort-by='.metadata.creationTimestamp'

References:

Kubernetes node maintenance: cordon, drain, uncordon

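Application management

# Pod CPU/memory usage
kubectl top pod
# Delete the Deployments and Services labeled app=nginx
kubectl delete deployment,services -l app=nginx
# Scale a Deployment to 2 replicas
kubectl scale deployment/nginx-deployment --replicas=2
# Services in all namespaces
kubectl get svc --all-namespaces=true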

Force deletion

Deleting a PV/PVC sometimes hangs; in that case add two flags: --grace-period=0 --force

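Delete all failed pods

  # List failed pods in every namespace
  kubectl get po --all-namespaces --field-selector 'status.phase==Failed'
  # Delete them (without --all-namespaces only the current namespace is affected)
  kubectl delete po --all-namespaces --field-selector 'status.phase==Failed'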

Tips

Kubernetes currently has no dependency-ordered startup mechanism like docker-compose's depends_on. I recommend using wait-for-it to override the image's command.
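wait-for-it is a shell script that polls host:port until it accepts TCP connections and then runs the command you pass it. To illustrate the idea, here is a minimal Python analogue; it is not wait-for-it itself, and the function names `wait_for_tcp` and `wait_then_exec` are hypothetical.

```python
import os
import socket
import time


def wait_for_tcp(host, port, timeout=30.0, interval=1.0):
    """Poll host:port until a TCP connect succeeds; return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # A successful connect means the dependency is listening.
            with socket.create_connection((host, port), timeout=min(interval, remaining)):
                return True
        except OSError:
            # Refused, unreachable or timed out: back off, then retry until the deadline.
            time.sleep(max(0.0, min(interval, deadline - time.monotonic())))


def wait_then_exec(host, port, command, timeout=60.0):
    """Hypothetical entrypoint wrapper: wait for the dependency, then replace this process."""
    if not wait_for_tcp(host, port, timeout=timeout):
        raise SystemExit(f"timed out waiting for {host}:{port}")
    # Like wait-for-it's `-- command`, hand the process over to the real command.
    os.execvp(command[0], command)
```

For example, an image's command could call `wait_then_exec("mysql", 3306, ["java", "-jar", "app.jar"])`, where `mysql`, `3306` and `app.jar` are placeholders for your dependency and entrypoint. Using `os.execvp` means the application itself, not the wrapper, ends up receiving signals such as SIGTERM.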

Cluster management experience (lessons learned)

Node issues

Don't use taints carelessly

# Add a taint
kubectl taint nodes xx elasticsearch-test-ready=true:NoSchedule
# Remove it again (note the trailing "-")
kubectl taint nodes xx elasticsearch-test-ready:NoSchedule-

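Master nodes come with a taint out of the box, which is why the containers we deploy don't run on them. If you add custom taints, though, be careful! Every DaemonSet and every kube-system component needs the matching tolerations. Otherwise DaemonSet pods can no longer be scheduled onto the node, and with a NoExecute taint the node will even evict every container lacking the toleration, including the network plugin and kube-proxy. The consequences are severe, so be careful.

Taints and tolerations work in matched pairs, and the operator must be chosen carefully too.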

NoExecute

      tolerations:
        - key: "elasticsearch-exclusive"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"

kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoExecute

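NoExecute immediately evicts pods that do not tolerate the taint. This is a very dangerous operation; first make sure the system components have the corresponding tolerations configured.

Note in particular: in my case the Exists operator did not take effect here; Equal was required.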

NoSchedule

      tolerations:
        - key: "elasticsearch-exclusive"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "elasticsearch-exclusive"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"

kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoSchedule

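New pods without a matching toleration are not scheduled onto the node, but pods already running there are not evicted, so in practice you will still see pods running on it.

Whether you use Exists or Equal here doesn't make much difference.

It's worth noting that the same key can carry several effects at once: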

Taints:             elasticsearch-exclusive=true:NoExecute
                    elasticsearch-exclusive=true:NoSchedule
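To reason about which pods survive a given set of taints, here is a sketch of the matching rule as described in the Kubernetes taints and tolerations documentation: a toleration matches a taint when the effects match (or the toleration's effect is empty), the keys match (or the key is empty with Exists), and, for Equal, the values match. The `Taint` and `Toleration` classes are hypothetical stand-ins for the API objects. By this documented rule Exists should also match a NoExecute taint, so if Exists appears not to work, it is worth double-checking the key and effect spelling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect tolerates every effect


def tolerates(tol, taint):
    """Documented matching rule: effect, then key, then value (unless operator is Exists)."""
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key (only valid with Exists) matches every taint key.
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    # Equal (or an omitted operator) requires the values to match exactly.
    return tol.operator in ("", "Equal") and tol.value == taint.value


def untolerated(taints, tolerations):
    """Taints not matched by any toleration: these keep the pod off, or evict it from, the node."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]
```

With the two taints shown above, a toleration for only the NoSchedule effect leaves the NoExecute taint untolerated, so such a pod would be evicted; adding the Equal/NoExecute toleration covers both.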

Other references:

Correct steps for isolating a node

# Evict all pods except DaemonSet pods (drain also cordons the node)
kubectl drain <node name> --ignore-daemonsets
kubectl cordon <node name>

At this point, kubectl get node shows the changed status:

node.xx   Ready,SchedulingDisabled   <none>   189d   v1.11.5

Finally:

kubectl delete node <node name>

Correct steps for node maintenance

kubectl drain <node name> --ignore-daemonsets
# After the maintenance, make the node schedulable again
kubectl uncordon <node name>

Node under disk pressure (DiskPressure)

--eviction-hard=imagefs.available<15%,memory.available<300Mi,nodefs.available<10%,nodefs.inodesFree<5%

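The kubelet is started with hard eviction thresholds. Taking Alibaba Cloud as an example, imagefs.available<15% means that when free space on the filesystem holding container images and writable layers drops below 15%, the kubelet starts evicting pods from the node. The node then reports the DiskPressure condition, and no new pods can run on it until the disk problem is resolved. If containers on the node use host directories, this is fatal: you can't simply delete those directories, yet it is precisely the data piling up in them that triggered the eviction.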
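The threshold logic can be sketched as follows. This is a simplified model of comparing eviction signals against `--eviction-hard`, assuming you already know each signal's available amount and capacity; the helper names are hypothetical, and only `<` thresholds given as percentages or Ki/Mi/Gi/Ti quantities are handled.

```python
_BINARY_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4}


def parse_eviction_hard(flag):
    """Split 'signal<quantity,...' into {signal: quantity}."""
    thresholds = {}
    for item in flag.split(","):
        signal, _, quantity = item.partition("<")
        thresholds[signal.strip()] = quantity.strip()
    return thresholds


def threshold_value(quantity, capacity):
    """Absolute threshold: a percentage of capacity, or a quantity such as 300Mi."""
    if quantity.endswith("%"):
        return capacity * float(quantity[:-1]) / 100
    for suffix, factor in _BINARY_UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)


def signals_met(flag, stats):
    """stats maps signal -> (available, capacity); return the signals below their threshold."""
    met = []
    for signal, quantity in parse_eviction_hard(flag).items():
        if signal in stats:
            available, capacity = stats[signal]
            if available < threshold_value(quantity, capacity):
                met.append(signal)
    return met


def node_conditions(met):
    """memory.* signals map to MemoryPressure; nodefs.* and imagefs.* map to DiskPressure."""
    return sorted({"MemoryPressure" if s.startswith("memory.") else "DiskPressure" for s in met})
```

With the flag above and an image filesystem that is only 10% free, just `imagefs.available` trips, and the node reports DiskPressure.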

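So build good habits: don't write carelessly inside containers (files written inside a container consume ephemeral-storage, and a pod using too much ephemeral-storage gets evicted), prefer stateless containers, choose storage carefully, and avoid hostPath storage where possible.

When this happens, you really feel like crying.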

Events:
  Type     Reason                 Age                   From                                            Message
  ----     ------                 ----                  ----                                            -------
  Warning  FreeDiskSpaceFailed    23m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 5182058496 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    18m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 6089891840 bytes, but freed 0 bytes
  Warning  ImageGCFailed          18m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 6089891840 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    13m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 4953321472 bytes, but freed 0 bytes
  Warning  ImageGCFailed          13m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 4953321472 bytes, but freed 0 bytes
  Normal   NodeHasNoDiskPressure  10m (x5 over 47d)     kubelet, node.xxxx1     Node node.xxxx1 status is now: NodeHasNoDiskPressure
  Normal   Starting               10m                   kube-proxy, node.xxxx1  Starting kube-proxy.
  Normal   NodeHasDiskPressure    10m (x4 over 42m)     kubelet, node.xxxx1     Node node.xxxx1 status is now: NodeHasDiskPressure
  Warning  EvictionThresholdMet   8m29s (x19 over 42m)  kubelet, node.xxxx1     Attempting to reclaim ephemeral-storage
  Warning  ImageGCFailed          3m4s                  kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 4920913920 bytes, but freed 0 bytes
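The byte counts in the `Wanted to free ... bytes` messages can be reproduced with a small calculation. Based on my reading of the kubelet image GC manager, once image filesystem usage reaches `--image-gc-high-threshold` (default 85%), it tries to free enough space to get back down to `--image-gc-low-threshold` (default 80%). This is a sketch, not kubelet code, and details may differ between versions.

```python
def image_gc_bytes_to_free(capacity, available, high_threshold=85, low_threshold=80):
    """Bytes the kubelet tries to free on the image filesystem (0 below the high threshold)."""
    # Integer percentage of the filesystem in use, truncated like integer division in Go.
    usage_percent = 100 - (available * 100 // capacity)
    if usage_percent < high_threshold:
        return 0
    # Target: leave (100 - low_threshold)% of capacity available.
    return capacity * (100 - low_threshold) // 100 - available
```

`freed 0 bytes` means nothing could be removed: image GC only deletes images that no container is using, so when the space is taken by container writable layers or host directories, image GC cannot help.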

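References:

Node CPU spikes

The node may be running garbage collection (container GC / image GC); check with kubectl describe node. When this happened to me, the node ended up running noticeably fewer containers, which was rather frustrating.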

Events:
  Type     Reason                 Age                 From                                         Message
  ----     ------                 ----                ----                                         -------
  Warning  ImageGCFailed          45m                 kubelet, cn-shenzhen.xxxx  failed to get image stats: rpc error: code = DeadlineExceeded desc = context deadline exceeded

References:

kubelet source code analysis: Garbage Collect

Object issues

pod

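Pods restarting frequently

There are many possible causes; no single answer fits every case.

Resource usage hits the configured limit

Raise the limit or investigate the application.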

Readiness/Liveness connection refused

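A failed liveness check restarts the container, while a failed readiness check only takes the pod out of the Service endpoints. Either way, a failing check is not necessarily the application's fault: if the node itself is overloaded, probes can also get connection refused or time out.

This has to be investigated on the node itself.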

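Pods evicted (Evicted)

The node was given a taint, which evicted the pod
ephemeral-storage usage exceeded its limit:
1. An EmptyDir's usage exceeds its SizeLimit: the pod is evicted.
2. A container's usage (logs, plus imagefs usage if there is no separate overlay partition) exceeds its limit: the pod is evicted.
3. The pod's total local ephemeral storage usage (all emptyDirs and containers) exceeds the sum of all its containers' limits: the pod is evicted.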
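The three rules can be expressed as a small check. This is a simplified sketch, not the kubelet implementation: usages are assumed to be already measured in bytes, the `Container` and `EmptyDir` classes are hypothetical, and rule 3 is only evaluated when every container sets a limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Container:
    name: str
    usage: int            # bytes of logs + writable layer
    limit: Optional[int]  # ephemeral-storage limit in bytes, None if unset


@dataclass
class EmptyDir:
    name: str
    usage: int
    size_limit: Optional[int]  # sizeLimit in bytes, None if unset


def eviction_reasons(containers: List[Container], empty_dirs: List[EmptyDir]) -> List[str]:
    reasons = []
    # Rule 1: an emptyDir grew past its sizeLimit.
    for vol in empty_dirs:
        if vol.size_limit is not None and vol.usage > vol.size_limit:
            reasons.append(f"emptyDir {vol.name} exceeds sizeLimit")
    # Rule 2: a container exceeded its own ephemeral-storage limit.
    for c in containers:
        if c.limit is not None and c.usage > c.limit:
            reasons.append(f"container {c.name} exceeds its limit")
    # Rule 3 (simplified): only checked here when every container sets a limit.
    if containers and all(c.limit is not None for c in containers):
        total = sum(c.usage for c in containers) + sum(v.usage for v in empty_dirs)
        if total > sum(c.limit for c in containers):
            reasons.append("pod total exceeds the sum of container limits")
    return reasons
```

Rule 3 is why an emptyDir without its own sizeLimit can still get a pod evicted: its usage counts against the containers' combined limit.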

ephemeral-storage is the temporary local storage used by a pod.

resources:
  requests:
    ephemeral-storage: "2Gi"
  limits:
    ephemeral-storage: "3Gi"

An evicted pod is still visible in kubectl get po, and kubectl describe shows why it was evicted:

Message: The node was low on resource: ephemeral-storage. Container codis-proxy was using 10619440Ki, which exceeds its request of 0.

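References:

kubectl exec into a container fails

I ran into this while setting up codis-server, when I had not configured readiness or liveness probes. The pod description showed Running, but the container was in fact already broken.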

~ kex codis-server-3 sh
rpc error: code = 2 desc = containerd: container not found
command terminated with exit code 126

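Fix: delete the pod and configure a livenessProbe.

Pod virtual host names

For pods created by a Deployment, the virtual host name is the pod name.

For pods created by a StatefulSet, the virtual host name is <pod name>.<svc name>.<namespace>.svc.cluster.local. This is more predictable than with a Deployment, and other pods can reach each pod by this name.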
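Since StatefulSet pods are named `<statefulset name>-<ordinal>`, their DNS names can be computed ahead of time, which is handy for building peer lists. A sketch, assuming the default `cluster.local` cluster domain and that `service` is the headless Service set as the StatefulSet's `serviceName`; the function names are hypothetical.

```python
def statefulset_pod_fqdn(statefulset, ordinal, service, namespace, cluster_domain="cluster.local"):
    """<pod name>.<svc name>.<namespace>.svc.<cluster domain>, with pod name <statefulset>-<ordinal>."""
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


def statefulset_peers(statefulset, replicas, service, namespace, cluster_domain="cluster.local"):
    """Stable DNS names of every replica, e.g. for a peer list passed to a clustered app."""
    return [statefulset_pod_fqdn(statefulset, i, service, namespace, cluster_domain)
            for i in range(replicas)]
```

These names stay the same when a pod is rescheduled, which is what makes them usable as a fixed cluster membership list.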

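Pods going into CrashLoopBackOff one after another

CrashLoopBackOff has several possible causes.

Sandbox creation failed (FailedCreatePodSandBox): usually a CNI network plugin problem.

Image pulls: network problems "with Chinese characteristics", or the image may simply be too large and slow to pull.

It can also be caused by excessive concurrency leading to a traffic avalanche.

For example, with three containers a, b and c: a suddenly hits a traffic peak, crashes internally and goes into CrashLoopBackOff, so the Service removes it. b and c can't handle all that traffic either and crash in turn, until the site is unreachable. This mostly happens with high-concurrency websites running on inefficient web containers.

Without changing the code, the best fix is to increase the replica count and add an HPA for dynamic scaling.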
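For sizing the HPA, its documented scaling rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below applies that formula and clamps to min/max replicas; the default bounds are placeholders, not recommendations.

```python
import math


def hpa_desired_replicas(current_replicas, current_value, target_value,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count the HPA formula asks for, clamped to [min_replicas, max_replicas]."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: no scaling.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 replicas averaging 90% CPU against a 50% target gives ceil(3 * 1.8) = 6. Scaling takes time (metrics are sampled periodically and new pods must start), so some headroom in the minimum replica count still helps against a sudden spike.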

deploy

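MinimumReplicasUnavailable

If a Deployment sets a SecurityContext but the api-server rejects it, you get this condition. Remove SecurityContextDeny from the api-server container's startup arguments (it is one of the admission plugins enabled there).

For details, see Using Admission Controllers.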

service

What happens if you create a Service with no matching pods?

Requests never get a response; they hang until the request times out.

References

Configure Out Of Resource Handling

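Service connection refused

Possible causes:

The pod has no readinessProbe, so requests reach pods that are not ready yet
kube-proxy is down (kube-proxy is responsible for forwarding requests)
Network overload

Service not load balancing

Check whether you are using a headless service. A headless service does not load-balance automatically.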

kind: Service
spec:
# clusterIP: None is what makes this a `headless service`
  type: ClusterIP
  clusterIP: None

The symptoms: the service has no virtual IP of its own, and nslookup returns every pod's IP, but ping always hits the first pod's IP.

/ # nslookup consul
nslookup: can't resolve '(null)': Name does not resolve

Name:      consul
Address 1: 172.31.10.94 172-31-10-94.consul.default.svc.cluster.local
Address 2: 172.31.10.95 172-31-10-95.consul.default.svc.cluster.local
Address 3: 172.31.11.176 172-31-11-176.consul.default.svc.cluster.local

/ # ping consul
PING consul (172.31.10.94): 56 data bytes
64 bytes from 172.31.10.94: seq=0 ttl=62 time=0.973 ms
64 bytes from 172.31.10.94: seq=1 ttl=62 time=0.170 ms
^C
--- consul ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.170/0.571/0.973 ms

/ # ping consul
PING consul (172.31.10.94): 56 data bytes
64 bytes from 172.31.10.94: seq=0 ttl=62 time=0.206 ms
64 bytes from 172.31.10.94: seq=1 ttl=62 time=0.178 ms
^C
--- consul ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.178/0.192/0.206 ms

For an ordinary type: ClusterIP service, nslookup returns the service's own IP:

/ # nslookup consul
nslookup: can't resolve '(null)': Name does not resolve

Name:      consul
Address 1: 172.30.15.52 consul.default.svc.cluster.local
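Because a headless service just returns every pod IP, balancing is left to the client. A minimal sketch of client-side round-robin, assuming the client resolves the service name itself; `resolve_all` and `RoundRobin` are hypothetical helpers, and a real client should re-resolve periodically because pod IPs change.

```python
import itertools
import socket


def resolve_all(host, port):
    """All IPv4 addresses for host, de-duplicated in resolver order."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # info[4] is the sockaddr tuple (ip, port).
    return list(dict.fromkeys(info[4][0] for info in infos))


class RoundRobin:
    """Cycle through a fixed list of backend addresses."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("no addresses to balance across")
        self._cycle = itertools.cycle(addresses)

    def pick(self):
        return next(self._cycle)
```

Inside the cluster this might be used as `RoundRobin(resolve_all("consul", 8500))`; port 8500 is an assumption here (Consul's usual HTTP port).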

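ReplicationController not updating

A ReplicationController is not rolled over by kubectl apply. It used to be updated with kubectl rolling-update, but that command has been deprecated in favor of kubectl rollout, which only manages Deployments, DaemonSets and StatefulSets. So for an RC, take the lazy route: apply the file, then delete the pods so they are recreated from the new template.

Use Deployments whenever possible.

StatefulSet update stuck

StatefulSet pods are updated one at a time. Check whether any pod is in CrashLoopBackOff; it may be what is blocking the update, and deleting it should unblock it.

Advanced scheduling

Use node affinity to ensure pods run on the target nodes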

        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: elasticsearch-test-ready
                operator: Exists
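Node affinity matches node labels, not taints, so the node needs a label with this key (for example `kubectl label nodes xx elasticsearch-test-ready=true`). The evaluation logic can be sketched as: `nodeSelectorTerms` are ORed, the `matchExpressions` within a term are ANDed, and an empty term matches nothing. This sketch covers label expressions only (not `matchFields`), and the function names are hypothetical.

```python
def requirement_matches(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A missing label also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if op in ("Gt", "Lt"):
        if key not in labels:
            return False
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (ValueError, IndexError):
            return False
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unknown operator {op}")


def node_matches(labels, node_selector_terms):
    """Terms are ORed; expressions inside one term are ANDed; empty terms match nothing."""
    for term in node_selector_terms:
        exprs = term.get("matchExpressions") or []
        if exprs and all(requirement_matches(labels, e) for e in exprs):
            return True
    return False
```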

References:

Use pod anti-affinity to ensure each node runs at most one instance of an application

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: 'app'
                operator: In
                values:
                - nginx-test2
            topologyKey: "kubernetes.io/hostname"
            namespaces: 
            - test
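Conceptually, the required anti-affinity rule above rules out any node whose `kubernetes.io/hostname` domain already hosts a pod labeled `app=nginx-test2` in namespace `test`. A simplified sketch of that filter with hypothetical data shapes (the real scheduler also applies many other filters):

```python
def feasible_nodes(node_labels, pods, match_key, match_values, namespaces,
                   topology_key="kubernetes.io/hostname"):
    """Nodes where a new pod with this required anti-affinity term could still land.

    node_labels: node name -> labels; pods: list of (namespace, labels, node name).
    """
    # Topology domains that already contain a matching pod are blocked.
    blocked = {
        node_labels[node][topology_key]
        for ns, labels, node in pods
        if ns in namespaces and labels.get(match_key) in match_values
    }
    return [name for name, labels in node_labels.items() if labels[topology_key] not in blocked]
```

One consequence: if there are more replicas than eligible nodes, the extra pods stay Pending.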

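Running with tolerations

Ordinary workloads don't run on master nodes because the masters carry taints. To force a workload onto a master, it has to tolerate the corresponding taints.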

      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
        - effect: NoSchedule
          key: node.cloudprovider.kubernetes.io/uninitialized
          operator: Exists

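Alibaba Cloud Kubernetes issues

Changing the default ingress

Create a LoadBalancer-type Service pointing at the ingress controller, then change the startup arguments of nginx-ingress-controller in kube-system: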

        - args:
            - /nginx-ingress-controller
            - '--configmap=$(POD_NAMESPACE)/nginx-configuration'
            - '--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services'
            - '--udp-services-configmap=$(POD_NAMESPACE)/udp-services'
            - '--annotations-prefix=nginx.ingress.kubernetes.io'
            - '--publish-service=$(POD_NAMESPACE)/<custom svc>'
            - '--v=2'

LoadBalancer Service never gets an IP

The symptom: EXTERNAL-IP stays at pending.

~ kg svc consul-web
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
consul-web   LoadBalancer   172.30.13.122   <pending>     443:32082/TCP   5m

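This is related to the Alibaba Cloud Provider component. cloud-controller-manager runs three instances that elect a leader among themselves, and something probably went wrong there. At the time I deleted the misbehaving pod, and that fixed it.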

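Cleaning up dynamically provisioned StatefulSet PVCs

Dynamic PVCs for StatefulSets on Alibaba Cloud currently use NAS.

For this kind of storage:
1. Scale the StatefulSet down to 0 replicas, or delete the whole StatefulSet.
2. Delete the PVC.
3. Mount the NAS on any server, then delete the NAS directory that backs the PVC.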

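Less allocatable memory per node after upgrading to v1.12.6-aliyun.1

This version reserves 1Gi on every node, so the cluster as a whole has N GB less for pods to request (N being the number of nodes).

On a 4G node, a pod requesting 3G is extremely likely to be evicted.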
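The arithmetic behind this, using the Node Allocatable formula from the Kubernetes docs (Allocatable = Capacity - kube-reserved - system-reserved - hard eviction threshold). Assumptions: the 1Gi is treated as kube-reserved, the `memory.available<300Mi` threshold from the eviction flag above applies, and the node reports a full 4096Mi (real 4G instances usually report a bit less). The function names are hypothetical.

```python
def node_allocatable_mi(capacity_mi, kube_reserved_mi=0, system_reserved_mi=0, eviction_hard_mi=0):
    """Node Allocatable = Capacity - kube-reserved - system-reserved - hard eviction threshold."""
    return capacity_mi - kube_reserved_mi - system_reserved_mi - eviction_hard_mi


def request_headroom_mi(allocatable_mi, request_mi):
    """Spare allocatable memory after the request; negative means it cannot be scheduled."""
    return allocatable_mi - request_mi
```

Under these assumptions, `node_allocatable_mi(4096, kube_reserved_mi=1024, eviction_hard_mi=300)` is 2772Mi, so a 3Gi request does not fit at all; a request that only just fits leaves almost no margin before memory pressure triggers evictions.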

Use larger node instance types.

Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.6-aliyun.1", GitCommit:"8cb561c", GitTreeState:"", BuildDate:"2019-04-22T11:34:20Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Newly added node shows NetworkUnavailable

RouteController failed to create a route

Check the Kubernetes events for something like:

timed out waiting for the condition -> WaitCreate: ceate route for table vtb-wz9cpnsbt11hlelpoq2zh error, Aliyun API Error: RequestId: 7006BF4E-000B-4E12-89F2-F0149D6688E4 Status Code: 400 Code: QuotaExceeded Message: Route entry quota exceeded in this route table

This happens because the VPC's custom route entry limit has been reached (48 by default); raise the vpc_quota_route_entrys_num quota.

References (application scheduling):
