CentOS 7, k8s installed with yum: fixing Pods stuck in the ContainerCreating state
Contents
Problem description
What this article covers
Problem analysis and fix
Root cause and an alternative approach
References
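Problem description
I installed a Kubernetes cluster with the yum package manager on CentOS 7. After successfully creating a service with kubectl, running kubectl get pods showed that AGE kept increasing while the Pod status never moved past ContainerCreating.
What this article covers
Analyze the cause of the problem
Give a direct fix for it (not a perfect one)
Give an alternative approach
Let me walk you through it.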
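Problem analysis and fix
kubectl provides the describe subcommand, which prints detailed information about one or more specified resources.
Run kubectl describe pod mytomcat-9lcq5 to inspect the state of the problem Pod. The output is: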
[root@kube-master app]# kubectl describe pod mytomcat-9lcq5
Name: mytomcat-9lcq5
Namespace: default
Node: kube-node-2/192.168.87.145
Start Time: Fri, 17 Apr 2020 15:53:50 +0800
Labels: app=mytomcat
Status: Pending
IP:
Controllers: ReplicationController/mytomcat
Containers:
mytomcat:
Container ID:
Image: tomcat:9-jre8-alpine
Image ID:
Port: 8080/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
No volumes.
QoS Class: BestEffort
Tolerations:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
5m 5m 1 {default-scheduler } Normal Scheduled Successfully assigned mytomcat-9lcq5 to kube-node-2
4m 4m 1 {kubelet kube-node-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Get https://registry.access.redhat.com/v1/_ping: net/http: TLS handshake timeout)"
3m 3m 1 {kubelet kube-node-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Network timed out while trying to connect to https://registry.access.redhat.com/v1/repositories/rhel7/pod-infrastructure/images. You may want to check your internet connection or if you are behind a proxy.)"
2m 2m 1 {kubelet kube-node-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Error: image rhel7/pod-infrastructure:latest not found)"
3m 1m 3 {kubelet kube-node-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image "registry.access.redhat.com/rhel7/pod-infrastructure:latest""
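The Events table above is long, and the useful part is the name of the image that failed to pull. A minimal sketch of how to extract it is shown here; the function name failed_images is made up for illustration, and it assumes the kubelet message keeps the "image pull failed for <image>," wording seen in this output.

```shell
# Read `kubectl describe pod` output on stdin and print each distinct image
# that the kubelet reported as failing to pull.
# failed_images is a hypothetical helper name, not part of kubectl.
failed_images() {
    grep -o 'image pull failed for [^ ,]*' | awk '{ print $5 }' | sort -u
}

# Usage on the master node (pod name taken from the article):
#   kubectl describe pod mytomcat-9lcq5 | failed_images
```

Run against the output above, this should leave only the pod-infrastructure image, which shows that the failure is in the "POD" infrastructure container and not in the tomcat image itself.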
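The events at the bottom of the output include Successfully assigned mytomcat-9lcq5 to kube-node-2, which means the Pod was scheduled onto the host kube-node-2, and creating the Pod on that host then failed.
The reason given is image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request.
From this we learn that pulling an image from Red Hat's own docker registry requires a CA certificate for verification before the pull can succeed.
docker keeps its certificates under /etc/docker/certs.d. The domain in the error message is registry.access.redhat.com, so the certificate belongs in the subdirectory of that name.
Listing that directory with ll shows that /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt is a symbolic link pointing to /etc/rhsm/ca/redhat-uep.pem.
Anyone familiar with symlinks knows that a target shown in blinking red does not exist, so the certificate file /etc/rhsm/ca/redhat-uep.pem has to be generated.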
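The blinking red colour depends on the terminal's ls colour settings, so a scripted check may be more dependable. A small sketch follows; is_dangling_symlink is a hypothetical helper name, and it relies only on the standard test operators -L (is a symlink) and -e (target exists, following links).

```shell
# Return 0 when the path is a symlink whose target does not exist.
# is_dangling_symlink is a hypothetical helper name.
is_dangling_symlink() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Usage on a worker node:
#   crt=/etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt
#   is_dangling_symlink "$crt" && echo "target missing: $(readlink "$crt")"
```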
Generate the certificate:
openssl s_client -showcerts -servername registry.access.redhat.com -connect registry.access.redhat.com:443 </dev/null 2>/dev/null | openssl x509 -text > /etc/rhsm/ca/redhat-uep.pem
The command sometimes fails with unable to load certificate 139930742028176:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:707:Expecting: TRUSTED CERTIFICATE. Running it again is enough to get past it.
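Since the fix for the intermittent failure is simply to run the command again, the retry can be scripted. The sketch below is one way to do it under some assumptions: retry and fetch_redhat_cert are hypothetical helper names, RETRY_DELAY is an invented variable, and the certificate is written to a temporary file first so that a failed attempt does not leave an empty redhat-uep.pem behind.

```shell
# Run a command up to $1 times, stopping at the first success.
# retry and RETRY_DELAY are hypothetical names used for this sketch.
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i/$attempts failed" >&2
        sleep "${RETRY_DELAY:-1}"
        i=$((i + 1))
    done
    return 1
}

# The article's openssl pipeline, wrapped so it only replaces the target
# file when openssl x509 actually parsed a certificate.
fetch_redhat_cert() {
    host=registry.access.redhat.com
    out=/etc/rhsm/ca/redhat-uep.pem
    mkdir -p /etc/rhsm/ca               # no-op if the directory already exists
    tmp=$(mktemp) || return 1
    if openssl s_client -showcerts -servername "$host" -connect "$host:443" \
            </dev/null 2>/dev/null | openssl x509 -text >"$tmp"; then
        chmod 644 "$tmp" && mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on the worker node (as root):
#   retry 5 fetch_redhat_cert
```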
Once the command has finished, check the certificate file that the symlink points to:
[root@kube-node-2 registry.access.redhat.com]# ll /etc/rhsm/ca/redhat-uep.pem
-rw-r--r-- 1 root root 9233 Apr 17 16:55 /etc/rhsm/ca/redhat-uep.pem
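The certificate file now exists. On the k8s master node kube-master, delete the Pods from before and wait for them to be recreated successfully (the second node did not manage to pull the image because of network problems...).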
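Deleting the Pods works here because they are owned by the ReplicationController mytomcat (see the Controllers line in the describe output), which creates replacements. A sketch of that step is below; pods_not_running is a hypothetical helper, and it assumes the usual NAME READY STATUS RESTARTS AGE column order of kubectl get pods.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS column (3rd field) is not "Running".
# pods_not_running is a hypothetical helper name.
pods_not_running() {
    awk '$3 != "Running" { print $1 }'
}

# Usage on kube-master:
#   kubectl get pods --no-headers | pods_not_running | xargs -r kubectl delete pod
#   kubectl get pods -w      # watch the replacement Pods come up
```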
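That completes the creation of the Pod.
Some problems remain, though. From networks inside China, access to outside networks is occasionally unreliable, and that makes Pod creation fail. describe then reports the same messages as before, even though the certificate file exists and has content.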
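Root cause and an alternative approach
The k8s master node schedules Pod creation onto a worker node. Once there, the worker pulls the Pod base image pod-infrastructure:latest from Red Hat's docker registry. That registry uses https and its certificate has to be verified, so the missing certificate makes the pull fail.
In addition, because the image is hosted in Red Hat's docker registry, the handshake fails from networks inside China and the image cannot be downloaded.
So the question becomes how to deal with the k8s pod-infrastructure image failing to pull. Here is one approach, with the following steps: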
Pull a pod-infrastructure image that someone else has uploaded to the official docker registry: docker pull tianyebj/pod-infrastructure
Tag it with your private registry address, for example: docker tag tianyebj/pod-infrastructure 10.2.7.70:5000/dev/pod-infrastructure
Push the image to the private registry, for example: docker push 10.2.7.70:5000/dev/pod-infrastructure
On every worker node, edit /etc/kubernetes/kubelet and change registry.access.redhat.com/rhel7/pod-infrastructure to the tag you just set
sed -i "s#registry.access.redhat.com/rhel7/pod-infrastructure#<private-registry pod-infrastructure image tag>#" /etc/kubernetes/kubelet
Restart kubelet on every worker node with systemctl restart kubelet, and that is all
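The config edit in the steps above can be sketched as a small function that also keeps a backup. set_pod_infra_image is a hypothetical helper name, and the sketch assumes the replacement value contains no # or & characters, since those are special in this sed expression. Like the article's sed command, it leaves any :latest suffix after the image name untouched.

```shell
# Rewrite the pod-infrastructure image reference in a kubelet config file.
#   $1: path to the config file
#   $2: replacement image repository, without a tag
# set_pod_infra_image is a hypothetical helper name. The original file is
# copied to "$1.bak" first, then rewritten in place from that backup.
set_pod_infra_image() {
    cp "$1" "$1.bak" &&
        sed "s#registry\.access\.redhat\.com/rhel7/pod-infrastructure#$2#" "$1.bak" >"$1"
}

# Usage on each worker node (registry address taken from the article's example):
#   set_pod_infra_image /etc/kubernetes/kubelet 10.2.7.70:5000/dev/pod-infrastructure
#   grep pod-infrastructure /etc/kubernetes/kubelet    # confirm the new value
#   systemctl restart kubelet
```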
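Notes:
The uploaded image must be set to public, otherwise kubelet has no permission to pull it on its own. Alternatively, ssh into each worker node, log in to the registry there, and run docker pull <private-registry pod-infrastructure image tag>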
References
https://github.com/CentOS/sig-atomic-buildscripts/issues/329
https://cloud.tencent.com/developer/article/1156329
Original article: https://www.cnblogs.com/hellxz/p/k8s-pod-always-container-creating-status-problem.html