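Troubleshooting a kubelet startup error on a Kubernetes node
Error scenario:
While installing and deploying a Kubernetes cluster, setting up the kubelet service on a node and running systemctl start kubelet, then tailf /var/log/messages, shows a large number of certificate verification errors.
Error output: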
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.583305 5336 feature_gate.go:206] feature gates: &{map[]}
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589637 5336 mount_linux.go:180] Detected OS with systemd
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589680 5336 server.go:407] Version: v1.13.4
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589732 5336 feature_gate.go:206] feature gates: &{map[]}
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589825 5336 feature_gate.go:206] feature gates: &{map[]}
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589899 5336 plugins.go:103] No cloud provider specified.
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589916 5336 server.go:523] No cloud provider specified: "" from the config file: ""
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589938 5336 bootstrap.go:65] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.593022 5336 bootstrap.go:96] No valid private key and/or certificate found, reusing existing private key or creating a new one
May 5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.612493 5336 bootstrap.go:239] Failed to connect to apiserver: Get https://172.20.101.157:6443/healthz?timeout=1s: x509: certificate signed by unknown authority
May 5 22:23:42 kubnode-01 kubelet: I0505 22:23:42.909358 5336 bootstrap.go:239] Failed to connect to apiserver: Get https://172.20.101.157:6443/healthz?timeout=1s: x509: certificate signed by unknown authority
May 5 22:23:45 kubnode-01 kubelet: I0505 22:23:45.036663 5336 bootstrap.go:239] Failed to connect to apiserver: Get https://172.20.101.157:6443/healthz?timeout=1s: x509: certificate signed by unknown authority
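The x509 message indicates that the kubelet does not trust the CA that signed the apiserver's certificate, and the useful lines are easy to lose among the feature-gate noise. A minimal sketch for pulling them out is below; the function name summarize_bootstrap_failures is hypothetical, and it assumes the log lines have the same shape as the ones above.

```shell
# Summarise kubelet bootstrap failures caused by an untrusted CA.
# Reads syslog-style lines on stdin and prints "<count> <apiserver-url>"
# for each apiserver endpoint that appears in a matching line.
# (Hypothetical helper, not part of kubelet or kubectl.)
summarize_bootstrap_failures() {
  grep 'Failed to connect to apiserver' \
    | grep 'x509: certificate signed by unknown authority' \
    | sed -E 's|.*Get (https://[^/]+)/.*|\1|' \
    | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage on the node (not run here):
#   summarize_bootstrap_failures < /var/log/messages
```

If the count keeps growing after a restart, the kubelet is still failing at the same TLS step.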
Solution:
On the master node, bind the kubelet-bootstrap user to the system:node-bootstrapper cluster role
[root@k8s-node01 ~]# kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
clusterrolebinding "kubelet-bootstrap" created
On the node, start the service
[root@k8s-node01 ~]# systemctl start kubelet
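Once kubelet is running on the node, it submits a certificate signing request (CSR) to the master, and the request has to be approved on the master.
On the master node, list the CSRs; the status is Pending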
[root@kubm-01 ~]# kubectl get csr
NAME AGE REQUESTOR CONDITION
node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI 4m11s kubelet-bootstrap Pending
On the master node, approve the certificate request
[root@kubm-01 ~]# kubectl certificate approve node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI
After approval, list the CSRs again on the master node; the status changes to Approved,Issued
[root@kubm-01 ~]# kubectl get csr
NAME AGE REQUESTOR CONDITION
node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI 5m39s kubelet-bootstrap Approved,Issued
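When several nodes bootstrap at once, copying each CSR name by hand gets tedious. The sketch below filters the table printed by kubectl get csr down to the names still in Pending. The helper name pending_csrs is hypothetical, and it assumes CONDITION is the last column, as in the output above.

```shell
# Print the names of CSRs whose CONDITION (last column) is exactly "Pending".
# Reads the table output of `kubectl get csr` on stdin and skips the header row.
# (Hypothetical helper, not a kubectl feature.)
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Usage on the master (not run here). Review the list before approving,
# since the second command approves every pending request:
#   kubectl get csr | pending_csrs
#   kubectl get csr | pending_csrs | xargs -r -n 1 kubectl certificate approve
```

Approving one name per invocation (-n 1) keeps the behaviour identical to the manual command shown earlier.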
Verify on the node
In the node's ssl directory, four new kubelet certificate files have appeared
[root@kubnode-02 kubernetes]# ls /kubernetes/ssl/kubelet*
/kubernetes/ssl/kubelet-client-2019-05-05-22-15-53.pem /kubernetes/ssl/kubelet-client-current.pem /kubernetes/ssl/kubelet.crt /kubernetes/ssl/kubelet.key
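The same check can be scripted, for example when bringing up many nodes. This is only a sketch: check_kubelet_certs is a hypothetical helper, and the directory is passed as an argument because /kubernetes/ssl is specific to this setup. It looks for the three fixed file names from the listing above; kubelet-client-current.pem is normally a symlink to the timestamped file, and the -e test fails for a dangling symlink, so the fourth file is covered indirectly.

```shell
# Report which of the expected kubelet certificate files exist in a directory.
# $1: certificate directory (this article uses /kubernetes/ssl).
# Returns 0 if all files are present, 1 otherwise.
# (Hypothetical helper; the file names are taken from the listing above.)
check_kubelet_certs() {
  dir=$1
  missing=0
  for f in kubelet-client-current.pem kubelet.crt kubelet.key; do
    if [ -e "$dir/$f" ]; then
      echo "ok      $f"
    else
      echo "missing $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the node (not run here):
#   check_kubelet_certs /kubernetes/ssl
```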
Delete the CSR (optional)
[root@kubm-01 ~]# kubectl delete csr node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI
certificatesigningrequest.certificates.k8s.io "node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI" deleted
Verify the deletion:
kubectl get csr
The list comes back empty.
This one was a bit of a pitfall to track down.