Kubernetes 1.20.4: Notes from a Bumpy Cross-Version Upgrade

Kubernetes 1.20.4 has been released. One cluster upgraded smoothly from 1.20.2, but another was on a much older version and the cross-version upgrade ran into problems. I eventually rebuilt that cluster from scratch, only to hit more problems during the fresh install: both kubeadm init and kubeadm join failed. It all works now; some notes from the process follow.

Certificate problem

The following error appeared:

(base) supermap@podc01:/etc$ sudo kubeadm join 10.1.1.202:6443 --token 4q3hdy.y7xjfjh0u1vqdx7k     --discovery-token-ca-cert-hash sha256:7eff3c734585308e0934c4af34a67edff0a98c5a3d9e99c24f1c5cdd09d3f519     --control-plane
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: 
One or more conditions for hosting a new control plane instance is not satisfied.

failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.


To see the stack trace of this error execute with --v=5 or higher

It turned out that I had omitted the --upload-certs flag when running kubeadm init. After adding it, the error above went away.
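For reference, the value passed to --discovery-token-ca-cert-hash can be recomputed from the CA certificate at any time. The sketch below generates a throwaway CA so the commands are self-contained; on a real control-plane node you would point them at /etc/kubernetes/pki/ca.crt instead:

```shell
# Generate a throwaway CA certificate so the hashing commands below
# have something to work on (stand-in for /etc/kubernetes/pki/ca.crt).
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=kubernetes" \
  -keyout "$tmp/ca.key" -out "$tmp/ca.crt" 2>/dev/null

# kubeadm's --discovery-token-ca-cert-hash is the SHA-256 digest of the
# CA's DER-encoded public key, prefixed with "sha256:".
hash=$(openssl x509 -pubkey -in "$tmp/ca.crt" -noout \
  | openssl pkey -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | awk '{print $NF}')
echo "sha256:$hash"
```

This is handy when the original kubeadm init output with the join command has been lost.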

Some people resolve this by copying the shared certificates over manually. I haven't tried it myself and wasn't convinced it would work:

scp -rp /etc/kubernetes/pki/ca.* master02:/etc/kubernetes/pki
scp -rp /etc/kubernetes/pki/sa.* master02:/etc/kubernetes/pki
scp -rp /etc/kubernetes/pki/front-proxy-ca.* master02:/etc/kubernetes/pki
scp -rp /etc/kubernetes/pki/etcd/ca.* master02:/etc/kubernetes/pki/etcd
scp -rp /etc/kubernetes/admin.conf master02:/etc/kubernetes

Master node settings

Allow the master nodes to schedule regular workloads:

kubectl taint nodes --all node-role.kubernetes.io/master-
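To confirm the taint is actually gone (a sketch; node names are cluster-specific):

```shell
# List remaining taints; the masters should no longer show
# node-role.kubernetes.io/master:NoSchedule.
kubectl describe nodes | grep -i taint

# To restore the default behaviour later (replace <node-name>):
# kubectl taint nodes <node-name> node-role.kubernetes.io/master=:NoSchedule
```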

CoreDNS problem

The CoreDNS pods failed to start:

supermap@podc02:~$ kubectl get pod -n kube-system
NAME                             READY   STATUS              RESTARTS   AGE
coredns-74ff55c5b-dtwdz          0/1     ContainerCreating   0          32m
coredns-74ff55c5b-jns5b          0/1     ContainerCreating   0          32m
etcd-podc02                      1/1     Running             0          32m
kube-apiserver-podc02            1/1     Running             0          32m
kube-controller-manager-podc02   1/1     Running             0          32m
kube-proxy-45jxl                 1/1     Running             0          32m
kube-scheduler-podc02            1/1     Running             0          32m

⚠️ This turned out to be a network-plugin problem; reinstalling flannel fixed it.

Installing flannel

The flannel project has moved to the flannel-io organization. The old repository address and raw.githubxxxx are unreachable, so downloads have to come from the new location.

wget https://github.com/flannel-io/flannel/releases/download/v0.13.0/flannel-v0.13.0-linux-amd64.tar.gz

The link above works for now, though it may stop working again on a different network.
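Once the tarball is downloaded, a manual install of the binary might look like this (a sketch; the v0.13.0 release tarball contains the flanneld binary and mk-docker-opts.sh):

```shell
# Unpack the release tarball downloaded above and install the binary.
tar -xzf flannel-v0.13.0-linux-amd64.tar.gz
sudo install -m 0755 flanneld /usr/local/bin/flanneld
flanneld --version
```

In most clusters, though, flannel runs as a DaemonSet deployed from the kube-flannel.yml manifest rather than as a hand-installed binary.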

Access to GitHub kept failing with the following error:

fatal: unable to access 'https://github.com/openthings/kubernetes-tools.git/': gnutls_handshake() failed: Error in the pull function.

Then, inexplicably, it started working again.

Some suggest installing these packages; it made no difference for me:

supermap@pods01:~/openthings$ sudo apt-get -y install build-essential nghttp2 libnghttp2-dev libssl-dev

⚠️ More approaches are covered below.

systemd compatibility

I was running Docker 19.03 and hadn't upgraded it in a long time, while Ubuntu and systemd kept moving forward.

kubeadm init kept failing. Commenting out the systemd cgroup-driver setting in /etc/docker/daemon.json made it succeed.

Odd, since Kubernetes recommends systemd over cgroupfs as the cgroup driver. I'll upgrade Docker next time and try again.
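For comparison, this is roughly what /etc/docker/daemon.json looks like with the systemd cgroup driver enabled (the log and storage options are common defaults, not taken from this cluster). The sketch writes to a temp directory and validates the JSON, since a syntax error in this file prevents the docker daemon from starting at all:

```shell
# Write a candidate daemon.json to a temp dir; on a real node the
# target path is /etc/docker/daemon.json.
tmp=$(mktemp -d)
cat > "$tmp/daemon.json" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
EOF

# Validate before restarting docker -- invalid JSON here stops the daemon.
python3 -m json.tool "$tmp/daemon.json" > /dev/null && echo "daemon.json OK"

# On the real node, afterwards:
# sudo systemctl restart docker && sudo systemctl restart kubelet
```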

Joining another control-plane node looked like this:

sudo kubeadm join 10.1.1.201:6443 --token k4l26p.d99xrvu2higwz9ow     --discovery-token-ca-cert-hash sha256:eda3e649672134c93d11bdb741672b3add5073eb3f4da021274dc51f9278d5f1      --control-plane --certificate-key 0a3656c05b225b35724851d08a52ab5ba8c0b70ea64fd4beeb5d727225b63ce4

If the certificate key has expired, it can be regenerated (the certificates are re-uploaded and the new key printed) with:

sudo kubeadm init phase upload-certs --upload-certs

An expired join token, by contrast, is renewed separately with kubeadm token create --print-join-command.

CNI problem

The following CNI errors appeared:

3月 18 17:57:27 podc01 kubelet[312941]: E0318 17:57:27.777448  312941 kubelet.go:2184] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network pl>
3月 18 17:57:32 podc01 kubelet[312941]: W0318 17:57:32.184598  312941 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d

Many attempts failed to fix it; in the end I copied 10-flannel.conflist over from another machine:

sudo scp [email protected]:~/10-flannel.conflist /etc/cni/net.d/ 

The file contains only this:

{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
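The same file can be recreated locally and sanity-checked before copying it into place; kubelet reports "no networks found" whenever /etc/cni/net.d is empty or every file in it fails to parse:

```shell
# Recreate 10-flannel.conflist in a temp dir and validate it before
# installing it under /etc/cni/net.d/.
tmp=$(mktemp -d)
cat > "$tmp/10-flannel.conflist" <<'EOF'
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": { "hairpinMode": true, "isDefaultGateway": true }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
EOF

# A parse failure here would reproduce the "no networks found" error.
python3 -m json.tool "$tmp/10-flannel.conflist" > /dev/null && echo "conflist OK"

# On the node:
# sudo cp "$tmp/10-flannel.conflist" /etc/cni/net.d/ && sudo systemctl restart kubelet
```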

GnuTLS error

A method found online that I haven't tried yet:

The root cause turned out to be the gnutls package, which behaves erratically behind a proxy, while openssl works fine even on a weak network. The workaround is to compile git against openssl. Run the following commands:

sudo apt-get update
sudo apt-get install build-essential fakeroot dpkg-dev
sudo apt-get build-dep git
mkdir ~/git-openssl
cd ~/git-openssl
apt-get source git
dpkg-source -x git_1.7.9.5-1.dsc
cd git-1.7.9.5

(Remember to replace 1.7.9.5 with the actual version of git in your system.)

Then, edit the debian/control file (e.g. run gksu gedit debian/control) and replace all instances of libcurl4-gnutls-dev with libcurl4-openssl-dev.

Then build the package (if the build fails on tests, you can remove the line TEST=test from debian/rules):

sudo apt-get install libcurl4-openssl-dev
sudo dpkg-buildpackage -rfakeroot -b

Install new package:

i386: sudo dpkg -i ../git_1.7.9.5-1_i386.deb

x86_64: sudo dpkg -i ../git_1.7.9.5-1_amd64.deb

GitHub access failures

Locate the hosts file on your system:

Windows: C:\Windows\System32\drivers\etc\hosts, or Linux: /etc/hosts

Add the following two IP entries:

# GitHub Start 
140.82.114.4 github.com
199.232.69.194 github.global.ssl.fastly.net
# GitHub End

Save and exit.

On Windows, run ipconfig /flushdns in a CMD window; after that, GitHub should be reachable.

Current GitHub IP addresses can be looked up at https://github.com.ipaddress.com/www.github.com.

Final cluster configuration

In the end the cluster was restored:

(base) supermap@podc01:~$ kubectl get node -owide
NAME     STATUS   ROLES                  AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
podc01   Ready    control-plane,master   16h     v1.20.4   10.1.1.201    <none>        Ubuntu 20.10         5.8.0-45-generic   docker://20.10.5
podc02   Ready    control-plane,master   16h     v1.20.4   10.1.1.202    <none>        Ubuntu 20.04.2 LTS   5.4.0-67-generic   docker://19.3.8
podc04   Ready    control-plane,master   16h     v1.20.4   10.1.1.204    <none>        Ubuntu 20.04.2 LTS   5.4.0-67-generic   docker://19.3.8
pods01   Ready    control-plane,master   16h     v1.20.4   10.1.1.193    <none>        Ubuntu 20.04.2 LTS   5.4.0-67-generic   docker://19.3.8
pods02   Ready    control-plane,master   131m    v1.20.4   10.1.1.234    <none>        Ubuntu 20.04.2 LTS   5.4.0-67-generic   docker://19.3.8
pods03   Ready    control-plane,master   68m     v1.20.4   10.1.1.205    <none>        Ubuntu 20.04.2 LTS   5.4.0-67-generic   docker://19.3.8
pods04   Ready    control-plane,master   50m     v1.20.4   10.1.1.206    <none>        Ubuntu 20.04.2 LTS   5.4.0-67-generic   docker://19.3.8
pods05   Ready    control-plane,master   36m     v1.20.4   10.1.1.34     <none>        Ubuntu 20.04.2 LTS   5.4.0-66-generic   docker://19.3.8
pods06   Ready    control-plane,master   6m22s   v1.20.4   10.1.1.167    <none>        Ubuntu 20.04.2 LTS   5.4.0-66-generic   docker://19.3.8

Three nodes had other problems:

  • One node updated normally after several reboots;
  • another, podc03, never came back despite repeated reboots and is probably dead;
  • a third ended up with a read-only filesystem and could not be updated, but was eventually repaired:
    • boot into the menu and choose recovery;
    • run fsck, then reboot.
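The repair, sketched (assuming the affected root partition is /dev/sda2; check the actual device with lsblk first):

```shell
# From the GRUB recovery menu, or from a live USB with the
# partition unmounted:
sudo fsck -y /dev/sda2
sudo reboot
```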

Installing the Dashboard

 
