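This post records the steps for changing the hostname and node name of the master (control plane) of a k8s cluster. It is a follow-up to the post "Restoring a new cluster from a master server image". The goal is to change the master's hostname and node name from k8s-master0 to kube-master0.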
The server runs Ubuntu 18.04 and the Kubernetes version is 1.20.2.
Attempt 1
Change the hostname of the master server
hostnamectl set-hostname kube-master0
Replace the hostname in the configuration files under /etc/kubernetes/manifests
oldhost=k8s-master0
newhost=kube-master0
cd /etc/kubernetes/manifests
find . -type f | xargs grep $oldhost
find . -type f | xargs sed -i "s/$oldhost/$newhost/g"
find . -type f | xargs grep $newhost
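The replacement above edits the live static Pod manifests in place. A more cautious variant is sketched below, under assumptions: it first copies the directory to a backup location and then replaces every occurrence on each line. The function name replace_hostname and the backup argument are hypothetical and do not come from the original steps. The backup should live outside /etc/kubernetes/manifests so that kubelet does not pick it up as a manifest.

```shell
# Hypothetical helper: replace OLD with NEW in every file under DIR,
# after copying DIR to BACKUP. Assumes GNU sed (as on Ubuntu) and that
# OLD and NEW contain no characters special to sed, such as "/" or ".".
replace_hostname() {
    dir=$1 old=$2 new=$3 backup=$4
    # Keep an untouched copy so the manifests can be restored if needed.
    cp -R "$dir" "$backup" || return 1
    # List only the files that contain the old name, then rewrite them.
    grep -rl -- "$old" "$dir" | while IFS= read -r f; do
        sed -i "s/$old/$new/g" "$f"
    done
}

# Example call (paths are illustrative):
# replace_hostname /etc/kubernetes/manifests k8s-master0 kube-master0 /root/manifests.bak
```

The g flag matters only when a line contains the old name more than once, but it costs nothing to include.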
Replace the hostname in kubeadm-config
kubectl edit cm kubeadm-config -n kube-system
:%s/k8s-master0/kube-master0
Restart the related services so that the configuration changes take effect
systemctl daemon-reload && systemctl restart kubelet && systemctl restart docker
Enter the etcd container and confirm that the member name has been updated
docker exec -it $(docker ps -f name=etcd_etcd -q) /bin/sh
etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
896d19d1d0a08f49, started, kube-master0, https://10.0.9.171:2380, https://10.0.9.171:2379, false
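To check the member name without reading the output by eye, the name column can be extracted with awk. This is a minimal sketch that assumes the default comma-separated output format shown above; the function name member_names is hypothetical.

```shell
# Hypothetical helper: print the name column (third field) of each line
# of `etcdctl member list` output read from stdin.
member_names() {
    awk -F', ' '{ print $3 }'
}

# Example call, inside the etcd container with the same flags as above:
# etcdctl ... member list | member_names
```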
Check whether the node name has changed
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master0 NotReady control-plane,master 372d v1.20.2
Unfortunately, it has not changed.
Attempt 2
Viewing the node configuration with kubectl edit node k8s-master0 shows 3 places that still refer to the hostname:
- metadata -> labels: (can be edited directly)
    kubernetes.io/hostname: kube-master0
- metadata: (cannot be edited; fails with "error: At least one of apiVersion, kind and name was changed")
    name: k8s-master0
- status -> addresses: (after editing, the original value is back when the node is opened again)
    - address: k8s-master0
      type: Hostname
Editing the node configuration did not work.
Attempt 3
Try to use etcdctl to directly modify the configuration data in the etcd database that contains k8s-master0
Set the environment variables for etcdctl
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
export ETCDCTL_ENDPOINTS=10.0.9.171:2379
Export all configuration data
etcdctl get "" --prefix -w json > etcd-kv.json
From etcd-kv.json, extract all keys that contain k8s-master0
jq -r '.kvs[].key' etcd-kv.json | while IFS= read -r k; do echo "$k" | base64 --decode; echo; done | grep k8s-master0 > kv_k8s-master0.txt
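etcdctl writes keys base64-encoded in its JSON output, which is why each key has to be decoded before grep can match it. The decoding step can be isolated in a small function, sketched below; the name decode_matching_keys is hypothetical, and the sketch assumes the encoded keys arrive one per line on stdin.

```shell
# Hypothetical helper: read base64-encoded keys (one per line) from stdin
# and print the decoded keys that contain the given substring.
decode_matching_keys() {
    pattern=$1
    while IFS= read -r encoded; do
        printf '%s' "$encoded" | base64 --decode
        echo    # decoded keys have no trailing newline, so add one
    done | grep -- "$pattern"
}

# Example call:
# jq -r '.kvs[].key' etcd-kv.json | decode_matching_keys k8s-master0 > kv_k8s-master0.txt
```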
The result is as follows
/registry/crd.projectcalico.org/blockaffinities/k8s-master0-192-168-70-128-26
/registry/crd.projectcalico.org/ipamhandles/ipip-tunnel-addr-k8s-master0
/registry/csinodes/k8s-master0
/registry/events/default/k8s-master0.165a969b97e7c4ea
...
/registry/events/kube-system/etcd-k8s-master0.165a984e78509ebd
...
/registry/events/kube-system/kube-apiserver-k8s-master0.165a96905a9bf40c
...
/registry/events/kube-system/kube-controller-manager-k8s-master0.165a7016cd8a6ca9
...
/registry/events/kube-system/kube-scheduler-k8s-master0.165a7016cead2a32
...
/registry/leases/kube-node-lease/k8s-master0
/registry/minions/k8s-master0
/registry/pods/kube-system/etcd-k8s-master0
/registry/pods/kube-system/kube-apiserver-k8s-master0
/registry/pods/kube-system/kube-controller-manager-k8s-master0
/registry/pods/kube-system/kube-scheduler-k8s-master0
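Use the following commands to copy the value of /registry/minions/k8s-master0 to a new key, /registry/minions/kube-master0, with the name replaced inside the value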
key=/registry/minions/k8s-master0
etcdctl get $key --print-value-only > kv-temp.txt
sed -i "s/k8s-master0/kube-master0/" kv-temp.txt
cat kv-temp.txt | etcdctl put `echo $key | sed "s/k8s-master0/kube-master0/"`
After adding the key, kubectl get nodes reports an error
Error from server: proto: Unknown: illegal tag 0 (wire type 0)
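Adding the -w fields option to etcdctl got rid of the error above, but the attempt to make the change through etcdctl still failed. For details, see the question at https://q.cnblogs.com/q/133164/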
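The post does not say why the edited value was rejected. One plausible factor, stated here as an assumption rather than a confirmed diagnosis, is that Kubernetes stores most objects in etcd in a binary protobuf encoding, where each string is preceded by its length. Replacing the 11-character k8s-master0 with the 12-character kube-master0 using sed changes the text but not the length in front of it. The sketch below illustrates this with a made-up, simplified record (one length byte followed by the name); it is not the real storage format.

```shell
# Simplified, hypothetical record: one length byte (octal 013 = 11)
# followed by the 11-byte node name.
record=$(printf '\013k8s-master0')

# Text substitution, as in the attempt above.
patched=$(printf '%s' "$record" | sed 's/k8s-master0/kube-master0/')

# The declared length is still the first byte; the payload is what follows it.
declared=$(printf '%s' "$patched" | head -c 1 | od -An -tu1 | tr -d ' \n')
payload=${patched#?}
actual=${#payload}

echo "declared length: $declared, actual length: $actual"
```

A decoder that trusts the declared length would read one byte too few and then misinterpret whatever follows, which is consistent with a protobuf parsing error, although other causes are possible.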
Attempt 4
Export the node configuration of k8s-master0
kubectl get node k8s-master0 -o yaml > kube-master0.yml
Replace k8s-master0 with kube-master0 in the exported file
sed -i "s/k8s-master0/kube-master0/g" kube-master0.yml
Change the hostname of the host to kube-master0
hostnamectl set-hostname kube-master0
Use etcdctl to delete /registry/minions/k8s-master0 from etcd
etcdctl del /registry/minions/k8s-master0
Create the kube-master0 node from the exported and modified file
kubectl apply -f kube-master0.yml
After these steps, kube-master0 shows up in the kubectl get nodes list, but in the NotReady state
NAME STATUS ROLES AGE VERSION
kube-master0 NotReady control-plane,master 97m v1.20.2
One of the errors in syslog
Jan 20 18:20:27 kube-master0 kubelet[23220]: E0120 18:20:27.460470 23220 controller.go:144] failed to ensure lease exists, will retry in 7s, error: leases.coordination.k8s.io "kube-master0" is forbidden: User "system:node:k8s-master0" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease": can only access node lease with the same name as the requesting node
The User "system:node:k8s-master0" in the log shows that the node's user name has not changed yet. Check /etc/kubernetes/kubelet.conf
users:
- name: default-auth
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
The user information comes from the certificate file kubelet-client-current.pem in /var/lib/kubelet/pki/. Use openssl to view the common name (CN) bound to the certificate
$ openssl x509 -noout -subject -in kubelet-client-current.pem
subject=O = system:nodes, CN = system:node:k8s-master0
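The log message and the certificate subject together show the rule being enforced: the kubelet's certificate CN must be system:node: followed by the node name. The check can be scripted as sketched below. Both function names are hypothetical, and the sketch assumes that CN is the last component of the subject line, as in the output above.

```shell
# Hypothetical helper: print the CN from an `openssl x509 -noout -subject`
# line read from stdin. Assumes CN is the last component of the subject.
subject_cn() {
    sed -n 's/^subject=.*CN *= *//p'
}

# Hypothetical helper: succeed only if the CN in the given subject line is
# the user name expected for the given node name.
cn_matches_node() {
    subject_line=$1 node=$2
    [ "$(printf '%s\n' "$subject_line" | subject_cn)" = "system:node:$node" ]
}

# Example call:
# cn_matches_node "$(openssl x509 -noout -subject -in kubelet-client-current.pem)" "$(hostname)"
```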
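So the certificate is still the one issued before the rename, and a new certificate has to be generated for the node's kubelet under the new hostname.
After some trial and error, the following kubeadm command took care of it: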
kubeadm init phase kubeconfig kubelet
After running the command above to regenerate the certificate, the users section of /etc/kubernetes/kubelet.conf looks like this:
users:
- name: system:node:kube-master0
  user:
    client-certificate-data: ***...
    client-key-data: ***...
Restart kubelet
systemctl restart kubelet
Finally, it works!
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-master0 Ready control-plane,master 18h v1.20.2
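For scripted checks, the node's status can be read out of the kubectl get nodes output rather than inspected by eye. This is a minimal sketch that assumes the default column layout shown above; the function name node_status is hypothetical.

```shell
# Hypothetical helper: print the STATUS column for the named node from
# `kubectl get nodes` output read from stdin.
node_status() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Example call:
# kubectl get nodes | node_status kube-master0
```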