Preface
Writing this in resumable-transfer mode~
Notes
I'm on Ubuntu 16.04. The first thing to do is configure the apt source; I recommend Aliyun's mirror, whose key is at https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg (the CentOS repo is at https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64). If you have a proxy or are overseas, the Google source is the most convenient and saves a lot of trouble later.
Installing Docker along with kubeadm, kubectl, and kubelet is basically painless; just follow the docs. Once the tools are installed, the next step is downloading the images.
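For reference, wiring up the Aliyun source and installing the tools usually looks like the sketch below. This is a provisioning fragment to run as root; the repo codename kubernetes-xenial is my assumption based on the Aliyun mirror's layout for Ubuntu 16.04 (xenial).

```shell
# Add Aliyun's Kubernetes apt repo and its signing key (run as root)
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat > /etc/apt/sources.list.d/kubernetes.list <<'EOF'
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y docker.io kubelet kubeadm kubectl
```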
Because the images are hosted in Google's registry (k8s.gcr.io), kubeadm config images pull fails with the following error:
[preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
[preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image [k8s.gcr.io/kube-apiserver-amd64:v1.11.3]: exit status
This is obviously the Great Firewall at work; pull the images manually from their Docker Hub mirrors instead:
docker pull mirrorgooglecontainers/kube-apiserver-amd64:v1.11.3
docker pull mirrorgooglecontainers/kube-controller-manager-amd64:v1.11.3
docker pull mirrorgooglecontainers/kube-scheduler-amd64:v1.11.3
docker pull mirrorgooglecontainers/kube-proxy-amd64:v1.11.3
docker pull mirrorgooglecontainers/pause:3.1
docker pull mirrorgooglecontainers/etcd-amd64:3.2.18
docker pull coredns/coredns:1.1.3
Those are all the images involved, but even with them downloaded, kubeadm init still failed and still tried to pull from Google's registry. It stumped me for quite a while before I realized that images pulled from Docker Hub carry different names than the ones in Google's registry, so kubeadm init considers them missing and heads to Google anyway. docker images shows what the local images are currently called; re-tag them with the following commands:
docker tag docker.io/mirrorgooglecontainers/kube-proxy-amd64:v1.11.3 k8s.gcr.io/kube-proxy-amd64:v1.11.3
docker tag docker.io/mirrorgooglecontainers/kube-scheduler-amd64:v1.11.3 k8s.gcr.io/kube-scheduler-amd64:v1.11.3
docker tag docker.io/mirrorgooglecontainers/kube-apiserver-amd64:v1.11.3 k8s.gcr.io/kube-apiserver-amd64:v1.11.3
docker tag docker.io/mirrorgooglecontainers/kube-controller-manager-amd64:v1.11.3 k8s.gcr.io/kube-controller-manager-amd64:v1.11.3
docker tag docker.io/mirrorgooglecontainers/etcd-amd64:3.2.18 k8s.gcr.io/etcd-amd64:3.2.18
docker tag docker.io/mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag docker.io/coredns/coredns:1.1.3 k8s.gcr.io/coredns:1.1.3
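Rather than typing each pull/tag pair by hand, the whole thing can be scripted. This is my own sketch (emit_cmds is a helper name I made up); it prints the docker commands so you can review them first, then pipe to sh to actually run them.

```shell
# Print the docker pull + re-tag commands for every control-plane image.
# Review the output, then run:  emit_cmds | sh
emit_cmds() {
  for img in kube-apiserver-amd64:v1.11.3 kube-controller-manager-amd64:v1.11.3 \
             kube-scheduler-amd64:v1.11.3 kube-proxy-amd64:v1.11.3 \
             pause:3.1 etcd-amd64:3.2.18; do
    echo "docker pull mirrorgooglecontainers/$img"
    echo "docker tag mirrorgooglecontainers/$img k8s.gcr.io/$img"
  done
  # coredns is published under its own Docker Hub namespace
  echo "docker pull coredns/coredns:1.1.3"
  echo "docker tag coredns/coredns:1.1.3 k8s.gcr.io/coredns:1.1.3"
}
emit_cmds
```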
Next came two more complaints: fewer than 2 CPUs (it's a VM) and an active swap partition. Bumping the CPU count and disabling the swap partition fixed both. I looked up why swap has to be off: roughly, avoiding virtual memory keeps performance predictable and lets the scheduler pack instances as close to 100% utilization as possible. Running kubeadm init again then failed with:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
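As an aside, disabling swap (mentioned above) usually means the two commands shown in the comments below. To keep this runnable anywhere, the fstab edit is demonstrated on a sample copy rather than the real file.

```shell
# On the real host (as root):
#   swapoff -a                                # turn swap off immediately
#   sed -i '/\bswap\b/ s/^/#/' /etc/fstab     # keep it off after reboot
# Demonstrated here on a sample fstab so the edit is visible:
cat > fstab.sample <<'EOF'
UUID=abcd / ext4 errors=remount-ro 0 1
/swapfile none swap sw 0 0
EOF
sed -i '/\bswap\b/ s/^/#/' fstab.sample
cat fstab.sample
```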
At first I thought the kubelet version was incompatible with the apiserver, so I reinstalled the tools, only to find a port conflict: the earlier master start-up was still occupying port 6443. kubeadm reset cleared that up, but in between I also, rather stupidly, changed the machine's IP -_-!, and the same error kept coming back. The log repeated the following over and over:
1006 02:44:41.050125 19805 reflector.go:123] k8s.io/client-go/informers/factor
1006 02:44:41.129531 19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.230347 19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.331174 19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.431984 19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.532748 19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.596687 19805 controller.go:135] failed to ensure node lease exist
1006 02:44:41.633573 19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.734381 19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.745881 19805 kubelet_node_status.go:94] Unable to register node with apiserver ~~
1006 02:44:41.835220 19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.936016 19805 kubelet.go:2267] node "ubuntu" not found
The address change meant the old IP no longer matched anything. The simplest fix is, once again, kubeadm reset, but the recommended approach is to fix the conf parameters: change the old address in the conf files to the new one and restart the kubelet service. Problem solved.
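A sketch of that conf fix (OLD_IP and NEW_IP are placeholders; the kubeadm-generated files under /etc/kubernetes embed the apiserver address). The commented lines are what you would run on the node; the edit itself is demonstrated on a sample fragment so it can run anywhere.

```shell
OLD_IP=192.168.0.10   # placeholder: the address before the change
NEW_IP=192.168.0.20   # placeholder: the current address
# On the real node (as root):
#   sed -i "s/$OLD_IP/$NEW_IP/g" /etc/kubernetes/*.conf /etc/kubernetes/manifests/*.yaml
#   systemctl restart kubelet
# Demonstrated on a sample kubelet.conf fragment:
printf 'server: https://%s:6443\n' "$OLD_IP" > kubelet.conf.sample
sed -i "s/$OLD_IP/$NEW_IP/g" kubelet.conf.sample
cat kubelet.conf.sample
```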
Since the whole installation was done as root, there are still two chores left: copy the config file into the home directory and change its ownership. With that done, Kubernetes on the master node is installed. Be sure to record the token printed on success; it is needed later when adding nodes.
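The copy-and-chown step is the standard sequence that kubeadm init itself prints after a successful run (a provisioning fragment; run on the master as your normal user, with root privileges for the copy):

```shell
# Make the admin kubeconfig usable by the current (non-root) user
mkdir -p "$HOME/.kube"
cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
chown "$(id -u):$(id -g)" "$HOME/.kube/config"
```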
Hit the bus bandwidth cap for today; the rest resumes in the next transfer.