CentOS 7 Kubernetes Deployment Guide
Deployment layout: one master + one node
OS information:
[root@zf zhangfeng]# uname -a
Linux zf.master 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@zf zhangfeng]# cat /proc/version
Linux version 3.10.0-1062.1.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Sep 13 22:55:44 UTC 2019
[root@zf zhangfeng]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
I. System Configuration
1. Set hostnames
1) master node:
hostnamectl set-hostname zf.master
2) node:
hostnamectl set-hostname zf.node1
2. Configure name resolution (required on both master and node)
Run the following on both master and node:
cat <<EOF >>/etc/hosts
192.168.1.4 zf.master
192.168.1.143 zf.node1
EOF
3. Disable the firewall, SELinux, and swap
systemctl disable firewalld --now
setenforce 0
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
swapoff -a
echo "vm.swappiness = 0">> /etc/sysctl.conf
sed -i 's/.*swap.*/#&/' /etc/fstab
sysctl -p
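Before moving on, you can confirm swap is actually off; kubelet refuses to start while swap is active:

```shell
swapon --show    # no output means no active swap devices
cat /proc/swaps  # only the header line should remain
```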
4. Configure kernel parameters so bridged IPv4 traffic passes through the iptables chains
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
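Note: the net.bridge.* keys only exist while the br_netfilter kernel module is loaded. If sysctl --system complains they are missing, load the module first (a sketch; the modules-load.d path is the standard systemd mechanism):

```shell
# Load the bridge netfilter module now and on every boot
modprobe br_netfilter
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
# Re-apply and spot-check the value
sysctl --system >/dev/null
sysctl net.bridge.bridge-nf-call-iptables
```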
5. Configure package repositories
1) Point the base yum repo at the Aliyun mirror
cd /etc/yum.repos.d
mv CentOS-Base.repo CentOS-Base.repo.bak
mv epel.repo epel.repo.bak
curl https://mirrors.aliyun.com/repo/Centos-7.repo -o CentOS-Base.repo
sed -i 's/gpgcheck=1/gpgcheck=0/g' /etc/yum.repos.d/CentOS-Base.repo
curl https://mirrors.aliyun.com/repo/epel-7.repo -o epel.repo
2) Point the Kubernetes repo at the Aliyun mirror
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64 points the Kubernetes repo at the Aliyun mirror.
gpgcheck=0: RPM packages downloaded from this repo are not signature-checked.
repo_gpgcheck=0: some hardened configurations enable repo_gpgcheck globally in /etc/yum.conf so that the repository metadata's cryptographic signature is verified.
With gpgcheck=1 the check runs and fails against this mirror, so it is set to 0 here.
3) Update the yum cache
yum clean all && yum makecache && yum repolist
II. Install Docker
Docker must be installed on both master and node and set to start on boot.
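The installation command itself is not shown here; on CentOS 7 a common route is the Docker CE repo from the same Aliyun mirror (a sketch; package names assume Docker CE):

```shell
# Add the Aliyun Docker CE repo (yum-config-manager comes from yum-utils)
yum install -y yum-utils
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io
```

After installation, enable and start the daemon as below.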
systemctl enable docker && systemctl start docker
III. Install kubeadm, kubelet, and kubectl
1. Install kubelet, kubeadm, and kubectl (run on both master and node)
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
kubelet communicates with the rest of the cluster and manages the lifecycle of the Pods and containers on its node.
kubeadm is Kubernetes' automated deployment tool; it lowers the barrier to deployment and speeds it up.
kubectl is the Kubernetes cluster management CLI.
Finally, enable and start kubelet:
systemctl enable kubelet --now
Check the k8s version:
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:27:49Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
2. Kubernetes images (applies to both master and node)
Before initializing the cluster, you can list the images initialization needs:
kubeadm config images list
The required images are:
k8s.gcr.io/kube-apiserver:v1.17.2
k8s.gcr.io/kube-controller-manager:v1.17.2
k8s.gcr.io/kube-scheduler:v1.17.2
k8s.gcr.io/kube-proxy:v1.17.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.5
You can pre-pull the Docker images k8s needs on each node with kubeadm config images pull; they are used when the master initializes and when nodes join the cluster.
If you skip the pre-pull, kubeadm init on the master and kubeadm join on the nodes will pull the images themselves.
In this walkthrough the images were not pre-pulled; they were pulled automatically during kubeadm init on the master and kubeadm join on the node.
kubeadm config images pull
3. Initialize with kubeadm
1) Run the initialization
(The inverse of initialization is kubeadm reset.)
kubeadm init --kubernetes-version=1.17.2 \
--apiserver-advertise-address=192.168.1.4 \
--image-repository=registry.aliyuncs.com/google_containers \
--service-cidr=10.1.0.0/16 \
--pod-network-cidr=10.244.0.0/16
--kubernetes-version: the k8s version to deploy;
--apiserver-advertise-address: the IP address kube-apiserver listens on, i.e. the master's own IP;
--pod-network-cidr: the Pod network range, here 10.244.0.0/16;
--service-cidr: the Service (SVC) network range;
--image-repository: the image registry to pull from.
The last flag is critical: kubeadm pulls its images from k8s.gcr.io by default, which is unreachable from mainland China, so --image-repository points it at the Aliyun registry instead.
Initialization takes a while, about a minute; output like the following marks a completed init (a completed init does not yet mean the cluster is healthy):
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.43.88:6443 --token m16ado.6ne248sk47nln0jj \
--discovery-token-ca-cert-hash sha256:09cda974fb18e716219bf08ef9d7a4eaa76bfe59ec91d0930b4ccfbd111276de
2) Run the commands the init output suggests
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
3) Deploy the pod network (flannel) to the cluster
Download the kube-flannel.yml manifest:
curl -O https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Applying this file requires the quay.io/coreos/flannel:v0.11.0-amd64 image.
Pull the flannel image on every machine, or export it from the master and import it on the other nodes.
# Manually pull a mirrored flannel image
docker pull easzlab/flannel:v0.11.0-amd64
# Retag it with the name the manifest expects
docker tag easzlab/flannel:v0.11.0-amd64 quay.io/coreos/flannel:v0.11.0-amd64
Install the flannel network add-on:
kubectl apply -f ./kube-flannel.yml
Output:
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
To delete the flannel deployment:
kubectl delete -f ./kube-flannel.yml
4) Check node and pod status
Nodes usually show NotReady at first; check pod status with kubectl get pod -n kube-system.
Typical symptoms: the flannel image fails to download (ImagePullBackOff) and coredns stays Pending.
Check docker image ls for quay.io/coreos/flannel:v0.11.0-amd64; if it is missing, try:
docker pull quay.io/coreos/flannel:v0.11.0-amd64
kube-flannel-ds-amd64 only reaches Running once the flannel image is present.
In my VM this image simply would not pull, so I pulled it elsewhere, exported it with docker save, and loaded it into the VM's docker with docker load.
Note: both the master and the node must have the quay.io/coreos/flannel:v0.11.0-amd64 image.
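The save/load workflow mentioned above looks roughly like this (the node hostname is a placeholder):

```shell
# On a machine that already has the image (e.g. the master):
docker save quay.io/coreos/flannel:v0.11.0-amd64 -o flannel-v0.11.0-amd64.tar
scp flannel-v0.11.0-amd64.tar root@zf.node1:/tmp/
# On the target node:
docker load -i /tmp/flannel-v0.11.0-amd64.tar
docker image ls | grep flannel   # confirm the image is present
```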
4. Join the node to the cluster
1) First make sure the node has the flannel docker image quay.io/coreos/flannel:v0.11.0-amd64.
2) Run kubeadm join on the node:
kubeadm join 192.168.43.88:6443 --token ep9bne.6at6gds2o05dgutd \
--discovery-token-ca-cert-hash sha256:b2f75a6e5a49e66e467392d7d237548664ba8a28aafe98bdb18a7dd63ecc4aa8
On the master, check node status; all nodes show Ready:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
zf.master Ready master 33h v1.17.2
zf.node1 Ready <none> 70m v1.17.2
A node joining the cluster may hit an expired token.
Error message:
kubeadm error: error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s
kubeadm token create --print-join-command   # tokens last 24h by default; combine with --ttl for a longer lifetime, --ttl 0 never expires
kubeadm join k8smaster.com:6443 --token pdas2m.fkgn8q7mz5u96jm6 --discovery-token-ca-cert-hash sha256:6fd9b1bf2d593d2d4f550cd9f1f596865f117fef462db42860228311c2712b8b
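The --discovery-token-ca-cert-hash value is just the SHA-256 of the cluster CA's public key; on a real master it can be recomputed from /etc/kubernetes/pki/ca.crt. The sketch below generates a throwaway certificate so the recipe is self-contained:

```shell
# Generate a throwaway CA cert (stand-in for /etc/kubernetes/pki/ca.crt)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo-ca" \
  -keyout /tmp/demo-ca.key -out /tmp/demo-ca.crt 2>/dev/null
# Recompute the discovery hash the same way kubeadm does
HASH=$(openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //')
echo "sha256:${HASH}"
```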
5. Node management
1) Remove a node
On the node, run:
kubeadm reset
This undoes kubeadm join; afterwards, manually rm the configuration directories it lists.
2) On the master (kubectl get node lists the node names):
kubectl delete node <node-name>
kubectl delete node zf.node1
IV. Verify Kubernetes
kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
kubectl expose deployment nginx --port=80 --type=NodePort
service/nginx exposed
kubectl get pods,svc
NAME READY STATUS RESTARTS AGE
pod/nginx-86c57db685-ljzhp 0/1 ContainerCreating 0 15s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.1.0.1 <none> 443/TCP 91m
service/nginx NodePort 10.1.136.233 <none> 80:32387/TCP 9s
V. Deploy the Dashboard
VI. Configure single-node k8s
By default the master node does not schedule workloads; to let the master run workloads, perform the two steps below.
Check the taint:
kubectl describe node zf.master | grep Taints
or
kubectl describe nodes | grep Taints
Result:
Taints: node-role.kubernetes.io/master:NoSchedule
Remove the taint so that single-node k8s can schedule workloads:
kubectl taint nodes --all node-role.kubernetes.io/master-
or
kubectl taint nodes zf.master node-role.kubernetes.io/master-
Check again:
kubectl describe node zf.master | grep Taints
or
kubectl describe nodes | grep Taints
Result:
Taints: <none>
VII. Deploy an application to test Kubernetes
1. Kubernetes' built-in pods
kubectl get pod --namespace=kube-system
or
kubectl get pod -n kube-system
or
kubectl get pod -A
Result:
kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-9d85f5447-54x8h 1/1 Running 0 33h
coredns-9d85f5447-jhnc7 1/1 Running 0 33h
etcd-zf.master 1/1 Running 0 33h
kube-apiserver-zf.master 1/1 Running 0 33h
kube-controller-manager-zf.master 1/1 Running 1 33h
kube-flannel-ds-amd64-4cfjl 1/1 Running 0 33h
kube-flannel-ds-amd64-wwk7n 1/1 Running 0 96m
kube-proxy-cxfqx 1/1 Running 0 96m
kube-proxy-nmg7h 1/1 Running 0 33h
kube-scheduler-zf.master 1/1 Running 1 33h
2. Viewing a pod's error log
Suppose kube-flannel-ds-amd64-wwk7n is stuck with STATUS Pending; inspect it with:
kubectl describe pod kube-flannel-ds-amd64-wwk7n --namespace=kube-system
or
kubectl describe pod kube-flannel-ds-amd64-wwk7n -n kube-system
This prints the pod's error and event details.
3. Non-system pods
Pods in the default namespace do not need the -n or --namespace option.
Related topics: node labels, cluster namespaces.
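For example, the nginx deployment created earlier lives in the default namespace, so plain kubectl reaches it:

```shell
kubectl get pod               # default namespace, no -n required
kubectl get pod -n default    # equivalent, with the namespace spelled out
```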
Common problems
kubeadm join errors and fixes
1. Error:
kubeadm join ...
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Cause: k8s defaults its cgroup driver to cgroupfs, but the yum install of kubelet switches kubelet to systemd, while docker (see docker info) stays on cgroupfs. There are two fixes.
Option 1:
Switch k8s back to cgroupfs:
#vim /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
#systemctl enable docker
#systemctl enable kubelet
#kubeadm join --token c04f89.b781cdb55d83c1ef 10.10.3.4:63 --discovery-token-ca-cert-hash sha256:986e83a9cb948368ad0552b95232e31d3b76e2476b595bd1d905d5242ace29af --ignore-preflight-errors=Swap
Option 2:
Switch docker's cgroup driver to systemd:
mkdir /etc/docker
Set up the daemon config:
cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
Restart Docker
systemctl daemon-reload
systemctl restart docker
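After the restart, docker info should report the new driver; a quick check:

```shell
# Expect "Cgroup Driver: systemd" after the daemon.json change
docker info 2>/dev/null | grep -i "cgroup driver"
```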
Error:
Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Cause: a newly joined node needs the pause:3.1 image, and the default registry k8s.gcr.io is blocked by the GFW.
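Since the init command earlier already pulls from registry.aliyuncs.com/google_containers, the blocked pause image can be fetched from that mirror and retagged (a sketch; assumes the mirror carries pause:3.1):

```shell
docker pull registry.aliyuncs.com/google_containers/pause:3.1
docker tag registry.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
```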