KubeSphere Troubleshooting in Practice

Overview: I have recently been using QingCloud's KubeSphere and the user experience has been excellent. It can be deployed on-premises with no infrastructure dependency and no existing Kubernetes dependency, supports deployment across physical machines, virtual machines, and cloud platforms, and can manage Kubernetes clusters of different versions and from different vendors. On top of Kubernetes it adds role-based access control, DevOps pipelines for fast CI/CD, built-in tools such as Harbor, GitLab, Jenkins, and SonarQube, and, based on OpenPitrix, full application lifecycle management covering development, testing, release, upgrade, and removal. My own experience with it has been very good.
As with any open-source project, some bugs are inevitable. Below are the troubleshooting notes from my own use. Many thanks to the QingCloud community for their technical assistance. If you are interested in Kubernetes, this home-grown platform is worth trying; the experience is silky smooth, and Rancher users may also want to try it for comparison.

1 Cleaning up exited containers

After the cluster has been running for a while, some containers exit abnormally with status Exited. They should be cleaned up promptly to free disk space; this can be set up as a scheduled task (a cron sketch follows the command below).

docker rm `docker ps -a |grep Exited |awk '{print $1}'`
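Since the article suggests running this as a scheduled task, here is a minimal cron sketch; the file path and schedule are assumptions, not part of the original setup:

# /etc/cron.d/docker-cleanup (hypothetical path): remove exited containers at 02:00 every day
0 2 * * * root docker ps -aq -f status=exited | xargs -r docker rm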

2 Cleaning up abnormal or evicted pods

  • Clean up pods in the kubesphere-devops-system namespace
kubectl delete pods -n kubesphere-devops-system $(kubectl get pods -n kubesphere-devops-system |grep Evicted|awk '{print $1}')
kubectl delete pods -n kubesphere-devops-system $(kubectl get pods -n kubesphere-devops-system |grep CrashLoopBackOff|awk '{print $1}')
  • For convenience, a script that cleans up Evicted/CrashLoopBackOff pods in a specified namespace, or removes Exited containers
#!/bin/bash
# auth:kaliarch

# delete all pods in the Evicted state in the given namespace
clear_evicted_pod() {
  ns=$1
  kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} |grep Evicted|awk '{print $1}')
}
# delete all pods in the CrashLoopBackOff state in the given namespace
clear_crash_pod() {
  ns=$1
  kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} |grep CrashLoopBackOff|awk '{print $1}')
}
# remove all containers in the Exited state on this node
clear_exited_container() {
  docker rm `docker ps -a |grep Exited |awk '{print $1}'`
}


echo "1.clear exicted pod"
echo "2.clear crash pod"
echo "3.clear exited container"
read -p "Please input num:" num


case ${num} in 
"1")
  read -p "Please input oper namespace:" ns
  clear_evicted_pod ${ns}
  ;;


"2")
  read -p "Please input oper namespace:" ns
  clear_crash_pod ${ns}
  ;;
"3")
  clear_exited_container
  ;;
"*")
  echo "input error"
  ;;
esac
  • Clean up Evicted/CrashLoopBackOff pods in all namespaces
# list all namespaces
kubectl get ns|grep -v "NAME"|awk '{print $1}'

# clean up pods in the Evicted state
for ns in `kubectl get ns|grep -v "NAME"|awk '{print $1}'`;do kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} |grep Evicted|awk '{print $1}');done
# clean up pods in the CrashLoopBackOff state
for ns in `kubectl get ns|grep -v "NAME"|awk '{print $1}'`;do kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} |grep CrashLoopBackOff|awk '{print $1}');done
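
As an alternative sketch, assuming a kubectl version that supports --field-selector with delete (evicted pods report status.phase=Failed), the Evicted-pod loop above can also be written as:

# delete Failed pods (which includes Evicted ones) in every namespace
for ns in `kubectl get ns|grep -v "NAME"|awk '{print $1}'`;do kubectl delete pods -n ${ns} --field-selector=status.phase=Failed;done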

3 Migrating the docker data directory

The docker data directory was not specified during installation, and the 50 GB system disk has filled up over time, so the docker data needs to be migrated. Use a symbolic link:
First, mount the new disk at /data.

systemctl stop docker

mkdir -p /data/docker/  

rsync -avz /var/lib/docker/ /data/docker/  

mv /var/lib/docker /data/docker_bak

ln -s /data/docker /var/lib/

systemctl daemon-reload

systemctl start docker
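
As a quick sanity check after docker starts (not part of the original steps), confirm that the data root now resolves to the new disk:

# the symlink should point at /data/docker, and docker should still report its configured root dir
ls -l /var/lib/docker
docker info | grep "Docker Root Dir"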

4 KubeSphere network troubleshooting

  • Problem description:

On a KubeSphere node or master, a container started manually cannot reach the public internet from inside the container. Is something wrong with my configuration? Calico was used by default, and switching to Flannel did not help either. Containers in pods deployed through a Deployment in KubeSphere can reach the public internet; only containers started manually on a node or master cannot.

The network of a manually started container goes through docker0:


root@fd1b8101475d:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
105: eth0@if106: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

The network of a container inside a pod uses kube-ipvs0:


1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether c2:27:44:13:df:5d brd ff:ff:ff:ff:ff:ff
    inet 10.233.97.175/32 scope global eth0
       valid_lft forever preferred_lft forever
  • Solution:

Check the docker startup configuration.

Edit /etc/systemd/system/docker.service.d/docker-options.conf and remove the parameter --iptables=false. When this parameter is set to false, docker does not write iptables rules, which is why manually started containers on docker0 cannot reach the outside network.

[Service]
Environment="DOCKER_OPTS=  --registry-mirror=https://registry.docker-cn.com --data-root=/var/lib/docker --log-opt max-size=10m --log-opt max-file=3 --insecure-registry=harbor.devops.kubesphere.local:30280"

5 KubeSphere application route (Ingress) issues

Application routes (Ingress) in KubeSphere use nginx. Configuring them through the web UI causes two hosts to share the same CA certificate; this can be worked around by editing the Ingress resource and its annotations directly:

⚠️ Note: check which Deployment runs the Ingress controller first.

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: prod-app-ingress
  namespace: prod-net-route
  resourceVersion: '8631859'
  labels:
    app: prod-app-ingress
  annotations:
    desc: production application route
    nginx.ingress.kubernetes.io/client-body-buffer-size: 1024m
    nginx.ingress.kubernetes.io/proxy-body-size: 2048m
    nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
    nginx.ingress.kubernetes.io/service-upstream: 'true'
spec:
  tls:
    - hosts:
        - smartms.tools.anchnet.com
      secretName: smartms-ca
    - hosts:
        - smartsds.tools.anchnet.com
      secretName: smartsds-ca
  rules:
    - host: smartms.tools.anchnet.com
      http:
        paths:
          - path: /
            backend:
              serviceName: smartms-frontend-svc
              servicePort: 80
    - host: smartsds.tools.anchnet.com
      http:
        paths:
          - path: /
            backend:
              serviceName: smartsds-frontend-svc
              servicePort: 80
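
After editing, the manifest can be applied with kubectl; the file name below is an assumption:

kubectl apply -f prod-app-ingress.yaml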

6 Updating the Jenkins agent in KubeSphere

In their own scenarios, users may need different language versions or different tool versions. This part describes how to replace the built-in agent.

The default base-build image does not include the sonar-scanner tool. Each agent of the KubeSphere Jenkins is a Pod; to replace a built-in agent, you need to replace the agent's image.

Build the latest kubesphere/builder-base:advanced-1.0.0 agent image.

Update it to the specified custom image: ccr.ccs.tencentyun.com/testns/base:v1
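
A minimal sketch of building and pushing such a custom agent image (the Dockerfile contents and registry access are assumptions):

# build the custom agent image from a local Dockerfile and push it to the registry
docker build -t ccr.ccs.tencentyun.com/testns/base:v1 .
docker push ccr.ccs.tencentyun.com/testns/base:v1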

Reference: https://kubesphere.io/docs/advanced-v2.0/zh-CN/devops/devops-admin-faq/#%E5%8D%87%E7%BA%A7-jenkins-agent-%E7%9A%84%E5%8C%85%E7%89%88%E6%9C%AC


After modifying jenkins-casc-config in KubeSphere, you need to go to the configuration-as-code page under Manage Jenkins on the Jenkins dashboard and reload the updated system configuration.

Reference:

https://kubesphere.io/docs/advanced-v2.0/zh-CN/devops/jenkins-setting/#%E7%99%BB%E9%99%86-jenkins-%E9%87%8D%E6%96%B0%E5%8A%A0%E8%BD%BD

Update the base image in Jenkins.

⚠️ Modify the Jenkins configuration in KubeSphere (jenkins-casc-config) first.

7 Sending mail from DevOps pipelines

Reference: https://www.cloudbees.com/blog/mail-step-jenkins-workflow

Built-in variables:

Variable        Description
BUILD_NUMBER The current build number, such as “153”
BUILD_ID The current build ID, identical to BUILD_NUMBER for builds created in 1.597+, but a YYYY-MM-DD_hh-mm-ss timestamp for older builds
BUILD_DISPLAY_NAME The display name of the current build, which is something like “#153” by default.
JOB_NAME Name of the project of this build, such as “foo” or “foo/bar”. (To strip off folder paths from a Bourne shell script, try: ${JOB_NAME##*/})
BUILD_TAG String of "jenkins-${JOB_NAME}-${BUILD_NUMBER}". Convenient to put into a resource file, a jar file, etc for easier identification.
EXECUTOR_NUMBER The unique number that identifies the current executor (among executors of the same machine) that’s carrying out this build. This is the number you see in the “build executor status”, except that the number starts from 0, not 1.
NODE_NAME Name of the slave if the build is on a slave, or “master” if run on master
NODE_LABELS Whitespace-separated list of labels that the node is assigned.
WORKSPACE The absolute path of the directory assigned to the build as a workspace.
JENKINS_HOME The absolute path of the directory assigned on the master node for Jenkins to store data.
JENKINS_URL Full URL of Jenkins, like http://server:port/jenkins/ (note: only available if Jenkins URL set in system configuration)
BUILD_URL Full URL of this build, like http://server:port/jenkins/job/foo/15/ (Jenkins URL must be set)
SVN_REVISION Subversion revision number that’s currently checked out to the workspace, such as “12345”
SVN_URL Subversion URL that’s currently checked out to the workspace.
JOB_URL Full URL of this job, like http://server:port/jenkins/job/foo/ (Jenkins URL must be set)

In the end I wrote a template adapted to my own workflow, which can be used directly:

mail to: '[email protected]',
          charset:'UTF-8', // or GBK/GB18030
          mimeType:'text/plain', // or text/html
          subject: "Kubesphere ${env.JOB_NAME} [${env.BUILD_NUMBER}] released successfully, Running Pipeline: ${currentBuild.fullDisplayName}",
          body: """
          ---------Anchnet Devops Kubesphere Pipeline job--------------------


          Project name  : ${env.JOB_NAME}
          Build number  : ${env.BUILD_NUMBER}
          Scan info     : address: ${SONAR_HOST}
          Image address : ${REGISTRY}/${QHUB_NAMESPACE}/${APP_NAME}:${IMAGE_TAG}
          Build details : SUCCESSFUL: Job ${env.JOB_NAME} [${env.BUILD_NUMBER}]
          Build status  : ${env.JOB_NAME} jenkins release is running normally
          Build URL     : ${env.BUILD_URL}"""

