k8s【Analyzing and fixing a CoreDNS resolution problem】

Problem

[root@master busybox]# kubectl get pod -nkube-system -owide
NAME                             READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
coredns-5c98db65d4-8zjps         1/1     Running   1          2d23h   10.244.0.13      master   <none>           <none>
coredns-5c98db65d4-d2kth         1/1     Running   1          2d23h   10.244.0.14      master   <none>           <none>
[root@master busybox]# kubectl get pod -owide
NAME                      READY   STATUS    RESTARTS   AGE    IP            NODE     NOMINATED NODE   READINESS GATES
curl-6bf6db5c4f-pjld9     1/1     Running   1          3d     10.244.1.2    node2    <none>           <none>
gateway-99b655cc6-np685   1/1     Running   0          44s    10.244.0.54   master   <none>           <none>
test-post-start1          1/1     Running   0          115s   10.244.1.6    node2    <none>           <none>
test-post-start2          1/1     Running   0          115s   10.244.0.52   master   <none>           <none>
test-post-start3          1/1     Running   0          115s   10.244.0.53   master   <none>           <none>

As shown above, pods deployed on the master node cannot resolve the domain gateway.default.svc.cluster.local, while pods deployed on node2 can: both curl-6bf6db5c4f-pjld9 and test-post-start1 resolve it with nslookup.

# Error (from a pod on master)
/ # nslookup gateway
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'gateway': Try again

/ # nslookup gateway.default.svc.cluster.local
Server:    10.244.0.10
Address 1: 10.244.0.10

nslookup: can't resolve 'gateway.default.svc.cluster.local'

Analysis

Open a shell in a pod on the master node and query the CoreDNS pod IP directly:

kubectl exec -it test-post-start2 -- sh
/ # nslookup gateway.default.svc.cluster.local 10.244.0.13
Server:    10.244.0.13
Address 1: 10.244.0.13 10-244-0-13.kube-dns.kube-system.svc.cluster.local

Name:      gateway.default.svc.cluster.local
Address 1: 10.244.106.29 gateway.default.svc.cluster.local

Querying the CoreDNS pod IP directly succeeds, which shows that CoreDNS itself is working.
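To repeat this comparison from a script, the following Python sketch sends a single A-record query over UDP to a chosen server, roughly what `nslookup NAME SERVER` does. It is an illustration, not a full DNS client: the function names are hypothetical, it sends one query with no retries, does not fall back to TCP, and only extracts plain A records from the answer.

```python
import random
import socket
import struct


def build_query(name: str, txid: int) -> bytes:
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def skip_name(data: bytes, pos: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = data[pos]
        if length == 0:
            return pos + 1
        if (length & 0xC0) == 0xC0:  # compression pointer: 2 bytes, ends the name
            return pos + 2
        pos += 1 + length


def parse_a_response(data: bytes, txid: int) -> list:
    """Return the IPv4 addresses found in the answer section."""
    rid, flags, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", data[:12])
    if rid != txid:
        raise ValueError("transaction ID mismatch")
    if flags & 0x000F:
        raise LookupError(f"server returned RCODE {flags & 0x000F}")
    pos = 12
    for _ in range(qdcount):  # skip the echoed question: name + QTYPE + QCLASS
        pos = skip_name(data, pos) + 4
    answers = []
    for _ in range(ancount):
        pos = skip_name(data, pos)
        rtype, _, _, rdlength = struct.unpack("!HHIH", data[pos:pos + 10])
        pos += 10
        if rtype == 1 and rdlength == 4:  # A record
            answers.append(socket.inet_ntoa(data[pos:pos + 4]))
        pos += rdlength
    return answers


def query_a(name: str, server: str, timeout: float = 2.0) -> list:
    """Send one A query to server:53 over UDP, similar to `nslookup name server`."""
    txid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, txid), (server, 53))
        data, _ = sock.recvfrom(512)
    return parse_a_response(data, txid)
```

From a pod on master that has Python 3 available (busybox does not), `query_a("gateway.default.svc.cluster.local", "10.244.0.13")` corresponds to the successful test above, while the same call against the kube-dns ClusterIP 10.244.0.10 would be expected to time out or fail, as nslookup does.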

Check the ClusterIP of the DNS Service:

[root@master ~]# kubectl get svc -nkube-system
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
kube-dns        ClusterIP   10.244.0.10      <none>        53/UDP,53/TCP,9153/TCP   21m
# Resolving the domain through the ClusterIP fails
nslookup gateway.default.svc.cluster.local 10.244.0.10

These tests show that the problem lies with the CoreDNS Service (its ClusterIP), not with CoreDNS itself.
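The original analysis stops at the Service. One plausible explanation, inferred from the addresses in this article and not verified in it: the pod IPs suggest that pods on master get addresses from 10.244.0.0/24 and pods on node2 from 10.244.1.0/24 (a flannel-style per-node /24 is assumed), so the old ClusterIP 10.244.0.10 lies inside master's own pod subnet. A pod on master may then treat 10.244.0.10 as an on-link address and ARP for it on the node's bridge instead of sending the packet to its default gateway, where kube-proxy's NAT rules would normally rewrite it to a CoreDNS pod IP; pods on node2 have no such on-link route. The sketch below only checks that overlap with Python's ipaddress module; the subnet table is an assumption and should be replaced with each Node's spec.podCIDR on a real cluster.

```python
import ipaddress

# ASSUMED per-node pod subnets, inferred from the pod IPs in `kubectl get pod -owide`
# (master: 10.244.0.x, node2: 10.244.1.x). On a real cluster, read spec.podCIDR
# from each Node object instead of guessing.
NODE_POD_SUBNETS = {
    "master": ipaddress.ip_network("10.244.0.0/24"),
    "node2": ipaddress.ip_network("10.244.1.0/24"),
}


def nodes_overlapping(cluster_ip: str, subnets=NODE_POD_SUBNETS) -> list:
    """Return the nodes whose pod subnet contains the given ClusterIP."""
    ip = ipaddress.ip_address(cluster_ip)
    return sorted(node for node, net in subnets.items() if ip in net)


# Old kube-dns ClusterIP, new kube-dns ClusterIP, and the gateway Service IP
for svc_ip in ("10.244.0.10", "10.244.47.231", "10.244.106.29"):
    print(svc_ip, nodes_overlapping(svc_ip) or "no overlap")
```

This prints `10.244.0.10 ['master']` and `no overlap` for the other two addresses, which would be consistent with the symptom: only pods on master fail, and the new ClusterIP 10.244.47.231 works everywhere.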

Solution

Export the existing kube-dns Service configuration:

kubectl get svc -nkube-system kube-dns -oyaml > kube-dns-svc.yaml

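Edit kube-dns-svc.yaml. In the version below, spec.clusterIP and the fields populated by the API server (such as status, metadata.uid and metadata.resourceVersion) have been removed so that Kubernetes allocates a new ClusterIP: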

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: KubeDNS
  name: kube-dns
  namespace: kube-system
spec:
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  - name: metrics
    port: 9153
    protocol: TCP
    targetPort: 9153
  selector:
    k8s-app: kube-dns
  sessionAffinity: None
  type: ClusterIP
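
Because spec.clusterIP cannot be changed on an existing Service, delete the old Service and create it again from the edited file:

kubectl delete svc -nkube-system kube-dns
kubectl apply -f kube-dns-svc.yaml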
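The manual edit of kube-dns-svc.yaml can also be scripted. A minimal sketch, assuming the Service is exported as JSON (`kubectl get svc -nkube-system kube-dns -ojson`) instead of YAML so that only the Python standard library is needed; `strip_for_recreate` is a hypothetical helper, and its field list covers the common server-populated fields rather than every possible one.

```python
import json

# Fields filled in by the API server; they should not be carried over when the
# Service is recreated. spec.clusterIP/clusterIPs are dropped so a new IP is allocated.
METADATA_FIELDS = ("resourceVersion", "uid", "creationTimestamp", "selfLink", "managedFields")
SPEC_FIELDS = ("clusterIP", "clusterIPs")
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def strip_for_recreate(svc: dict) -> dict:
    """Return a copy of a Service object with server-assigned fields removed."""
    svc = json.loads(json.dumps(svc))  # deep copy via a JSON round-trip
    meta = svc.get("metadata", {})
    for field in METADATA_FIELDS:
        meta.pop(field, None)
    meta.get("annotations", {}).pop(LAST_APPLIED, None)
    spec = svc.get("spec", {})
    for field in SPEC_FIELDS:
        spec.pop(field, None)
    svc.pop("status", None)
    return svc
```

Load the exported JSON with `json.load`, pass it through `strip_for_recreate`, and write the result with `json.dump`; `kubectl apply -f` accepts a JSON manifest just like a YAML one. Removing `clusterIPs` as well matters on newer Kubernetes versions, where that field mirrors `clusterIP`.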

Check the new CoreDNS ClusterIP, which is now 10.244.47.231:

[root@master ~]# kubectl get svc -nkube-system
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
kube-dns        ClusterIP   10.244.47.231    <none>        53/UDP,53/TCP,9153/TCP   21m

From inside a pod that previously failed, query the new ClusterIP; resolution succeeds, confirming that the new ClusterIP works:

nslookup gateway.default.svc.cluster.local 10.244.47.231

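Change the kubelet clusterDNS setting (the clusterDNS field in the kubelet config file, equivalent to the --cluster-dns flag) so that newly created pods get the new CoreDNS ClusterIP as the nameserver in /etc/resolv.conf. Existing pods keep their old /etc/resolv.conf until they are recreated.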

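# Edit the kubelet configuration
vim /var/lib/kubelet/config.yaml

# Find clusterDNS and set it to the new ClusterIP
clusterDNS:
- 10.244.47.231

# Restart kubelet to apply the change. Note: every node in the cluster must be edited and restarted
systemctl restart kubelet.service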
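Because the same edit has to be made on every node, it may be worth scripting. A minimal sketch that rewrites the clusterDNS list in the kubelet config text, assuming the layout shown above (a top-level `clusterDNS:` line followed by `- <ip>` items) and no other YAML features; `set_cluster_dns` is a hypothetical helper, and a real YAML parser would be more robust.

```python
import re


def set_cluster_dns(config_text: str, new_ips: list) -> str:
    """Replace the items of the top-level `clusterDNS:` list with new_ips."""
    lines = config_text.splitlines(keepends=True)
    out = []
    i = 0
    found = False
    while i < len(lines):
        line = lines[i]
        i += 1
        if re.match(r"^clusterDNS:\s*$", line):
            found = True
            out.append(line if line.endswith("\n") else line + "\n")
            # Skip the existing "- x.x.x.x" items (optionally indented)
            while i < len(lines) and re.match(r"^\s*-\s", lines[i]):
                i += 1
            out.extend(f"- {ip}\n" for ip in new_ips)
        else:
            out.append(line)
    if not found:
        raise ValueError("no top-level clusterDNS list found")
    return "".join(out)
```

On each node, read /var/lib/kubelet/config.yaml, pass its contents with `["10.244.47.231"]`, write the result back, and then restart kubelet as above.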

Finally, test with a newly created pod: its /etc/resolv.conf now points to the new ClusterIP, and resolution works.

/ # cat /etc/resolv.conf 
nameserver 10.244.47.231
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
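The search and ndots lines explain why the short name gateway resolves at all: with ndots:5, a name containing fewer than five dots is first tried with each search domain appended, and only then as an absolute name. The sketch below reproduces that ordering as the glibc resolver applies it; resolver implementations differ in details (musl, used by busybox and Alpine images, for example), and the helper names are hypothetical.

```python
def parse_resolv_conf(text: str):
    """Return (nameservers, search domains, ndots) from resolv.conf text."""
    nameservers, search, ndots = [], [], 1  # ndots defaults to 1
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "nameserver":
            nameservers.extend(fields[1:2])
        elif fields[0] == "search":
            search = fields[1:]
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return nameservers, search, ndots


def candidate_names(name: str, search: list, ndots: int) -> list:
    """Names a glibc-style resolver tries, in order, for a lookup of `name`."""
    if name.endswith("."):  # fully qualified: the search list is not used
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]
```

For the resolv.conf above, `candidate_names("gateway", ...)` starts with `gateway.default.svc.cluster.local.`, which is why the short name works. Note that gateway.default.svc.cluster.local has only four dots, so it too is first tried with the search suffixes appended; writing it with a trailing dot skips the search list.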