Customizing the k8s DNS Service

Table of Contents

Introduction

Basics of k8s DNS

kube-dns customization

Setup our own name server

Prepare the Dockerfile for dnsmasq image

Prepare our dnsmasq pod, i.e. custom name server

Integrate our name server with kube-dns: create the ConfigMap

Add entries to our DNS and test it out

Issues

Pod CIDR

References


Introduction

In some cases the default k8s DNS service does not match our needs; for example, an Oracle RAC database has to resolve the public/private/vip/scan IPs of its nodes. This document describes how we can customize the DNS service, including integrating our own DNS server.

This document assumes you know the basic concepts of k8s and DNS.

This document also assumes you have read this page: multiple network interfaces for a pod, because we will touch on it in a later section.

k8s 1.9+ supports better DNS customization, but our environment is still 1.8-compatible and uses kube-dns instead of CoreDNS.

Basics of k8s DNS

The default DNS record of a k8s pod is: a-b-c-d.sub-domain.my-namespace.pod.cluster.local

a-b-c-d is the pod's IP with dots replaced by dashes. For example, a pod my-pod with IP address 10.244.0.2 in the default namespace will have an A record "10-244-0-2.default.pod.cluster.local". If the pod specifies a subdomain, such as ora-subdomain, the record becomes "10-244-0-2.ora-subdomain.default.pod.cluster.local". There is no pod-name or hostname record in the DNS server. However, if we create a headless service (a service without a clusterIP) whose name is exactly the same as the subdomain, we get a record "my-pod.ora-subdomain.default.svc.cluster.local" (note the suffix "pod.cluster.local" becomes "svc.cluster.local"). This is also where a StatefulSet's stable network identifiers come from; in that case, in addition to a record for each pod that belongs to the StatefulSet, the service itself (i.e. "sub-domain.my-namespace.svc.cluster.local") resolves to the set of pod IPs.

A service's DNS record is always my-svc.my-namespace.svc.cluster.local.
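A quick way to verify these records is to query them from inside any pod in the cluster (a sketch; the names assume the example my-pod/ora-subdomain setup above):

# Pod A record (IP-based name):
nslookup 10-244-0-2.default.pod.cluster.local
# Pod-name record via the headless service named after the subdomain:
nslookup my-pod.ora-subdomain.default.svc.cluster.local
# The headless service itself resolves to the set of pod IPs:
nslookup ora-subdomain.default.svc.cluster.local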

kube-dns customization

kube-dns can be extended to support additional DNS name servers. In short, consider the following ConfigMap:


apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"example.com": ["1.2.3.4"]}
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]

The custom configurations are stubDomains and upstreamNameservers.

First, let's see what the resolution process is when these configurations do NOT exist:

  1. The name is resolved according to the default k8s DNS rules, as stated in the "Basics of k8s DNS" section.
  2. If the name is not found (e.g. names in the example.com domain), the request is forwarded to upstream name servers inherited from the node.

This means the k8s node's /etc/hosts and /etc/resolv.conf effectively function inside the pod containers.

Now what if stubDomains and/or upstreamNameservers are specified? The process is:

  1. For names with the cluster suffix, i.e. ".cluster.local", the request is handled by kube-dns.
  2. For names with the stub domain suffix, i.e. ".example.com", the request is sent to the configured custom DNS server listening at 1.2.3.4.
  3. For names without a matching suffix, for example "github.com", the request is forwarded to the upstream DNS, in this case 8.8.8.8 and 8.8.4.4 (Google's public DNS servers).

Important: if upstreamNameservers is specified, we lose the node's resolver settings. A natural solution is to copy the node's resolver configuration into the custom name server.
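The three paths above can be exercised from inside any pod (a sketch; foo.example.com is a hypothetical record served by the stub-domain server):

# 1. Cluster suffix -> answered by kube-dns itself:
nslookup kubernetes.default.svc.cluster.local
# 2. Stub domain suffix -> forwarded to the custom server at 1.2.3.4:
nslookup foo.example.com
# 3. No matching suffix -> forwarded to 8.8.8.8 / 8.8.4.4:
nslookup github.com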

Setup our own name server

So what will our custom name server be? dnsmasq is an easy-to-configure and widely used one (kube-dns itself uses it). In this section, we will set up our own dnsmasq server from scratch.

Prepare the Dockerfile for dnsmasq image

First, write the Dockerfile:

Dockerfile


FROM oraclelinux:7

# Install dnsmasq
RUN yum install -y dnsmasq

# Install netcat
RUN yum install -y nc

# Pre-configure dnsmasq
RUN echo 'listen-address=__LOCAL_IP__' >> /etc/dnsmasq.conf
RUN echo 'resolv-file=/etc/resolv.dnsmasq.conf' >> /etc/dnsmasq.conf
RUN echo 'conf-dir=/etc/dnsmasq.d' >> /etc/dnsmasq.conf
RUN touch /etc/resolv.dnsmasq.conf

# Copied from node host /etc/resolv.conf, TODO: automate this
RUN echo 'nameserver my.internal.name.server1.ip' >> /etc/resolv.dnsmasq.conf
RUN echo 'nameserver my.internal.name.server2.ip' >> /etc/resolv.dnsmasq.conf
RUN echo 'nameserver my.internal.name.server3.ip' >> /etc/resolv.dnsmasq.conf
RUN echo 'search cn.my.com my.com mycorp.com' >> /etc/resolv.dnsmasq.conf

# This directory will usually be provided with the -v option.
# RUN echo 'address=/example.com/xx.xx.xx.xx' >> /etc/dnsmasq.d/0hosts

# On the other hand the above directory isn't reloaded with a SIGHUP. Instead
# we can use an --addn-hosts file, see run.sh.
RUN touch /etc/addn-hosts

ADD run.sh /root/run.sh

EXPOSE 22
EXPOSE 53
EXPOSE 12345

CMD /root/run.sh

Note that we explicitly specify "resolv-file=/etc/resolv.dnsmasq.conf" to replace the default /etc/resolv.conf and thus bypass the k8s pod's default resolution. We copy the node host's name servers and search domains so our custom name server can still resolve names outside the k8s cluster. Actually, what we copy here is the kube-dns pod's /etc/resolv.conf, which in turn (partly) copies the node's file:

bash-4.2$ kubectl get po -n kube-system | grep "kube-dns"
kube-dns-545bc4bfd4-zjsz8          3/3       Running   4          4d
bash-4.2$ kubectl exec kube-dns-545bc4bfd4-zjsz8 -c kubedns -n kube-system cat /etc/resolv.conf
nameserver my.internal.name.server1.ip
nameserver my.internal.name.server2.ip
nameserver my.internal.name.server3.ip
search cn.my.com my.com mycorp.com
bash-4.2$ cat /etc/resolv.conf
options timout:1
options attempts:2
; generated by /usr/sbin/dhclient-script
search cn.my.com mycorp.com my.com
nameserver my.internal.name.server1.ip
nameserver my.internal.name.server2.ip
nameserver my.internal.name.server3.ip

This /etc/resolv.conf specifies the "upstream" name servers of kube-dns and shows how it inherits the node's name resolution.

The run.sh specified in the CMD line of the Dockerfile is:

#!/bin/bash

# Substitute the pod IP (set via the downward API) into the dnsmasq config.
sed -i s/__LOCAL_IP__/$POD_IP/ /etc/dnsmasq.conf

dnsmasq --addn-hosts=/etc/addn-hosts &
echo "Config is /etc/dnsmasq.conf"
echo "--addn-hosts=/etc/addn-hosts"

# Start netcat on port 12345: every line received is appended to the
# addn-hosts file and dnsmasq is reloaded on the fly.
while true; do
  m=$(nc -l 0.0.0.0 12345)
  echo "$m"
  echo "$m" >> /etc/addn-hosts
  kill -HUP $(pgrep dnsmasq)
  echo "-- Reloaded --"
done

Put it in the same directory as the Dockerfile and build the image:

docker build --rm --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --build-arg no_proxy=$no_proxy -t my/dnsmasq .
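Optionally, the image can be smoke-tested locally before deploying (a sketch, not part of the original flow; POD_IP=0.0.0.0 makes dnsmasq bind all container interfaces, and the ports are mapped to the host):

docker run -d --rm --name dnsmasq-test -e POD_IP=0.0.0.0 \
  -p 5353:53/udp -p 12345:12345 my/dnsmasq
# Push a test record through the netcat channel; give run.sh a moment to reload:
echo "1.2.3.4 smoke.example.com" > /dev/tcp/127.0.0.1/12345
sleep 1
dig @127.0.0.1 -p 5353 smoke.example.com +short   # expect 1.2.3.4
docker stop dnsmasq-test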

Prepare our dnsmasq pod, i.e. custom name server

The YAML:

dnsmasq.yaml


apiVersion: v1
kind: Pod
metadata:
  name: mydns
  labels:
    app: mydns
spec:
  containers:
  - image: my/dnsmasq
    name: mydns
    command: ["/bin/sh", "-c", "/root/run.sh"]
    imagePullPolicy: Never
    securityContext:
      privileged: true
    env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
---
apiVersion: v1
kind: Service
metadata:
  name: mydns-service
  labels:
    app: mydns-service
spec:
  ports:
  - port: 12345
    name: inbound-port
  - port: 53
    name: dns-service-port
  selector:
    app: mydns

Then create the pod and service:

kubectl apply -f dnsmasq.yaml
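After applying, it is worth confirming that the service actually selects the pod (a quick check, not in the original flow):

kubectl get endpoints mydns-service   # should list the mydns pod IP on ports 53 and 12345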

Integrate our name server with kube-dns: create the ConfigMap

After the pod mydns starts, use "kubectl logs ..." to check that it started successfully, then get its IP:

bash-4.2$ kubectl logs mydns
Config is /etc/dnsmasq.conf
--addn-hosts=/etc/addn-hosts
bash-4.2$ kubectl get po -o wide | grep mydns | awk '{print $6}'
10.244.0.11
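Before wiring this into kube-dns, we can sanity-check that the pod answers DNS queries directly (a sketch; the throwaway test pod and the bind-utils install are not part of the original setup):

kubectl run -ti --rm dig-test --image=oraclelinux:7 --restart=Never -- \
  bash -c "yum install -y bind-utils >/dev/null && dig @10.244.0.11 github.com +short"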

Now write our ConfigMap yaml file using this IP:

cm.yaml


apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"example.com": ["10.244.0.11"]}
  upstreamNameservers: |
    ["10.244.0.11"]

Apply it:

kubectl apply -f cm.yaml
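kube-dns watches its ConfigMap, so no restart is needed; the dnsmasq container of the kube-dns pod should log the new stub domain shortly after (pod name from the earlier output; the exact log wording may vary by version):

kubectl logs -n kube-system kube-dns-545bc4bfd4-zjsz8 -c dnsmasq | grep example.com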

Add entries to our DNS and test it out

As stated previously, we assume you have read Multiple Network Interfaces for a k8s pod; now we will test our DNS server with a simple pod that enables multiple network interfaces. The test pod is an ole7 container that does nothing but sleep:

ole7.yaml


29

apiVersion: v1
kind: Pod
metadata:
  name: ole7
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  volumes:
  - name: tempdir
    emptyDir: {}
  initContainers:
  - name: get-priv-ip
    image: oraclelinux:7
    command: ["/bin/bash", "-c", "ip a | grep 192.168 | awk '{ print substr($2, 1, index($2, \"/\") - 1) }' 2>&1 | tee /temp/PRIV_IP"]
    volumeMounts:
    - name: "tempdir"
      mountPath: "/temp"
  containers:
  - name: ole7
    command: ["/bin/bash", "-c", "sleep 2000000000000"]
    image: oraclelinux:7
    env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    volumeMounts:
    - name: "tempdir"
      mountPath: "/temp"

Create it:

kubectl apply -f ole7.yaml

The pod has two IP addresses: the flannel default (shown by "kubectl get pod -o wide") and the one written to /temp/PRIV_IP in the container.

Once the pod is up, we can test things out. In the following example, I get both the public and private IPs of pod "ole7" and dynamically add them to our custom name server (by redirecting echo output to /dev/tcp/mydns-service/12345), giving them names. Next, I ping all the names with and without the domain suffix; I also ping some hosts outside the k8s cluster, e.g. vm09xxl and home.cn.my.com. All the commands succeed.

Test commands

bash-4.2$ kubectl get po -o wide | grep ole7 | awk '{print $6}'
10.244.0.26
bash-4.2$ kubectl exec ole7 -- cat /temp/PRIV_IP
192.168.1.211
bash-4.2$ kubectl exec -ti ole7 -- bash
[root@ole7 /]# echo "10.244.0.26 ole.example.com ole" > /dev/tcp/mydns-service/12345
[root@ole7 /]# echo "192.168.1.211 ole-priv.example.com ole-priv" > /dev/tcp/mydns-service/12345
[root@ole7 /]# ping -c 3 ole
PING ole (10.244.0.26) 56(84) bytes of data.
64 bytes from ole7 (10.244.0.26): icmp_seq=1 ttl=64 time=0.057 ms
64 bytes from ole7 (10.244.0.26): icmp_seq=2 ttl=64 time=0.057 ms
64 bytes from ole7 (10.244.0.26): icmp_seq=3 ttl=64 time=0.038 ms
--- ole ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.038/0.050/0.057/0.012 ms
[root@ole7 /]# ping -c 3 ole.example.com
PING ole.example.com (10.244.0.26) 56(84) bytes of data.
64 bytes from ole7 (10.244.0.26): icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from ole7 (10.244.0.26): icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from ole7 (10.244.0.26): icmp_seq=3 ttl=64 time=0.038 ms
--- ole.example.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.022/0.030/0.038/0.009 ms
[root@ole7 /]# ping -c 3 ole-priv
PING ole-priv (192.168.1.211) 56(84) bytes of data.
64 bytes from ole7 (192.168.1.211): icmp_seq=1 ttl=64 time=0.023 ms
64 bytes from ole7 (192.168.1.211): icmp_seq=2 ttl=64 time=0.045 ms
64 bytes from ole7 (192.168.1.211): icmp_seq=3 ttl=64 time=0.043 ms
--- ole-priv ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.023/0.037/0.045/0.009 ms
[root@ole7 /]# ping -c 3 ole-priv.example.com
PING ole-priv.example.com (192.168.1.211) 56(84) bytes of data.
64 bytes from ole7 (192.168.1.211): icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from ole7 (192.168.1.211): icmp_seq=2 ttl=64 time=0.054 ms
64 bytes from ole7 (192.168.1.211): icmp_seq=3 ttl=64 time=0.038 ms
--- ole-priv.example.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.026/0.039/0.054/0.012 ms
[root@ole7 /]# ping -c 3 vm09xxl
PING vm09xxl.cn.my.com (10.xxx.xxx.xxx) 56(84) bytes of data.
64 bytes from vm09xxl.cn.my.com (10.xxx.xxx.xxx): icmp_seq=1 ttl=57 time=0.703 ms
64 bytes from vm09xxl.cn.my.com (10.xxx.xxx.xxx): icmp_seq=2 ttl=57 time=0.716 ms
64 bytes from vm09xxl.cn.my.com (10.xxx.xxx.xxx): icmp_seq=3 ttl=57 time=0.591 ms
--- vm09xxl.cn.my.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.591/0.670/0.716/0.056 ms
[root@ole7 /]# ping -c 3 home.cn.my.com
PING vmibfj.cn.my.com (10.xxx.x.xxx) 56(84) bytes of data.
64 bytes from vmibfj.cn.my.com (10.xxx.x.xxx): icmp_seq=1 ttl=57 time=0.602 ms
64 bytes from vmibfj.cn.my.com (10.xxx.x.xxx): icmp_seq=2 ttl=57 time=0.657 ms
64 bytes from vmibfj.cn.my.com (10.xxx.x.xxx): icmp_seq=3 ttl=57 time=0.684 ms
--- vmibfj.cn.my.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.602/0.647/0.684/0.045 ms

We add two names for a single pod IP, e.g. "ole.example.com" and "ole", to addn-hosts in our name server. When a name resolution request reaches kube-dns: if the name carries the stub domain suffix (example.com), the request is forwarded to the custom name server specified in the "stubDomains" attribute of the ConfigMap. If the name has no suffix, kube-dns first handles it by appending the "...cluster.local" suffixes; if the name cannot be found in the k8s cluster, kube-dns forwards it to "upstreamNameservers", which in our case is the same custom name server, where the name is found in addn-hosts. On the other hand, a name such as "vm09xxl" follows the same resolution path but is not found in addn-hosts, so the name server consults its /etc/resolv.dnsmasq.conf and forwards the request to the upstream name servers specified there.
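The same paths can be traced explicitly with nslookup inside the ole7 pod (a sketch; oraclelinux:7 needs bind-utils installed first, which is not part of the original steps):

yum install -y bind-utils
nslookup ole.example.com   # stub-domain match: forwarded straight to mydns
nslookup ole               # no suffix: kube-dns tries the cluster.local search list,
                           # then falls through to upstreamNameservers (mydns again)
nslookup vm09xxl           # not in addn-hosts: dnsmasq forwards to the upstreams
                           # listed in /etc/resolv.dnsmasq.conf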

Issues

Pod CIDR

Our current pod_network_cidr is "10.244.0.0/16", and the IPs allocated to pods conflict with those of our VMs. For example, 10.244.0.25 is somevm1.cn.my.com and 10.244.0.26 is somevm2.cn.my.com. This may cause confusion in name resolution. We may need to change pod_network_cidr to another value.
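If the cluster is built with kubeadm, a non-conflicting range can be chosen at init time (a sketch; the range below is hypothetical, and the Network field in the flannel manifest must be changed to match):

kubeadm init --pod-network-cidr=172.30.0.0/16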

References

https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/

https://kubernetes.io/blog/2017/04/configuring-private-dns-zones-upstream-nameservers-kubernetes/

https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/

https://github.com/noteed/docker-dnsmasq

https://pracucci.com/kubernetes-dns-resolution-ndots-options-and-why-it-may-affect-application-performances.html
