Kubernetes: An Introduction to Multus-CNI


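Introduction to Multus-CNI

Multus-CNI is a Kubernetes plugin developed by Intel. It acts as middleware between k8s and other CNI plugins, so that Kubernetes can attach a pod to multiple networks. This provides network redundancy and supports separating the control plane from the data plane. Note: multus-cni does not configure networks itself. It delegates container network configuration to other plugins that conform to the CNI specification, such as flannel or sriov. Multus-CNI source code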


Its workflow is shown in the figure below:
[Figure: Multus-CNI workflow]

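  1. kubelet calls RunPod() to start the container;
  2. setUpPod() calls the network configuration plugin (the k8s CNI);
  3. The network plugin, following the k8s configuration (by default under /etc/cni/net.d/), invokes Multus and enters the Multus logic;
  4. Multus first calls the master plugin, then calls the other plugins to create the corresponding interfaces. Plugins include flannel, IPAM, the SR-IOV CNI plugin, and so on.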

Multus-CNI data structures

The Multus-CNI data structures are shown below:

type NetConf struct {
    types.NetConf
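    // Directory where configuration state is stored; defaults to "/var/lib/cni/multus" and rarely needs changing
    CNIDir     string                   `json:"cniDir"`
    // Configuration of the individual third-party plugins
    Delegates  []map[string]interface{} `json:"delegates"`
    // Absolute path of the k8s configuration file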
    Kubeconfig string                   `json:"kubeconfig"`
}

type PodNet struct {
    Networkname string `json:"name"`
}

type netplugin struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty" description:"standard object metadata"`
    Plugin            string `json:"plugin"`
    Args              string `json:"args"`
}

// K8sArgs is the valid CNI_ARGS used for Kubernetes
type K8sArgs struct {
    types.CommonArgs
    IP                         net.IP
    K8S_POD_NAME               types.UnmarshallableString
    K8S_POD_NAMESPACE          types.UnmarshallableString
    K8S_POD_INFRA_CONTAINER_ID types.UnmarshallableString
}
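
To make the role of these fields concrete, here is a minimal sketch of decoding a Multus configuration into a struct shaped like NetConf. It is not the Multus implementation: the embedded types.NetConf is replaced by two plain fields so that the example needs no CNI imports, and parseNetConf and defaultCNIDir are hypothetical names. The default directory is the one mentioned in the comment above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NetConf is a simplified stand-in for the struct shown above.
// Assumption: Name and Type replace the embedded types.NetConf.
type NetConf struct {
	Name       string                   `json:"name"`
	Type       string                   `json:"type"`
	CNIDir     string                   `json:"cniDir"`
	Delegates  []map[string]interface{} `json:"delegates"`
	Kubeconfig string                   `json:"kubeconfig"`
}

// defaultCNIDir is the default state directory named in the article.
const defaultCNIDir = "/var/lib/cni/multus"

// parseNetConf (hypothetical helper) decodes the JSON and fills in
// the default CNIDir when the field is absent.
func parseNetConf(data []byte) (*NetConf, error) {
	n := &NetConf{}
	if err := json.Unmarshal(data, n); err != nil {
		return nil, fmt.Errorf("parsing netconf: %w", err)
	}
	if n.CNIDir == "" {
		n.CNIDir = defaultCNIDir
	}
	return n, nil
}

func main() {
	raw := []byte(`{
		"name": "multus-network",
		"type": "multus",
		"kubeconfig": "/etc/kubernetes/admin.conf",
		"delegates": [{"type": "flannel", "masterplugin": true}]
	}`)
	n, err := parseNetConf(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n.Name, n.CNIDir, len(n.Delegates))
}
```

Keeping Delegates as a slice of generic maps, as the original struct does, lets each delegate carry plugin-specific keys that Multus itself does not need to understand.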

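Multus-CNI core functions

cmdAdd(args *skel.CmdArgs) reads the network configuration and then calls the concrete CNI plugins to apply it; cmdDel(args *skel.CmdArgs) is the reverse process. Logs of the configuration process can be viewed in /var/log/syslog. Only cmdAdd is briefly described here: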

func cmdAdd(args *skel.CmdArgs) error {
    var result error
    var nopodnet bool
    // Load the configuration
    n, err := loadNetConf(args.StdinData)
    if err != nil {
        return fmt.Errorf("err in loading netconf: %v", err)
    }

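    // Read the k8s configuration
    if n.Kubeconfig != "" {
        // Fetch the pod's network information from k8s;
        // the first network config under "/etc/cni/net.d/" (in ASCII order of file names) should be the multus config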
        podDelegate, err := getK8sNetwork(args, n.Kubeconfig)
        if err != nil {
            if _, ok := err.(*NoK8sNetworkError); ok {
                nopodnet = true
                if !defaultcninetwork {
                    return fmt.Errorf("Multus: Err in getting k8s network from the pod spec annotation, check the pod spec or set delegate for the default network, Refer the README.md: %v", err)
                }
            } else if !defaultcninetwork {
                return fmt.Errorf("Multus: Err in getting k8s network from pod: %v", err)
            }
        }
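        // If the pod specifies networks, they replace the Delegates field of the default
        // configuration, i.e. the configuration of the CNI plugins that are actually called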
        if len(podDelegate) != 0 {
            n.Delegates = podDelegate
        }
    }

    for _, delegate := range n.Delegates {
        if err := checkDelegate(delegate); err != nil {
            return fmt.Errorf("Multus: Err in delegate conf: %v", err)
        }
    }

    if n.Kubeconfig == "" || nopodnet {
        if err := saveDelegates(args.ContainerID, n.CNIDir, n.Delegates); err != nil {
            return fmt.Errorf("Multus: Err in saving the delegates: %v", err)
        }
    }
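    // Get the interface name used inside the pod
    podifName := getifname()
    var mIndex int
    // Iterate over the delegates; this pass configures only the master plugin
    for index, delegate := range n.Delegates {
        /* delegateAdd invokes the concrete CNI plugin to configure the network
         * and returns the outcome r.
         * The last argument says whether this pass handles the master plugin (true here).
         * If configuration fails, return immediately; the container will fail to start.
         */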
        err, r := delegateAdd(podifName, args.IfName, delegate, true)
        if err != true {
            result = r
            mIndex = index
        } else if (err != false) && r != nil {
            return r
        }
    }
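    /* Iterate over the delegates again; this pass configures the other plugins.
     * If a plugin fails, clearPlugins removes the configuration applied so far.
     * cmdAdd then returns the plugin's error, or the cleanup error if the cleanup itself fails.
     */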
    for index, delegate := range n.Delegates {
        err, r := delegateAdd(podifName, args.IfName, delegate, false)
        if err != true {
            result = r
        } else if (err != false) && r != nil {
            perr := clearPlugins(mIndex, index, args.IfName, n.Delegates)
            if perr != nil {
                return perr
            }
            return r
        }
    }

    return result
}
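
The control flow of cmdAdd is easier to see in isolation. The sketch below models only the ordering and the rollback: master plugin first, then the remaining plugins, with everything undone when a later plugin fails. The delegate type, its Fail flag and addAll are hypothetical stand-ins for n.Delegates, delegateAdd and clearPlugins. No real CNI plugin is invoked, and the rollback scope (everything configured so far) is an assumption of this sketch.

```go
package main

import "fmt"

// delegate is a hypothetical, simplified stand-in for one entry of n.Delegates.
type delegate struct {
	Name   string
	Master bool // corresponds to "masterplugin": true in the config
	Fail   bool // simulates a plugin that fails to configure
}

// addAll mimics the two passes of cmdAdd and returns the names of the
// configured plugins in the order they were applied.
func addAll(delegates []delegate) ([]string, error) {
	var configured []string

	// Pass 1: only the master plugin. A failure aborts immediately.
	for _, d := range delegates {
		if !d.Master {
			continue
		}
		if d.Fail {
			return nil, fmt.Errorf("master plugin %q failed", d.Name)
		}
		configured = append(configured, d.Name)
	}

	// Pass 2: every other plugin. On failure nothing stays configured
	// (the rollback is modelled by returning a nil slice).
	for _, d := range delegates {
		if d.Master {
			continue
		}
		if d.Fail {
			return nil, fmt.Errorf("plugin %q failed, rolled back %v", d.Name, configured)
		}
		configured = append(configured, d.Name)
	}
	return configured, nil
}

func main() {
	order, err := addAll([]delegate{
		{Name: "sriov"},
		{Name: "flannel", Master: true},
	})
	fmt.Println(order, err) // the master plugin comes first regardless of list position

	_, err = addAll([]delegate{
		{Name: "flannel", Master: true},
		{Name: "sriov", Fail: true},
	})
	fmt.Println(err)
}
```

Running the master plugin in its own pass means the pod's primary interface is set up before any secondary interface, whatever order the delegates are listed in.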

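Multus configuration example

Multus supports two ways of configuring networks: Kubernetes Custom Resource Definitions (CRD) and Third Party Resources (TPR). Taking CRD as the example, the complete configuration process is described below, starting from installing Multus. TPR configuration is similar.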

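Installing Multus

// Download the source code
git clone https://github.com/Intel-Corp/multus-cni.git
cd multus-cni
// Build the executable
./build
// CNI executables are placed in `/opt/cni/bin/` by default
mv bin/multus /opt/cni/bin/

Configuring k8s to use Multus and setting the master plugin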

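The default configuration path for k8s CNI is /etc/cni/net.d/, and the configuration files in that directory are sorted in ASCII order. kubelet must be restarted after the configuration is changed.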

root@xftony:/etc/cni/net.d# cat 01-crd-multus-cni.conf 
{
    "name": "multus-network",
    "type": "multus",
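    # Uses admin.conf rather than the node-kubeconfig.yaml from the GitHub example
    # (annotation only: JSON does not allow comments, so omit these two lines in the real file)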
    "kubeconfig": "/etc/kubernetes/admin.conf",
    "delegates": [{
        "type": "flannel",
        "name": "cbr0",
        "hairpinMode": true,
        "masterplugin": true,
        "isDefaultGateway": true
    }]
}
// Restart kubelet
root@xftony:/etc/cni/net.d# systemctl restart kubelet

CRD configuration
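
The file name prefix 01- matters because of the ASCII ordering described above. The following sketch shows which file would sort first in a directory listing. firstConfig is a hypothetical helper, and the two extra file names are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// firstConfig returns the file name that comes first in byte-wise
// (ASCII) order, or "" for an empty list. The input is not modified.
func firstConfig(names []string) string {
	if len(names) == 0 {
		return ""
	}
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return sorted[0]
}

func main() {
	// 10-flannel.conf and 99-loopback.conf are hypothetical neighbours.
	names := []string{"10-flannel.conf", "01-crd-multus-cni.conf", "99-loopback.conf"}
	fmt.Println(firstConfig(names)) // prints "01-crd-multus-cni.conf"
}
```

If another plugin's configuration sorted ahead of the Multus file, that plugin would be picked up instead, so a low numeric prefix is a simple way to keep Multus first.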

First, define the network object in k8s. This is the resource type that the network configuration files below will use.

root@xftony:~/xftony/netYaml# cat crdnetwork.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # name must match the spec fields below, and be in the form: <plural>.<group>
  name: networks.kubernetes.com
spec:
  # group name to use for REST API: /apis/<group>/<version>
  group: kubernetes.com
  # version name to use for REST API: /apis/<group>/<version>
  version: v1
  # either Namespaced or Cluster
  scope: Namespaced
  names:
    # plural name to be used in the URL: /apis/<group>/<version>/<plural>
    plural: networks
    # singular name to be used as an alias on the CLI and for display
    singular: network
    # kind is normally the CamelCased singular type. Your resource manifests use this.
    kind: Network
    # shortNames allow shorter string to match your resource on the CLI
    shortNames:
    - net
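
Predefining the networks available to k8s

Write the configuration for each network you want to use and register it with k8s, so that it can be referenced when a pod is created. The examples here are flannel, the most commonly used plugin, and the sriov plugin mentioned earlier.

flannel

Create the flannel network configuration file;
Create the k8s network object.

// Create the flannel network configuration
root@xftony:~/xftony/netYaml# cat flannel.yaml 
apiVersion: "kubernetes.com/v1"
# The Network kind here is the one defined in crdnetwork.yaml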
kind: Network
metadata:
  name: flannel-network
plugin: flannel
args: '[
        {
                "delegate": {
                        "isDefaultGateway": true
                }
        }
]'
// Create the k8s network object
root@xftony:~/xftony/netYaml# kubectl create -f flannel.yaml

// List the networks available in k8s
root@xftony:~/xftony/netYaml# kubectl get network -o wide
NAME                                   AGE
flannel-network                       2s

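sriov

The sriov plugin used here is my own modified version, which adds a pfOnly parameter that defaults to false. By default, therefore, the node must have SR-IOV enabled and a VF available.

Create the sriov network configuration file;
Create the k8s network object.

/* Create the sriov network configuration
 * Two sriov interfaces are defined here: a regular sriov interface and a sriov interface bound to DPDK
 */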
root@xftony:~/xftony/netYaml# cat sriov-dpdk-network-enp4s0f1.yaml 
apiVersion: "kubernetes.com/v1"
kind: Network
metadata:
  name: sriov-dpdk-network-enp4s0f1
plugin: sriov
args: '[
       {
                "if0": "enp4s0f1",
                "pfOnly": false,
                "type": "sriov",
                "if0name": "net0-4s0f1",
                "ipam": {
                        "type": "host-local",
                        "subnet": "18.1.0.0/24",
                        "rangeStart": "18.1.0.70",
                        "rangeEnd": "18.1.0.99",
                        "routes": [
                                { "dst": "0.0.0.0/0" }
                        ],
                        "gateway": "18.1.0.1"
                }
       },
       {        "if0": "enp4s0f1",
                "pfOnly": false,
                "type": "sriov",
                "if0name": "Dpdk1-4s0f1",
                "dpdk": {
                        "kernel_driver": "ixgbevf",
                        "dpdk_driver": "igb_uio",
                        "dpdk_tool": "/root/xftony/plugins/dpdk-17.11/usertools/dpdk-devbind.py"
                }
       }
]'
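
Both Network objects above store their plugin settings as a JSON array inside the args string. As an illustration of how such an object could be turned into delegate configurations, the sketch below decodes args and fills in a missing "type" from the plugin field. This is one plausible reading, not the Multus implementation: the network struct and toDelegates are hypothetical, and supporting more than one entry per object is an assumption based on the sriov example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// network mirrors the plugin and args fields of the netplugin struct
// from the data structures section (metadata omitted).
type network struct {
	Plugin string `json:"plugin"`
	Args   string `json:"args"`
}

// toDelegates (hypothetical) decodes the JSON array stored in Args and
// sets "type" to the plugin name wherever an entry does not define it.
func toDelegates(n network) ([]map[string]interface{}, error) {
	var delegates []map[string]interface{}
	if err := json.Unmarshal([]byte(n.Args), &delegates); err != nil {
		return nil, fmt.Errorf("decoding args of plugin %q: %w", n.Plugin, err)
	}
	for _, d := range delegates {
		if _, ok := d["type"]; !ok {
			d["type"] = n.Plugin
		}
	}
	return delegates, nil
}

func main() {
	flannel := network{
		Plugin: "flannel",
		Args:   `[{"delegate": {"isDefaultGateway": true}}]`,
	}
	ds, err := toDelegates(flannel)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range ds {
		fmt.Println(d["type"], d["delegate"])
	}
}
```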

Configuring and testing the pod

Write the pod configuration as a yaml file, then create the pod from it.

root@xftony:~/xftony/podYaml# cat xftony-test-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: xftony-test
  annotations:
    # Use the networks registered earlier: flannel-network and sriov-dpdk-network-enp4s0f1
    networks: '[ 
        { "name": "flannel-network"}, 
        { "name": "sriov-dpdk-network-enp4s0f1"}
   ]'
spec:  # specification of the pod's contents
  nodeSelector:
   name: node1
  containers:
  - name: xftony-test-1
    image: "ubuntu:14.04"
    imagePullPolicy: IfNotPresent
    command: ["top"]
    stdin: true
    tty: true

// Create the pod
root@xftony:~/xftony/podYaml# kubectl create -f xftony-test-pod.yaml

Use kubectl to check the pod status.
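
The networks annotation in the pod spec is itself a JSON array of objects with a name key, which matches the PodNet struct from the data structures section. A minimal sketch of decoding it follows. parsePodNetworks is a hypothetical helper, and rejecting entries without a name is an assumption of this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodNet is the struct from the data structures section.
type PodNet struct {
	Networkname string `json:"name"`
}

// parsePodNetworks (hypothetical) returns the network names listed in
// the annotation, in order, and rejects entries without a name.
func parsePodNetworks(annotation string) ([]string, error) {
	var nets []PodNet
	if err := json.Unmarshal([]byte(annotation), &nets); err != nil {
		return nil, fmt.Errorf("parsing networks annotation: %w", err)
	}
	names := make([]string, 0, len(nets))
	for i, n := range nets {
		if n.Networkname == "" {
			return nil, fmt.Errorf("entry %d has no name", i)
		}
		names = append(names, n.Networkname)
	}
	return names, nil
}

func main() {
	annotation := `[
        { "name": "flannel-network"},
        { "name": "sriov-dpdk-network-enp4s0f1"}
   ]`
	names, err := parsePodNetworks(annotation)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

Each name must match a Network object registered beforehand, such as flannel-network created from flannel.yaml.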

root@ubuntu90:~/xftony/podYaml# kubectl get pods -o wide 
NAME                     READY     STATUS             RESTARTS   AGE       IP             NODE
xftony-test              1/1       Running            0          8s        172.17.2.11    node1

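Enter the pod and inspect its interfaces. eth0 is the interface created by flannel, and net0-4s0f1 is the regular sriov interface created by sriov. The DPDK interface cannot be seen directly with ifconfig.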

root@xftony:~# kubectl exec -it xftony-test /bin/bash
root@xfxftony-test-1:/# ifconfig
eth0      Link encap:Ethernet  HWaddr 0a:58:ac:11:02:0b  
          inet addr:172.17.2.11  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::cc37:c5ff:fe1c:ea04/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1116 (1.1 KB)  TX bytes:1116 (1.1 KB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

net0-4s0f1 Link encap:Ethernet  HWaddr 96:80:02:7c:ef:e1  
          inet addr:18.1.0.71  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::9480:2ff:fe7c:efe1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:38 errors:0 dropped:0 overruns:0 frame:0
          TX packets:352 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3268 (3.2 KB)  TX bytes:30028 (30.0 KB)

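To view the DPDK interface information, follow the steps below (mount the relevant directory into the container and a small script is enough):

1. Check the container's namespace information: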

root@xfxftony-test-1:/# ll /proc/1/ns/
total 0
dr-x--x--x 2 root root 0 Jun  1 02:59 ./
dr-xr-xr-x 9 root root 0 Jun  1 02:49 ../
lrwxrwxrwx 1 root root 0 Jun  1 02:59 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 root root 0 Jun  1 02:59 ipc -> ipc:[4026533208]
lrwxrwxrwx 1 root root 0 Jun  1 02:59 mnt -> mnt:[4026532762]
lrwxrwxrwx 1 root root 0 Jun  1 02:59 net -> net:[4026533211]
lrwxrwxrwx 1 root root 0 Jun  1 02:59 pid -> pid:[4026533310]
lrwxrwxrwx 1 root root 0 Jun  1 02:59 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Jun  1 02:59 uts -> uts:[4026533309]
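
The number inside the brackets of the net link is what identifies the state directory on the host in the next step. As a starting point for the small script mentioned above, here is a sketch that extracts that number from a link target and builds the directory path. nsInode is a hypothetical helper, and the base directory is the sriov default named in this article. On a live system the link target could be obtained with os.Readlink("/proc/1/ns/net").

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// nsInode extracts the number from a namespace link target such as
// "net:[4026533211]".
func nsInode(link string) (string, error) {
	start := strings.IndexByte(link, '[')
	end := strings.IndexByte(link, ']')
	if start < 0 || end <= start+1 {
		return "", fmt.Errorf("unexpected namespace link %q", link)
	}
	return link[start+1 : end], nil
}

func main() {
	inode, err := nsInode("net:[4026533211]")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Directory where the modified sriov plugin keeps its state for this pod.
	fmt.Println(path.Join("/var/lib/cni/sriov", inode)) // prints "/var/lib/cni/sriov/4026533211"
}
```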

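2. On the host, look at the configuration files saved by sriov. /var/lib/cni/sriov/ is the default location for the sriov configuration files, and 4026533211 is the container's network namespace. See the sriov article for details.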

root@xftony:/var/lib/cni/sriov/4026533211# ls
7d888af4d7da943a41ee063f0606ec0b558eb583de12a165a530e392992e751c-Dpdk1-4s0f1
7d888af4d7da943a41ee063f0606ec0b558eb583de12a165a530e392992e751c-net0-4s0f1

root@xftony:/var/lib/cni/sriov/4026533211# cat 7d888af4d7da943a41ee063f0606ec0b558eb583de12a165a530e392992e751c-Dpdk1-4s0f1 
{"type":"sriov","ipam":{},"dns":{},"DPDKMode":true,"Sharedvf":false,"dpdk":{"pci_addr":"0000:04:10.5","ifname":"Dpdk1-4s0f1","kernel_driver":"ixgbevf","dpdk_driver":"igb_uio","dpdk_tool":"/root/xftony/plugins/dpdk-17.11/usertools/dpdk-devbind.py","VFID":2},"cniDir":"/var/lib/cni/sriov","if0":"enp4s0f1","if0name":"Dpdk1-4s0f1","l2enable":false,"vlan":0,"pfOnly":false,"pci_addr":"0000:04:10.5"}
root@ubuntu89:/var/lib/cni/sriov/4026533211# 
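
The saved file is plain JSON, so the DPDK details can be pulled out programmatically instead of being read by eye. The sketch below decodes a trimmed copy of the file shown above. The field names come from that file, while the sriovState struct and parseSriovState are hypothetical and cover only the fields that are printed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sriovState lists only the fields read below. Key names are taken from
// the saved file; the Go struct itself is an illustrative assumption.
type sriovState struct {
	DPDKMode bool   `json:"DPDKMode"`
	If0      string `json:"if0"`
	If0Name  string `json:"if0name"`
	DPDK     struct {
		PCIAddr      string `json:"pci_addr"`
		KernelDriver string `json:"kernel_driver"`
		DPDKDriver   string `json:"dpdk_driver"`
		VFID         int    `json:"VFID"`
	} `json:"dpdk"`
}

// parseSriovState (hypothetical) decodes one saved state file.
func parseSriovState(data []byte) (*sriovState, error) {
	s := &sriovState{}
	if err := json.Unmarshal(data, s); err != nil {
		return nil, fmt.Errorf("parsing sriov state: %w", err)
	}
	return s, nil
}

func main() {
	// Trimmed copy of the Dpdk1-4s0f1 file shown above.
	raw := []byte(`{"type":"sriov","DPDKMode":true,
		"dpdk":{"pci_addr":"0000:04:10.5","ifname":"Dpdk1-4s0f1",
		"kernel_driver":"ixgbevf","dpdk_driver":"igb_uio","VFID":2},
		"if0":"enp4s0f1","if0name":"Dpdk1-4s0f1"}`)
	s, err := parseSriovState(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if s.DPDKMode {
		fmt.Printf("%s: VF %d of %s at PCI %s (driver %s)\n",
			s.If0Name, s.DPDK.VFID, s.If0, s.DPDK.PCIAddr, s.DPDK.DPDKDriver)
	}
}
```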

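Intel's white paper (Enabling New Features with Kubernetes for NFV) has a figure that fits well here. In it, however, north0 and south0 are both regular sriov interfaces, corresponding to net0-4s0f1 in this example, so I edited the figure slightly to add the DPDK port.
[Figure: diagram adapted from the Intel white paper, with a DPDK port added]

That's all.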
