Introduction to Multus-CNI
Multus-CNI is a Kubernetes plugin developed by Intel. It acts as an intermediary between Kubernetes and other CNI plugins, allowing Kubernetes to attach multiple networks to a pod. This enables network redundancy and provides a basis for separating the control plane from the data plane. Note: multus-cni does not configure networks itself; it delegates container network configuration to other CNI-compliant plugins such as flannel or sriov. See the Multus-CNI source repository.
Its workflow is shown in the figure below:
- The kubelet calls RunPod() to start the container;
- SetUpPod() invokes the network plugin (Kubernetes CNI);
- The network plugin reads the CNI configuration (by default under /etc/cni/net.d/) and invokes Multus, entering Multus's logic;
- Multus first calls the master plugin, then the remaining plugins, to create the corresponding interfaces; these plugins include flannel, IPAM, the SR-IOV CNI plugin, and so on.
Multus-CNI data structures
The Multus-CNI data structures are as follows:
type NetConf struct {
types.NetConf
// directory for saved delegate configuration; defaults to "/var/lib/cni/multus" and is rarely changed
CNIDir string `json:"cniDir"`
// configuration of the concrete third-party plugins
Delegates []map[string]interface{} `json:"delegates"`
// absolute path of the kubeconfig file
Kubeconfig string `json:"kubeconfig"`
}
type PodNet struct {
Networkname string `json:"name"`
}
type netplugin struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty" description:"standard object metadata"`
Plugin string `json:"plugin"`
Args string `json:"args"`
}
// K8sArgs is the valid CNI_ARGS used for Kubernetes
type K8sArgs struct {
types.CommonArgs
IP net.IP
K8S_POD_NAME types.UnmarshallableString
K8S_POD_NAMESPACE types.UnmarshallableString
K8S_POD_INFRA_CONTAINER_ID types.UnmarshallableString
}
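To make the structure above concrete, here is a self-contained sketch that parses a multus configuration the way loadNetConf does. The struct is a reduced copy keeping only the fields listed above, the sample JSON mirrors the configuration shown later in this post, and parseNetConf is a hypothetical helper, not the real multus function:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Reduced copy of multus's NetConf, keeping only the fields shown above.
type netConf struct {
	CNIDir     string                   `json:"cniDir"`
	Delegates  []map[string]interface{} `json:"delegates"`
	Kubeconfig string                   `json:"kubeconfig"`
}

// parseNetConf mirrors what loadNetConf does with the JSON multus receives on stdin.
func parseNetConf(raw []byte) (netConf, error) {
	var conf netConf
	if err := json.Unmarshal(raw, &conf); err != nil {
		return netConf{}, fmt.Errorf("err in loading netconf: %v", err)
	}
	return conf, nil
}

func main() {
	raw := []byte(`{
		"name": "multus-network",
		"type": "multus",
		"kubeconfig": "/etc/kubernetes/admin.conf",
		"delegates": [{"type": "flannel", "masterplugin": true}]
	}`)
	conf, err := parseNetConf(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(conf.Kubeconfig, len(conf.Delegates), conf.Delegates[0]["type"])
	// /etc/kubernetes/admin.conf 1 flannel
}
```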
Core Multus-CNI functions
cmdAdd(args *skel.CmdArgs) reads the network configuration and invokes the concrete CNI plugins to apply it; cmdDel(args *skel.CmdArgs) is the reverse process. The log of the configuration process can be viewed in /var/log/syslog. Only cmdAdd is briefly described here:
func cmdAdd(args *skel.CmdArgs) error {
var result error
var nopodnet bool
// load the configuration
n, err := loadNetConf(args.StdinData)
if err != nil {
return fmt.Errorf("err in loading netconf: %v", err)
}
// load the Kubernetes configuration
if n.Kubeconfig != "" {
// fetch the pod's network list from Kubernetes;
// the first configuration file under /etc/cni/net.d/ (in ASCII filename order) should be the multus one
podDelegate, err := getK8sNetwork(args, n.Kubeconfig)
if err != nil {
if _, ok := err.(*NoK8sNetworkError); ok {
nopodnet = true
if !defaultcninetwork {
return fmt.Errorf("Multus: Err in getting k8s network from the pod spec annotation, check the pod spec or set delegate for the default network, Refer the README.md: %v", err)
}
} else if !defaultcninetwork {
return fmt.Errorf("Multus: Err in getting k8s network from pod: %v", err)
}
}
// use the delegates obtained from the pod spec, i.e. the configuration of the concrete CNI plugins to invoke
if len(podDelegate) != 0 {
n.Delegates = podDelegate
}
}
for _, delegate := range n.Delegates {
if err := checkDelegate(delegate); err != nil {
return fmt.Errorf("Multus: Err in delegate conf: %v", err)
}
}
if n.Kubeconfig == "" || nopodnet {
if err := saveDelegates(args.ContainerID, n.CNIDir, n.Delegates); err != nil {
return fmt.Errorf("Multus: Err in saving the delegates: %v", err)
}
}
// get the interface name to use inside the pod
podifName := getifname()
var mIndex int
// iterate over the delegates; this pass configures only the master plugin
for index, delegate := range n.Delegates {
/* delegateAdd invokes the concrete CNI plugin to configure the container
 * network and returns the resulting configuration in r.
 * The last argument (true) selects the master plugin.
 * If configuration fails, return immediately and the container fails to start.
 */
err, r := delegateAdd(podifName, args.IfName, delegate, true)
if err != true {
result = r
mIndex = index
} else if (err != false) && r != nil {
return r
}
}
/* Iterate over the delegates again; this pass configures the remaining plugins.
 * If a plugin fails, its configuration is cleared;
 * if clearing succeeds, multus can still finish cleanly.
 */
for index, delegate := range n.Delegates {
err, r := delegateAdd(podifName, args.IfName, delegate, false)
if err != true {
result = r
} else if (err != false) && r != nil {
perr := clearPlugins(mIndex, index, args.IfName, n.Delegates)
if perr != nil {
return perr
}
return r
}
}
return result
}
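The control flow of the two loops above can be summarized in a simplified, self-contained sketch. The delegateAdd and rollback stubs below only mirror the semantics described in the comments (master pass first, then the other plugins, with cleanup of already-configured delegates on failure); they do not reproduce the real multus signatures:

```go
package main

import "fmt"

// Stub representation of one delegate configuration.
type delegate struct {
	name   string
	master bool
	fail   bool
}

// delegateAdd stands in for invoking one delegate's real CNI binary.
// The first return value reports whether the delegate was skipped in this pass.
func delegateAdd(d delegate, masterOnly bool) (bool, error) {
	if d.master != masterOnly {
		return true, nil // not this pass's turn
	}
	if d.fail {
		return false, fmt.Errorf("plugin %s failed", d.name)
	}
	fmt.Println("configured", d.name)
	return false, nil
}

// setupDelegates runs the master pass, then the rest, rolling back on failure.
func setupDelegates(delegates []delegate) error {
	var configured []delegate
	for _, masterPass := range []bool{true, false} {
		for _, d := range delegates {
			skipped, err := delegateAdd(d, masterPass)
			if skipped {
				continue
			}
			if err != nil {
				// mirror clearPlugins: tear down what was already configured
				for _, c := range configured {
					fmt.Println("cleared", c.name)
				}
				return err
			}
			configured = append(configured, d)
		}
	}
	return nil
}

func main() {
	delegates := []delegate{
		{name: "flannel", master: true},
		{name: "sriov"},
	}
	if err := setupDelegates(delegates); err != nil {
		fmt.Println("error:", err)
	}
}
```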
Example Multus configuration
Multus supports two ways of describing networks: Kubernetes Custom Resource Definitions (CRD) and Third Party Resources (TPR). Taking CRD as the example, the following walks through the complete setup, starting from the multus installation; the TPR setup is similar.
Installing Multus
// fetch the source
git clone https://github.com/Intel-Corp/multus-cni.git
cd multus-cni
// build the binary
./build
// CNI binaries are expected under /opt/cni/bin/ by default
mv bin/multus /opt/cni/bin/
Configuring Kubernetes to use Multus and setting the master plugin
The default CNI configuration path of Kubernetes is /etc/cni/net.d/; the configuration files in that directory are sorted in ASCII order. After changing the configuration, restart the kubelet.
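The naming matters because the lexicographically first configuration file wins, which is why the multus file below is prefixed with "01-". A small sketch of that selection rule (firstConf is a hypothetical helper, not kubelet code):

```go
package main

import (
	"fmt"
	"sort"
)

// firstConf mimics how the CNI configuration is picked: the files under
// /etc/cni/net.d/ are sorted by name (ASCII order) and the first one is used.
func firstConf(files []string) string {
	sorted := append([]string(nil), files...)
	sort.Strings(sorted)
	return sorted[0]
}

func main() {
	files := []string{"10-flannel.conf", "01-crd-multus-cni.conf"}
	fmt.Println(firstConf(files)) // 01-crd-multus-cni.conf
}
```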
root@xftony:/etc/cni/net.d# cat 01-crd-multus-cni.conf
{
"name": "multus-network",
"type": "multus",
# use admin.conf rather than the node-kubeconfig.yaml from the upstream example
"kubeconfig": "/etc/kubernetes/admin.conf",
"delegates": [{
"type": "flannel",
"name": "cbr0",
"hairpinMode": true,
"masterplugin": true,
"isDefaultGateway": true
}]
}
// restart the kubelet
root@xftony:/etc/cni/net.d#systemctl restart kubelet
CRD configuration
First define the network object in Kubernetes, i.e. the resource type that the network configuration files below will use.
root@xftony:~/xftony/netYaml#cat crdnetwork.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
# name must match the spec fields below, and be in the form: <plural>.<group>
name: networks.kubernetes.com
spec:
# group name to use for REST API: /apis/<group>/<version>
group: kubernetes.com
# version name to use for REST API: /apis/<group>/<version>
version: v1
# either Namespaced or Cluster
scope: Namespaced
names:
# plural name to be used in the URL: /apis/<group>/<version>/<plural>
plural: networks
# singular name to be used as an alias on the CLI and for display
singular: network
# kind is normally the CamelCased singular type. Your resource manifests use this.
kind: Network
# shortNames allow shorter string to match your resource on the CLI
shortNames:
- net
Registering the available networks with Kubernetes
Write the network configuration files to be used, then register them with Kubernetes so they can be referenced when a pod is created. The most common plugin, flannel, and the sriov plugin mentioned earlier serve as examples.
flannel
Create the flannel network configuration file, then register it as a Kubernetes network.
// flannel network configuration
root@xftony:~/xftony/netYaml# cat flannel.yaml
apiVersion: "kubernetes.com/v1"
# the kind Network here is the one defined in crdnetwork.yaml
kind: Network
metadata:
name: flannel-network
plugin: flannel
args: '[
{
"delegate": {
"isDefaultGateway": true
}
}
]'
// register the network with Kubernetes
root@xftony:~/xftony/netYaml# kubectl create -f flannel.yaml
// list the networks available to Kubernetes
root@xftony:~/xftony/netYaml# kubectl get network -o wide
NAME AGE
flannel-network 2s
sriov
The sriov plugin used here is my own modified version, which adds a pfOnly parameter that defaults to false. By default, therefore, the node must have SR-IOV enabled and have VFs available.
Create the sriov network configuration file, then register it as a Kubernetes network.
/* sriov network configuration
 * Two sriov interfaces are created here: an ordinary sriov interface and
 * one taken over by DPDK.
 */
root@xftony:~/xftony/netYaml# cat sriov-dpdk-network-enp4s0f1.yaml
apiVersion: "kubernetes.com/v1"
kind: Network
metadata:
name: sriov-dpdk-network-enp4s0f1
plugin: sriov
args: '[
{
"if0": "enp4s0f1",
"pfOnly": false,
"type": "sriov",
"if0name": "net0-4s0f1",
"ipam": {
"type": "host-local",
"subnet": "18.1.0.0/24",
"rangeStart": "18.1.0.70",
"rangeEnd": "18.1.0.99",
"routes": [
{ "dst": "0.0.0.0/0" }
],
"gateway": "18.1.0.1"
}
},
{ "if0": "enp4s0f1",
"pfOnly": false,
"type": "sriov",
"if0name": "Dpdk1-4s0f1",
"dpdk": {
"kernel_driver": "ixgbevf",
"dpdk_driver": "igb_uio",
"dpdk_tool": "/root/xftony/plugins/dpdk-17.11/usertools/dpdk-devbind.py"
}
}
]'
Configuring and testing a pod
Describe the pod in a YAML file, then create it.
root@xftony:~/xftony/podYaml# cat xftony-test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: xftony-test
annotations:
# use the previously registered networks flannel-network and sriov-dpdk-network-enp4s0f1
networks: '[
{ "name": "flannel-network"},
{ "name": "sriov-dpdk-network-enp4s0f1"}
]'
spec: # specification of the pod's contents
nodeSelector:
name: node1
containers:
- name: xftony-test-1
image: "ubuntu:14.04"
imagePullPolicy: IfNotPresent
command: ["top"]
stdin: true
tty: true
// create the pod
root@xftony:~/xftony/podYaml# kubectl create -f xftony-test-pod.yaml
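When multus handles this pod, getK8sNetwork reads the networks annotation, whose value is plain JSON matching the PodNet structure from the data-structure section. A minimal parsing sketch (parseNetworksAnnotation is a hypothetical helper, not the multus function itself):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodNet mirrors the struct from the data-structure section: multus parses
// the pod's "networks" annotation into a list of these.
type PodNet struct {
	Networkname string `json:"name"`
}

// parseNetworksAnnotation decodes the annotation value into PodNet entries.
func parseNetworksAnnotation(value string) ([]PodNet, error) {
	var nets []PodNet
	if err := json.Unmarshal([]byte(value), &nets); err != nil {
		return nil, err
	}
	return nets, nil
}

func main() {
	// The annotation value from the pod spec above.
	annotation := `[
		{ "name": "flannel-network"},
		{ "name": "sriov-dpdk-network-enp4s0f1"}
	]`
	nets, err := parseNetworksAnnotation(annotation)
	if err != nil {
		panic(err)
	}
	for _, n := range nets {
		fmt.Println(n.Networkname)
	}
}
```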
Check the pod's status with kubectl.
root@ubuntu90:~/xftony/podYaml# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
xftony-test 1/1 Running 0 8s 172.17.2.11 node1
Enter the pod and inspect its interfaces. eth0 is the interface created by flannel, net0-4s0f1 is the ordinary sriov interface, and the DPDK interface cannot be seen directly with ifconfig.
root@xftony:~# kubectl exec -it xftony-test /bin/bash
root@xfxftony-test-1:/# ifconfig
eth0 Link encap:Ethernet HWaddr 0a:58:ac:11:02:0b
inet addr:172.17.2.11 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::cc37:c5ff:fe1c:ea04/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:14 errors:0 dropped:0 overruns:0 frame:0
TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1116 (1.1 KB) TX bytes:1116 (1.1 KB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
net0-4s0f1 Link encap:Ethernet HWaddr 96:80:02:7c:ef:e1
inet addr:18.1.0.71 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::9480:2ff:fe7c:efe1/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:38 errors:0 dropped:0 overruns:0 frame:0
TX packets:352 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3268 (3.2 KB) TX bytes:30028 (30.0 KB)
The DPDK interface can be inspected as follows (mount the corresponding host directory into the container, and a small script is all it takes):
1. Check the container's namespace information;
root@xfxftony-test-1:/# ll /proc/1/ns/
total 0
dr-x--x--x 2 root root 0 Jun 1 02:59 ./
dr-xr-xr-x 9 root root 0 Jun 1 02:49 ../
lrwxrwxrwx 1 root root 0 Jun 1 02:59 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 root root 0 Jun 1 02:59 ipc -> ipc:[4026533208]
lrwxrwxrwx 1 root root 0 Jun 1 02:59 mnt -> mnt:[4026532762]
lrwxrwxrwx 1 root root 0 Jun 1 02:59 net -> net:[4026533211]
lrwxrwxrwx 1 root root 0 Jun 1 02:59 pid -> pid:[4026533310]
lrwxrwxrwx 1 root root 0 Jun 1 02:59 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Jun 1 02:59 uts -> uts:[4026533309]
2. On the host, check the configuration files saved by sriov: /var/lib/cni/sriov/ is the default location, and 4026533211 is the container's network namespace. See the sriov plugin for details.
root@xftony:/var/lib/cni/sriov/4026533211# ls
7d888af4d7da943a41ee063f0606ec0b558eb583de12a165a530e392992e751c-Dpdk1-4s0f1
7d888af4d7da943a41ee063f0606ec0b558eb583de12a165a530e392992e751c-net0-4s0f1
root@xftony:/var/lib/cni/sriov/4026533211# cat 7d888af4d7da943a41ee063f0606ec0b558eb583de12a165a530e392992e751c-Dpdk1-4s0f1
{"type":"sriov","ipam":{},"dns":{},"DPDKMode":true,"Sharedvf":false,"dpdk":{"pci_addr":"0000:04:10.5","ifname":"Dpdk1-4s0f1","kernel_driver":"ixgbevf","dpdk_driver":"igb_uio","dpdk_tool":"/root/xftony/plugins/dpdk-17.11/usertools/dpdk-devbind.py","VFID":2},"cniDir":"/var/lib/cni/sriov","if0":"enp4s0f1","if0name":"Dpdk1-4s0f1","l2enable":false,"vlan":0,"pfOnly":false,"pci_addr":"0000:04:10.5"}root@ubuntu89:/var/lib/cni/sriov/4026533211#
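The saved JSON above can also be read back programmatically. The following sketch uses a hypothetical reduced struct (parseSriovSaved and its field selection are mine, keeping only the DPDK-related keys that appear in the dump) to extract the PCI address and VF ID:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal struct for pulling the DPDK fields out of the saved sriov
// configuration shown above; all other fields are ignored.
type sriovSaved struct {
	DPDKMode bool `json:"DPDKMode"`
	DPDK     struct {
		PCIAddr      string `json:"pci_addr"`
		Ifname       string `json:"ifname"`
		KernelDriver string `json:"kernel_driver"`
		VFID         int    `json:"VFID"`
	} `json:"dpdk"`
}

// parseSriovSaved decodes one saved per-interface configuration file.
func parseSriovSaved(raw []byte) (sriovSaved, error) {
	var s sriovSaved
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	raw := []byte(`{"DPDKMode":true,"dpdk":{"pci_addr":"0000:04:10.5","ifname":"Dpdk1-4s0f1","kernel_driver":"ixgbevf","VFID":2}}`)
	s, err := parseSriovSaved(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.DPDKMode, s.DPDK.PCIAddr, s.DPDK.VFID) // true 0000:04:10.5 2
}
```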
Intel's white paper (Enabling New Features with Kubernetes for NFV) contains a figure that fits well here, except that its north0 and south0 are both ordinary sriov interfaces, corresponding to net0-4s0f1 in this example. So I edited the figure slightly to add the DPDK port.
That's all~