System environment: the processor is an Intel® Core™ i7-4790K CPU @ 4.00GHz.
# cat /etc/issue
Ubuntu 20.04 LTS \n \l
#
# uname -a
Linux flyingshark 5.4.0-31-generic #35-Ubuntu SMP Thu May 7 20:20:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
#
VT-x and VT-d must be enabled in the BIOS, and the kernel boot parameters must include "iommu=pt intel_iommu=on".
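On Ubuntu these parameters are typically added through GRUB; a sketch (the exact GRUB_CMDLINE line depends on the existing configuration, so check /etc/default/grub before editing):

```shell
# Append the IOMMU parameters to the kernel command line in /etc/default/grub,
# e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt intel_iommu=on"
sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&iommu=pt intel_iommu=on /' /etc/default/grub
update-grub
reboot
# After the reboot, confirm the parameters are active:
grep -o "intel_iommu=on" /proc/cmdline
```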
The DPDK version used is dpdk-stable-20.02.1. Build it with the commands below, then build the l2fwd example program; on Ubuntu 20.04 the libnuma-dev package must be installed first.
# make config T=x86_64-native-linux-gcc
# make T=x86_64-native-linux-gcc
# export RTE_SDK=`pwd`
# cd examples/l2fwd
# make
As preparation, set the number of hugepages to 64:
# echo 64 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# mkdir -p /mnt/huge
# mount -t hugetlbfs nodev /mnt/huge
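To keep the hugepage reservation and the mount across reboots, the same settings can be made persistent; a sketch using the standard sysctl and fstab mechanisms (file names are conventional choices, not requirements):

```shell
# Reserve 64 2MB hugepages at every boot.
echo "vm.nr_hugepages = 64" > /etc/sysctl.d/80-hugepages.conf
# Mount hugetlbfs automatically at boot.
echo "nodev /mnt/huge hugetlbfs defaults 0 0" >> /etc/fstab
```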
As shown below, Hugepagesize is 2048 kB; with 64 pages configured, the total is 128 MB.
# cat /proc/meminfo | grep Huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 64
HugePages_Free: 47
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 131072 kB
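The Hugetlb total can be cross-checked directly: 64 pages × 2048 kB per page = 131072 kB, i.e. 128 MB:

```shell
# 64 hugepages of 2048 kB each account for the 131072 kB shown in /proc/meminfo.
echo $((64 * 2048))
```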
By default, Ubuntu 20.04 compiles the vfio-pci driver into the kernel, so it does not need to be loaded as a module. Adjust the permissions on /dev/vfio:
# ls /sys/bus/pci/drivers/vfio-pci
# chmod a+x /dev/vfio
Finally, bind the network devices to the vfio-pci driver. The system has the following network devices: four X710 10GbE ports and four 82599ES 10GbE ports. To test l2fwd, the two X710 ports 01:00.0 and 01:00.1 will be switched to the VFIO driver.
# lspci -t -v
-[0000:00]-+-00.0 Intel Corporation 4th Gen Core Processor DRAM Controller
+-01.0-[01]--+-00.0 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
| +-00.1 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
| +-00.2 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
| \-00.3 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
+-01.1-[02-06]----00.0-[03-06]--+-00.0-[04]--+-00.0 Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
| | \-00.1 Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
| +-01.0-[05]--+-00.0 Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
| | \-00.1 Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
| \-08.0-[06]--
+-02.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller
+-03.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
Bind the two ports to the vfio-pci driver as follows:
# echo "0000:01:00.0" > /sys/bus/pci/drivers/i40e/unbind
# echo "0000:01:00.1" > /sys/bus/pci/drivers/i40e/unbind
# echo "vfio-pci" > /sys/bus/pci/devices/0000:01:00.0/driver_override
# echo "0000:01:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
# echo "\00" > /sys/bus/pci/devices/0000:01:00.0/driver_override
# echo "vfio-pci" > /sys/bus/pci/devices/0000:01:00.1/driver_override
# echo "0000:01:00.1" > /sys/bus/pci/drivers/vfio-pci/bind
# echo "\00" > /sys/bus/pci/devices/0000:01:00.1/driver_override
Checking the NIC driver, 01:00.0 has now been switched to vfio-pci:
# lspci -n -s 0000:01:00.0 -v
01:00.0 0200: 8086:1572 (rev 02)
Subsystem: 8086:0000
Flags: fast devsel, IRQ 16
Memory at f0000000 (64-bit, prefetchable) [size=8M]
Memory at f2800000 (64-bit, prefetchable) [size=32K]
Expansion ROM at f7c80000 [disabled] [size=512K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable- Count=129 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 7d-c8-6f-ff-ff-e0-60-00
Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
Capabilities: [1a0] Transaction Processing Hints
Capabilities: [1b0] Access Control Services
Capabilities: [1d0] Secondary PCI Express
Kernel driver in use: vfio-pci
Kernel modules: i40e
Finally, start l2fwd. The following error occurs: "VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound".
# ./l2fwd -l 0-3 -n 4 -- -q 8 -p 3
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:1572 net_i40e
EAL: 0000:01:00.0 VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound
EAL: Requested device 0000:01:00.0 cannot be used
EAL: PCI device 0000:01:00.1 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:1572 net_i40e
EAL: 0000:01:00.1 VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound
Some investigation shows that every device in the IOMMU group containing the two vfio-pci interfaces must either be bound to VFIO or be unbound from its driver. As shown below, both NICs are in group 1.
$ readlink /sys/bus/pci/devices/0000:01:00.0/iommu_group
../../../../kernel/iommu_groups/1
$
$ readlink /sys/bus/pci/devices/0000:01:00.1/iommu_group
../../../../kernel/iommu_groups/1
# ls /dev/vfio/1 -l
crw------- 1 root root 243, 0 May 26 07:18 /dev/vfio/1
The devices in IOMMU group 1 are the four X710 ports (0000:01:00.0, 0000:01:00.1, 0000:01:00.2, 0000:01:00.3) and the four 82599 ports (0000:04:00.0, 0000:04:00.1, 0000:05:00.0, 0000:05:00.1).
In addition there are two Intel PCI bridges (0000:00:01.0 and 0000:00:01.1); the remaining entries are ports of the PLX PCIe switch. The topology of the group-1 devices can be seen in the lspci output above.
$ ls /sys/kernel/iommu_groups/1/devices/
0000:00:01.0 0000:01:00.0 0000:01:00.2 0000:02:00.0 0000:03:01.0 0000:04:00.0 0000:05:00.0
0000:00:01.1 0000:01:00.1 0000:01:00.3 0000:03:00.0 0000:03:08.0 0000:04:00.1 0000:05:00.1
$
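The group lookup done by the two readlink calls above can be wrapped in a small helper. The sysfs root is parameterized only so the logic is easy to exercise; list_group_members is a name chosen here, not a standard tool:

```shell
#!/bin/sh
# Print every PCI device that shares an IOMMU group with a given device.
# $1: PCI address (e.g. 0000:01:00.0); $2: sysfs root (defaults to /sys).
list_group_members() {
    dev="$1"
    sysfs="${2:-/sys}"
    group=$(readlink "$sysfs/bus/pci/devices/$dev/iommu_group")
    group="${group##*/}"   # keep only the trailing group number
    ls "$sysfs/kernel/iommu_groups/$group/devices/"
}
```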
After that, switch the four 82599 ports to the vfio-pci driver as well (unbinding them, or removing the ixgbe driver, should also work), and start l2fwd again.
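The rebind of the four 82599 ports uses the same driver_override sequence shown earlier for the X710 ports and can be sketched as a loop (addresses taken from the group listing above):

```shell
# Move the remaining group-1 NIC ports from ixgbe to vfio-pci.
for dev in 0000:04:00.0 0000:04:00.1 0000:05:00.0 0000:05:00.1; do
    echo "$dev" > /sys/bus/pci/drivers/ixgbe/unbind
    echo "vfio-pci" > /sys/bus/pci/devices/$dev/driver_override
    echo "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
    echo "\00" > /sys/bus/pci/devices/$dev/driver_override
done
```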
# ./l2fwd -l 0-3 -n 4 -- -q 8 -p 3
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
...
Checking link statusdone
Port0 Link Up. Speed 10000 Mbps - full-duplex
Port1 Link Up. Speed 10000 Mbps - full-duplex
L2FWD: lcore 1 has nothing to do
L2FWD: lcore 2 has nothing to do
L2FWD: lcore 3 has nothing to do
L2FWD: entering main loop on lcore 0
L2FWD: -- lcoreid=0 portid=0
L2FWD: -- lcoreid=0 portid=1
A traffic generator was used to measure UDP throughput between the two X710 ports with 64-byte packets and bidirectional traffic. Each direction reached 5784 Mbps, a little over half of line rate. That was with two cores; with multiple queues spread over multiple cores it should be possible to reach line rate. For comparison, with the kernel i40e driver and its 8 queues spread evenly across 8 cores, each direction also reached about 50%.
# ./build/app/testpmd -l 0-3 -n 4 -- -i --portmask=0x1 --nb-cores=2
testpmd>
testpmd> show port info all
********************* Infos for port 0 *********************
MAC address: 00:60:E0:6F:C8:7D
Device name: 0000:01:00.0
Driver name: net_i40e
...
Current number of RX queues: 1
Max possible RX queues: 192
...
Current number of TX queues: 1
Max possible TX queues: 192
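To approach line rate as discussed above, testpmd can distribute several RX/TX queues across more forwarding cores; a hypothetical invocation (the queue and core counts here are illustrative, not measured):

```shell
# 4 forwarding cores, 4 RX and 4 TX queues per port, both X710 ports enabled.
./build/app/testpmd -l 0-4 -n 4 -- -i --portmask=0x3 --nb-cores=4 --rxq=4 --txq=4
```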