NIC Interrupts and Multi-Queue

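x86 systems use interrupts to coordinate work between the CPU and other devices. For a long time, NIC interrupts were handled by cpu0 by default, so under heavy small-packet traffic cpu0 could be heavily loaded while the other CPUs sat idle. NIC multi-queue technology was later introduced to solve this problem.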

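Run cat /proc/interrupts to view the system's interrupt statistics; the output looks like the listing below. The first column is the interrupt number (IRQ); for example, eth0 uses IRQ 24. The columns after it are the number of interrupts each CPU has handled.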

[~]# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:        124          0          0          0   IO-APIC-edge      timer
  1:          0          3          2          1   IO-APIC-edge      i8042
  8:          0       1434          2        224   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 11:          0          7          8          6   IO-APIC-fasteoi   uhci_hcd:usb1
 12:          0         40         38         37   IO-APIC-edge      i8042
 14:          0          0          0          0   IO-APIC-edge      ata_piix
 15:          0       1827         36        221   IO-APIC-edge      ata_piix
 24:          0          0          0          0   PCI-MSI-edge      eth0
 25:          0    7725709       1718       1717   PCI-MSI-edge      eth1
...
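To make the column layout concrete, here is a minimal Python sketch that parses text in the /proc/interrupts format into per-CPU counters. The function name parse_interrupts and the SAMPLE string are illustrative assumptions; the sample rows are copied from the listing above.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        irq, sep, rest = ln.partition(":")
        if not sep:
            continue                       # skip lines without an IRQ label, such as "..."
        fields = rest.split()
        if len(fields) < ncpu or not all(f.isdigit() for f in fields[:ncpu]):
            continue                       # skip rows that do not carry one counter per CPU
        counts = [int(f) for f in fields[:ncpu]]
        table[irq.strip()] = (counts, " ".join(fields[ncpu:]))
    return table


# Two rows taken from the listing above (hypothetical variable name).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 24:          0          0          0          0   PCI-MSI-edge      eth0
 25:          0    7725709       1718       1717   PCI-MSI-edge      eth1
"""

table = parse_interrupts(SAMPLE)
for irq, (counts, desc) in table.items():
    busiest = max(range(len(counts)), key=counts.__getitem__)
    print(irq, desc, "total:", sum(counts), "busiest CPU:", busiest)
```

On a live Linux machine the same function could be fed open("/proc/interrupts").read() instead of the sample string.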

Interrupt Binding

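We can bind an IRQ to the CPUs that handle it. Linux uses the irqbalance service to optimize interrupt distribution: it collects statistics automatically and schedules interrupt requests across CPUs. To understand interrupt binding, we stop the irqbalance service and adjust the bindings by hand.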

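/proc/irq/{IRQ_ID}/smp_affinity: the CPU affinity file for interrupt IRQ_ID, written as a hexadecimal bitmask.
/proc/irq/{IRQ_ID}/smp_affinity_list: the same setting written as a decimal CPU list. The two files are linked, so changing one changes the other.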

[ ~]# cat /proc/irq/24/smp_affinity
0001
[ ~]# cat /proc/irq/24/smp_affinity_list 
0
# 0001 above corresponds to cpu0; the binding can be changed directly
[ ~]# echo 4 > /proc/irq/24/smp_affinity
[ ~]# cat /proc/irq/24/smp_affinity_list 
2
# IRQ 24 is now handled by cpu2
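The relationship between the two files is plain bit arithmetic: bit n of the hexadecimal mask stands for CPU n. A small sketch of the conversion from mask to CPU list follows; mask_to_cpus is a hypothetical helper name, not a system tool.

```python
def mask_to_cpus(mask_hex):
    """Return the CPU indices whose bits are set in a hex affinity mask."""
    # Long masks are printed in comma-separated groups; drop the commas first.
    value = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


print(mask_to_cpus("0001"))  # [0]  (cpu0)
print(mask_to_cpus("4"))     # [2]  (cpu2)
```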

[ ~]# mpstat -P ALL 1 1  
Linux 2.6.32-504.23.4.el6.x86_64    03/02/2017  _x86_64_    (10 CPU)

03:04:22 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
03:04:23 PM  all    1.51    0.00    2.41    0.00    0.00    2.91    0.00    0.00   93.17
03:04:23 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:04:23 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:04:23 PM    2   15.62    0.00   25.00    0.00    0.00   30.21    0.00    0.00   29.17
03:04:23 PM    3    0.00    0.00    0.99    0.00    0.00    0.00    0.00    0.00   99.01

This can also be verified through /proc/interrupts: while eth0 is under load, only the interrupt count handled by cpu2 increases.

[ ~]# cat /proc/interrupts | grep 24:
 24:    5249258          0    1304158    2074483   PCI-MSI-edge      eth0
[ ~]# cat /proc/interrupts | grep 24:
 24:    5249258          0    1516771    2074483   PCI-MSI-edge      eth0
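The comparison of the two snapshots can be done mechanically by subtracting the counters column by column. This is a sketch using the two rows shown above; counts_of is a hypothetical helper name.

```python
def counts_of(line):
    """Extract the per-CPU counters from one /proc/interrupts row."""
    _, _, rest = line.partition(":")
    counts = []
    for field in rest.split():
        if not field.isdigit():
            break                    # first non-numeric field starts the description
        counts.append(int(field))
    return counts


before = " 24:    5249258          0    1304158    2074483   PCI-MSI-edge      eth0"
after = " 24:    5249258          0    1516771    2074483   PCI-MSI-edge      eth0"

# after minus before, per CPU column
delta = [b - a for a, b in zip(counts_of(before), counts_of(after))]
print(delta)  # [0, 0, 212613, 0]: only cpu2 handled new interrupts
```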

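The affinity file smp_affinity is a hexadecimal mask, so one IRQ can be bound to several CPUs. In testing, however, the interrupts were not automatically spread across those CPUs.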

[ ~]# echo 11 > /proc/irq/24/smp_affinity
[ ~]# cat /proc/irq/24/smp_affinity      
0011
# hex 11 is binary 0000,0000,0001,0001, which selects cpu0 and cpu4
[ ~]# cat /proc/irq/24/smp_affinity_list 
0,4
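Going the other way, the value to write into smp_affinity can be computed from a list of CPU numbers by setting one bit per CPU. A sketch, with cpus_to_mask as a hypothetical helper name:

```python
def cpus_to_mask(cpus):
    """Build the hex affinity mask that selects the given CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu             # bit n stands for CPU n
    return format(mask, "x")


print(cpus_to_mask([0, 4]))  # 11
print(cpus_to_mask([2]))     # 4
```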

NIC Multi-Queue

RSS (Receive Side Scaling) is a NIC hardware feature that implements multiple queues and can distribute different flows to different CPUs.

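Binding a single IRQ to multiple CPUs does not actually distribute the interrupts. A NIC that supports RSS provides multiple queues, each with its own IRQ; by binding each of these IRQs separately, NIC interrupts can be spread across the CPU cores.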

[ ~]# ls /sys/class/net/eth0/queues/
rx-0  rx-2  rx-4  rx-6  tx-0  tx-2  tx-4  tx-6
rx-1  rx-3  rx-5  rx-7  tx-1  tx-3  tx-5  tx-7

# eth0 has multiple queues; an excerpt of /proc/interrupts looks like this
  95:          1          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0
  96:        161    2175974    2046333    4627889   74362460          0          0          0          0          0          0          0          0          0       8340   39971887     111995        452          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-0
  97:         18   27180874    5828740    3181746    1673296          0          0          0          0          0          0          0          0          0          0    7981462          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-1
  98:       4255   20655084    5985539    3175797    2903580          0          0          0          0          0          0          0          0          0          0   11786675       2485          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-2
  99:         26   14077166    9826129    3129857    3050199          0          0          0          0          0          0          0          0          0          0   15454795          0       1252          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-3
 100:         80   13133364    9766015    2728504    3768519          0          0          0          0          0          0          0          0          0          0   14714758          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-4
 101:         18   11351909   15644814    3581350    3822988          0          0          0          0          0          0          0          0          0          0   13055960          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-5
 102:       2962    7283522   25860133   11902055    4747040          0          0          0          0          0          0          0          0          0          0    9042550        200          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-6
 103:         18   12908096   12612013    3411346    5934445          0          0          0          0          0          0          0          0          0          0   10059911          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-7

In this way, the IRQs of eth0's individual queues can each be bound to a different CPU.
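As an illustration of per-queue binding, the sketch below pairs each queue IRQ with one CPU in round-robin order and prints the commands that would apply the plan. The IRQ range 96 to 103 comes from the excerpt above; the choice of CPUs 0 to 7 and the name plan_affinity are assumptions made for the example. The script only prints the commands, since writing to /proc/irq requires root.

```python
def plan_affinity(irqs, cpus):
    """Pair each IRQ with one CPU, round-robin, and return {irq: hex_mask}."""
    return {
        irq: format(1 << cpus[i % len(cpus)], "x")
        for i, irq in enumerate(irqs)
    }


# eth0-TxRx-0 .. eth0-TxRx-7 use IRQs 96..103 in the excerpt above.
plan = plan_affinity(range(96, 104), list(range(8)))
for irq, mask in plan.items():
    print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```

If there are fewer CPUs than queues, the round-robin wraps around and several queues share a CPU.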

RPS/RFS

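RSS requires hardware support. Where RSS is not available, RPS/RFS provide a software solution. RPS (Receive Packet Steering) distributes the softirq work of one rx queue across multiple CPU cores to balance the load. RFS (Receive Flow Steering) is an extension of RPS: RPS steers packets by hash alone, which balances load but ignores where the application is running (that is, which CPU the application is on). RFS aims to have the packets processed on the CPU where the consuming application thread is running, which improves the data cache hit rate.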
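The difference between the two can be sketched in a few lines. This is a conceptual model only, not the kernel's implementation: zlib.crc32 stands in for the real flow hash, and the function name steer and the app_cpu_by_flow table are hypothetical.

```python
import zlib


def steer(flow, rps_cpus, app_cpu_by_flow=None):
    """Choose the CPU that will process a packet of the given flow."""
    # RFS-style: if we know where the consuming application runs, go there.
    if app_cpu_by_flow and flow in app_cpu_by_flow:
        return app_cpu_by_flow[flow]
    # RPS-style: hash the flow tuple and pick one of the allowed CPUs.
    key = "|".join(map(str, flow)).encode()
    return rps_cpus[zlib.crc32(key) % len(rps_cpus)]


rps_cpus = [0, 1, 2]                          # the CPUs an rps_cpus mask of 7 allows
flow = ("10.0.0.1", 40000, "10.0.0.2", 80)    # src ip, src port, dst ip, dst port

print("RPS choice:", steer(flow, rps_cpus))
print("RFS choice:", steer(flow, rps_cpus, {flow: 2}))
```

Because the hash depends only on the flow tuple, all packets of one flow land on the same CPU, which keeps them in order.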

[ ~]# echo 7 > /sys/class/net/eth0/queues/rx-0/rps_cpus
# enable RPS; hex 7 is binary 111, which selects CPU0-2

[ ~]# mpstat -P ALL 1 1
03:32:42 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
03:32:43 PM  all    1.72    0.00    2.63    0.00    0.00    3.43    0.00    0.00   92.22
03:32:43 PM    0    7.95    0.00   12.50    0.00    0.00   10.23    0.00    0.00   69.32
03:32:43 PM    1    2.88    0.00    8.65    0.00    0.00   14.42    0.00    0.00   74.04
03:32:43 PM    2    5.94    0.00    5.94    0.00    0.00   10.89    0.00    0.00   77.23
03:32:43 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Under load with this setting, the softirq work is spread mostly across cpu0-2.

Note

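Because the test machine has many devices and the raw output is very long, the data shown above has been edited. The edits do not affect the verification of interrupts, binding, and multi-queue.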
