switchdev QoS

overview:

https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service

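After a packet enters the switch, it is assigned a switch priority (SP, 0-7). The SP can be derived either from the 802.1Q priority or from the DSCP field of the IP header. With 802.1Q priority the mapping is 1:1; with DSCP the mapping is configured with the following commands: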

lldptool -T -i sw1p5 -V APP    app=3,5,24  # add: map DSCP 24 to SP 3 (selector 5 = DSCP)
lldptool -T -i sw1p5 -V APP -d app=3,5,24  # delete the mapping
lldptool -t -i sw1p5 -V APP -c app         # show the configured mappings
Once the first APP rule is added to a port, the port switches to DSCP-based mapping; otherwise it uses the default 802.1Q mapping.
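
The classification rule above can be sketched as a small model. This is only an illustration of the logic described in these notes, not switch or driver code: the function name `switch_priority` and the fallback to SP 0 for unmapped DSCP values and non-IP packets are assumptions.

```python
from typing import Dict, Optional


def switch_priority(pcp: int, dscp: Optional[int], dscp_to_sp: Dict[int, int]) -> int:
    """Return the SP (0-7) for a packet on one port (illustrative model only).

    pcp        -- 802.1Q priority of the packet
    dscp       -- DSCP value, or None for a non-IP packet
    dscp_to_sp -- the port's APP rules, e.g. {24: 3} for app=3,5,24
    """
    if dscp_to_sp:
        # At least one APP rule exists, so the port maps by DSCP.
        # Assumption: anything without a matching rule falls back to SP 0.
        if dscp is None:
            return 0
        return dscp_to_sp.get(dscp, 0)
    # No APP rules: the 802.1Q priority maps 1:1 to the SP.
    return pcp
```

With no APP rules a packet with 802.1Q priority 5 gets SP 5; after `app=3,5,24` is added, a packet carrying DSCP 24 gets SP 3 regardless of its 802.1Q priority.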

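The packet is then placed in the port's headroom buffer according to its SP (switch priority). The headroom buffers (that is, the PG buffers, priority group buffers) hold the port's incoming packets while they go through the switch's pipeline (presumably the stage in which the egress port is looked up), and they also hold lossless flows that are not admitted to the shared buffer. Use the lldptool ETS up2tc setting to map SPs to PG buffers.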

lldptool -T -i sw1p5 -V ETS-CFG up2tc=0:0,1:1,2:2,3:3,4:4,5:5,6:6,7:7
This command creates the SP-to-TC mapping,
and it also creates the SP-to-PG-buffer mapping.

lldptool -T -i sw1p5 -V ETS-CFG tsa=0:ets,1:ets,2:ets,3:ets,4:ets,5:ets,6:ets,7:ets tcbw=12,12,12,12,13,13,13,13  # the tcbw values must sum to 100
lldptool -T -i sw1p5 -V ETS-CFG tsa=0:strict,1:strict,2:strict,3:strict,4:strict,5:strict,6:strict,7:strict 
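
Because the `tcbw` percentages must add up to 100, it can help to validate them before running the command. The helper below is a hypothetical convenience sketch (`ets_cfg_args` is not part of lldptool); it only builds the `tsa=` and `tcbw=` argument strings in the format shown above and, as a simplification, accepts only the two algorithms used in these notes.

```python
from typing import Dict, List

ALLOWED_TSA = {"ets", "strict"}  # only the two algorithms used in these notes


def ets_cfg_args(tsa: Dict[int, str], tcbw: List[int]) -> List[str]:
    """Build the tsa=... and tcbw=... arguments for an ETS-CFG command."""
    if len(tcbw) != 8:
        raise ValueError("tcbw needs one percentage per traffic class (8 values)")
    if sum(tcbw) != 100:
        raise ValueError(f"tcbw must sum to 100, got {sum(tcbw)}")
    for tc, algo in tsa.items():
        if not 0 <= tc <= 7:
            raise ValueError(f"traffic class out of range: {tc}")
        if algo not in ALLOWED_TSA:
            raise ValueError(f"unsupported TSA for this sketch: {algo}")
    tsa_arg = "tsa=" + ",".join(f"{tc}:{tsa[tc]}" for tc in sorted(tsa))
    tcbw_arg = "tcbw=" + ",".join(str(pct) for pct in tcbw)
    return [tsa_arg, tcbw_arg]
```

The returned strings can be appended to `lldptool -T -i sw1p5 -V ETS-CFG`; a list such as eight values of 12 is rejected because it sums to 96.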

devlink sb
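# In the egress direction: sets, per port and per TC, which pool is used and its quota
# In the ingress direction: sets, per port and per TC, which pool is used and its quota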


Once the packet has been through the switch's pipeline, its ingress port, Switch Priority (SP), egress port and TC are known. Based on this information the packet is classified into ingress and egress shared buffer pools. There is only one shared buffer; the ingress and egress pools are merely virtual containers that make it convenient to control the admission rules. When a packet matches a pool, it is added to that pool's usage count. The original English notes on this point:
Note that there's one shared buffer and the pools are simply containers meant to help you formulate the admission rules to the shared buffer.
As I explained above, there's really one shared buffer. The devlink command you typed simply means that the packet will be counted as part of pool 4.
You can have up to 4 pools for each direction.
Before the packet enters the shared buffer, the shared buffer quotas associated with it are checked to decide whether it may be admitted (devlink sb).

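The packet stays in the shared buffer until it is transmitted from the egress port. Packets are placed in different queues according to their TC, and the queues are scheduled according to each TC's TSA (either the Strict Priority algorithm or ETS). Use the lldptool ETS settings to configure the SP-to-TC mapping and each TC's TSA.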

> Thank you, Ido.
>
> So, according to your explanation, my understanding is:
>
> The packet has only one TC (I previously thought the packet had different ingress and egress TCs),
> and the TC is determined by the egress port's up2tc setting?

Ingress TC = PG.

When a packet arrives, it's classified to a PG buffer based on its
802.1p priority and up2tc mapping you configured on the ingress port.

The packet then goes through the switch's pipeline which determines its
egress port. The egress TC is determined based on the packet's 802.1p
priority and the egress port's up2tc mapping.

You now have the following information about the packet: Ingress Port,
Ingress PG, Egress Port, Egress TC, which the switch uses to check
admission for the shared buffer, where the packet is stored prior to
transmission.

> Once out of the switch's pipeline, the packet's TC is known, we assume it is TC1.
>
> If the TC1 packet passes the admission rules below, it will be sent to the shared buffer.
>     Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres && Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres
>
> We assume the packet is received from Port1 and egress port2.
> And the mapping between port TC to pool is like this:
>     devlink sb tc bind set Port1 tc 1 type ingress pool 0 th 9

Packet will be mapped to ingress TC (PG) 1 according to up2tc
configuration on Port1.

>     devlink sb tc bind set Port2 tc 1 type egress pool 4 th 9

Packet will be mapped to egress TC 1 according to up2tc configuration of
Port2.

> Then the packet will be counted as part of pool 0 (because the packet is received from Port1 and its TC is 1, so it maps to pool 0 according to the setting above),
> and the packet will also be counted as part of pool 4 (because it will egress Port2 and its TC is 1, so it maps to pool 4 according to the setting above).
>
> Is this right ?

Yes. When a packet is admitted to the shared buffer it increments four
quotas:
Ingress{Port}, Ingress{Port, PG}, Egress{Port}, Egress{Port, TC}

Please let me know if further clarifications are required.
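
The admission rule and the four quotas from this exchange can be modelled in a few lines. This is a simplified sketch, not how the ASIC or devlink implements it: the class and method names are invented for illustration, and thresholds are treated as plain static byte counts (an assumption; the `th` values in the devlink commands are not byte counts in this sense).

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Quota:
    threshold: int  # assumption: a static limit in bytes
    usage: int = 0


class SharedBufferModel:
    """Toy model of shared buffer admission with four quotas per packet."""

    def __init__(self, default_threshold: int) -> None:
        self.default_threshold = default_threshold
        self.quotas: Dict[Tuple, Quota] = {}

    def _quota(self, key: Tuple) -> Quota:
        return self.quotas.setdefault(key, Quota(self.default_threshold))

    def admit(self, in_port: str, pg: int, out_port: str, tc: int, size: int) -> bool:
        quotas = [
            self._quota(("ingress", in_port)),      # Ingress{Port}
            self._quota(("ingress", in_port, pg)),  # Ingress{Port, PG}
            self._quota(("egress", out_port)),      # Egress{Port}
            self._quota(("egress", out_port, tc)),  # Egress{Port, TC}
        ]
        # Admission rule from the thread: every usage must be below its threshold.
        if any(q.usage >= q.threshold for q in quotas):
            return False
        # An admitted packet increments all four quotas.
        for q in quotas:
            q.usage += size
        return True
```

For example, with a threshold of 200 bytes, two 100-byte packets from Port1 (PG 1) to Port2 (TC 1) are admitted and a third is refused, because all four usages have reached the threshold.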

If a packet belongs to a lossless flow (a flow whose priority has PFC enabled) and it is not admitted to the shared buffer, it is stored in the headroom.
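
Put as a decision function, the rule looks roughly like this. The function name is hypothetical, and the final branch (a lossy packet that is not admitted gets dropped) is an assumption that these notes do not state explicitly.

```python
from typing import Set


def place_packet(admitted: bool, priority: int, pfc_enabled: Set[int]) -> str:
    """Decide where a packet ends up after the shared buffer admission check."""
    if admitted:
        return "shared_buffer"
    if priority in pfc_enabled:
        # Lossless flow: the packet waits in the port's headroom instead.
        return "headroom"
    # Assumption: a lossy packet that is refused admission is dropped.
    return "drop"
```

With PFC enabled for priorities 1, 2 and 3, a refused packet of priority 2 stays in the headroom while a refused packet of priority 5 is dropped.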

ETS:

The transmit path of a network port is modeled as a set of queues called traffic classes which are numbered 0 through N-1, where N is in the range 1 to 8. The user priorities 0-7 are mapped to the set of traffic classes. Further details and definition of the default priority to traffic class mappings are provided in the IEEE Standard 802.1Q-2011.

A transmission selection algorithm is used to select which traffic class is chosen next to dequeue a frame and transmit to the LAN. The default transmission selection algorithm is the Strict Priority algorithm. This algorithm always selects the highest numbered traffic class which has frames to transmit first before a lower numbered traffic class is selected.

Since the Strict Priority algorithm could allow a traffic flow on a higher numbered traffic class to block a lower numbered traffic class from getting a chance to transmit, another traffic selection algorithm has been defined for DCB called the Enhanced Transmission Selection (ETS) algorithm. ETS works by assigning a percentage of available bandwidth to traffic classes. Available bandwidth is defined as the amount of bandwidth left after higher priority transmission algorithms (like Strict Priority) have executed. The bandwidth percentage allocated to an ETS traffic class is the guaranteed amount of available bandwidth which will be made available to that traffic class. If an ETS traffic class does not use all of the bandwidth allocated to it, then other ETS traffic classes may be able to exceed their bandwidth allocations.

ETS allows multiple traffic flows operating on different traffic classes to each receive their fair share of network bandwidth. Obviously, if the strict priority algorithm is used in combination with the ETS algorithm, then care should be taken to ensure that the traffic flows on the strict priority traffic classes are relatively low volume flows.
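
The interaction between the two algorithms can be illustrated with a bandwidth allocation sketch. It is a fluid approximation under stated assumptions, not the hardware scheduler: strict priority classes are served first from the highest number down, and the ETS classes then split what is left by percentage, with unused shares handed to ETS classes that still have demand. The function name and its parameters are made up for this example.

```python
from typing import Dict, Set


def allocate(link_bw: float, demand: Dict[int, float],
             strict: Set[int], ets_pct: Dict[int, float]) -> Dict[int, float]:
    """Split link_bw among traffic classes (same units as demand)."""
    grant = {tc: 0.0 for tc in demand}
    remaining = link_bw

    # Strict Priority: the highest-numbered class is served first.
    for tc in sorted(strict, reverse=True):
        g = min(demand.get(tc, 0.0), remaining)
        grant[tc] = g
        remaining -= g

    # ETS: share the available bandwidth by percentage among classes
    # that still want more, repeating until nothing can be handed out.
    active = {tc for tc in ets_pct if demand.get(tc, 0.0) > 0}
    while active and remaining > 1e-9:
        total_pct = sum(ets_pct[tc] for tc in active)
        if total_pct <= 0:
            break
        budget = remaining
        satisfied = set()
        for tc in active:
            share = budget * ets_pct[tc] / total_pct
            need = demand[tc] - grant[tc]
            g = min(share, need)
            grant[tc] += g
            remaining -= g
            if need <= share:
                satisfied.add(tc)
        if not satisfied:
            break  # every active class consumed its full share
        active -= satisfied
    return grant
```

On a link of 100 units with TC 7 strict (demand 20) and TC 0 and TC 1 at 50% ETS each (demands 100 and 10), TC 7 gets 20, TC 1 gets its 10, and TC 0 picks up the rest. If TC 7 instead demanded 150, the ETS classes would get nothing, which is why strict priority classes should carry low-volume flows.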

lldptool Priority-based Flow Control (PFC)

To enable PFC for priorities 1, 2 and 3, run:
lldptool -T -i sw1p5 -V PFC enabled=1,2,3

devlink sb

To bind packets originating from a {Port, PG} to an ingress pool, run:
devlink sb tc bind set pci/0000:03:00.0/1 tc 0 type ingress pool 0 th 9
Similarly for egress, to bind packets directed to a {Port, TC} to an egress pool, run:
devlink sb tc bind set sw1p17 tc 0 type egress pool 4 th 9
