TCP Congestion Window Validation

If the congestion window has not been fully used within one RTO, the TCP sender shrinks it, because the window may no longer reflect current network conditions. Per RFC 2861, ssthresh is set to the maximum of its current value and 3/4 of the congestion window, and the congestion window is set to half the sum of the amount actually used and its current value.

In the send function tcp_write_xmit below, if any packets were actually transmitted (sent_pkts is non-zero), tcp_cwnd_validate is called afterwards to validate the congestion window. Its parameter is_cwnd_limited indicates whether sending was limited by the congestion window. It is determined by two sources OR-ed together: the assignment made inside tcp_tso_should_defer, and the check of whether the number of packets in flight has reached the congestion window.

static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, int push_one, gfp_t gfp)
{
    max_segs = tcp_tso_segs(sk, mss_now);
    while ((skb = tcp_send_head(sk))) {
        ...
        tso_segs = tcp_init_tso_segs(skb, mss_now);

        if (tso_segs == 1) {
        } else {
            if (!push_one &&
                tcp_tso_should_defer(sk, skb, &is_cwnd_limited,
                         &is_rwnd_limited, max_segs))
                break;
        }
        ...
    }
    ...
    if (likely(sent_pkts)) {
        ...
        is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd);
        tcp_cwnd_validate(sk, is_cwnd_limited);
        return false;
    }

Below is tcp_tso_should_defer, which is skipped when pushing a single packet. If the congestion window is smaller than the send window, and the congestion window is no larger than the packet length, the current packet cannot be sent; the function sets the cwnd-limited flag is_cwnd_limited and defers.

static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
                 bool *is_cwnd_limited, bool *is_rwnd_limited, u32 max_segs)
{
    send_win = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;

    /* From in_flight test above, we know that cwnd > in_flight.  */
    cong_win = (tp->snd_cwnd - in_flight) * tp->mss_cache;

    ...
    /* Ok, it looks like it is advisable to defer.
     * Three cases are tracked :
     * 1) We are cwnd-limited
     * 2) We are rwnd-limited
     * 3) We are application limited.
     */
    if (cong_win < send_win) {
        if (cong_win <= skb->len) {
            *is_cwnd_limited = true;
            return true;
        }
    } else {
        ...

The congestion window validation function follows. On its first invocation, max_packets_out and max_packets_seq have not yet been assigned, so they are set from packets_out and SND.NXT respectively. On subsequent calls they are updated only when a new send window has begun or when more packets are outstanding than the recorded value. Thus max_packets_out records the maximum number of packets outstanding in the last window, while max_packets_seq records the highest sequence number sent.

static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
{
    const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
    struct tcp_sock *tp = tcp_sk(sk);

    /* Track the maximum number of outstanding packets in each
     * window, and remember whether we were cwnd-limited then.
     */
    if (!before(tp->snd_una, tp->max_packets_seq) ||
        tp->packets_out > tp->max_packets_out) {
        tp->max_packets_out = tp->packets_out;
        tp->max_packets_seq = tp->snd_nxt;
        tp->is_cwnd_limited = is_cwnd_limited;
    }

The parameter is_cwnd_limited records whether the last send window was limited by the congestion window. The function tcp_is_cwnd_limited reports whether the connection's sending is constrained by the congestion window: true means the sender is using all available network capacity; false means some capacity is going unused.

In the latter case, the current number of packets in the network is recorded in snd_cwnd_used. If the kernel is configured to reset the congestion window after an idle period longer than RTO (tcp_slow_start_after_idle is enabled), the idle time is at least one RTO, and the congestion control algorithm does not define its own cong_control hook, then tcp_cwnd_application_limited is called to handle the application-limited case.

    if (tcp_is_cwnd_limited(sk)) {
        /* Network is feed fully. */
        tp->snd_cwnd_used = 0;
        tp->snd_cwnd_stamp = tcp_jiffies32;
    } else {
        /* Network starves. */
        if (tp->packets_out > tp->snd_cwnd_used)
            tp->snd_cwnd_used = tp->packets_out;

        if (sock_net(sk)->ipv4.sysctl_tcp_slow_start_after_idle &&
            (s32)(tcp_jiffies32 - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto &&
            !ca_ops->cong_control)
            tcp_cwnd_application_limited(sk);

The following checks whether the idleness was caused by an insufficient send buffer: the congestion window has already been ruled out (this else branch), the write queue is empty, and the application has hit the buffer limit (SOCK_NOSPACE). If so, the send-buffer-limited flag TCP_CHRONO_SNDBUF_LIMITED is recorded.

        /* The following conditions together indicate the starvation
         * is caused by insufficient sender buffer:
         * 1) just sent some data (see tcp_write_xmit)
         * 2) not cwnd limited (this else condition)
         * 3) no more data to send (tcp_write_queue_empty())
         * 4) application is hitting buffer limit (SOCK_NOSPACE)
         */
        if (tcp_write_queue_empty(sk) && sk->sk_socket &&
            test_bit(SOCK_NOSPACE, &sk->sk_socket->flags) &&
            (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))
            tcp_chrono_start(sk, TCP_CHRONO_SNDBUF_LIMITED);

The kernel's notion of being cwnd-limited differs slightly from RFC 2861. The RFC suggests that cwnd should not be raised unless it was fully used, which is exactly what the kernel does in congestion avoidance. In slow start, however, the kernel allows the congestion window to grow to twice the amount actually used. See the comment above tcp_is_cwnd_limited: with an initial window of 10 and 9 frames sent, the window may reach 18 once all frames are acknowledged. This helps rate-limited applications probe the available bandwidth more effectively.

/* We follow the spirit of RFC2861 to validate cwnd but implement a more
 * flexible approach. The RFC suggests cwnd should not be raised unless
 * it was fully used previously. And that's exactly what we do in
 * congestion avoidance mode. But in slow start we allow cwnd to grow
 * as long as the application has used half the cwnd.
 * Example :
 *    cwnd is 10 (IW10), but application sends 9 frames.
 *    We allow cwnd to reach 18 when all frames are ACKed.
 * This check is safe because it's as aggressive as slow start which already
 * risks 100% overshoot. The advantage is that we discourage application to
 * either send more filler packets or data to artificially blow up the cwnd
 * usage, and allow application-limited process to probe bw more aggressively.
 */
static inline bool tcp_is_cwnd_limited(const struct sock *sk)
{
    const struct tcp_sock *tp = tcp_sk(sk);

    /* If in slow start, ensure cwnd grows to twice what was ACKed. */
    if (tcp_in_slow_start(tp))
        return tp->snd_cwnd < 2 * tp->max_packets_out;

    return tp->is_cwnd_limited;
}

The function tcp_cwnd_application_limited below adjusts the congestion window after the network has been underutilized for one RTO. The adjustment is skipped during retransmission phases and when the application recently hit its send buffer limit. It first computes the window usage as the larger of the initial window and the snd_cwnd_used value recorded in tcp_cwnd_validate, then sets the congestion window to half the sum of its old value and that usage.

/* RFC2861, slow part. Adjust cwnd, after it was not full during one rto.
 * As additional protections, we do not touch cwnd in retransmission phases,
 * and if application hit its sndbuf limit recently.
 */
static void tcp_cwnd_application_limited(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);

    if (inet_csk(sk)->icsk_ca_state == TCP_CA_Open &&
        sk->sk_socket && !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
        /* Limited by application or receiver window. */
        u32 init_win = tcp_init_cwnd(tp, __sk_dst_get(sk));
        u32 win_used = max(tp->snd_cwnd_used, init_win);
        if (win_used < tp->snd_cwnd) {
            tp->snd_ssthresh = tcp_current_ssthresh(sk);
            tp->snd_cwnd = (tp->snd_cwnd + win_used) >> 1;
        }
        tp->snd_cwnd_used = 0;
    }
    tp->snd_cwnd_stamp = tcp_jiffies32;
}

Kernel version: 5.0
