An In-Depth Look at the Meaning of backlog in the Linux listen() Function

Reposted from: https://blog.csdn.net/yangbodong22011/article/details/60399728

1: A review of listen() and the question

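listen() is the system call a server uses in network programming to start listening on a port. Let us first recall its prototype, declared in <sys/socket.h>:

int listen(int sockfd, int backlog);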

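There are several competing explanations online of what the second argument, backlog, means. I have summarized the three main ones:

The kernel maintains one queue for a socket in LISTEN state, holding sockets in both the SYN RECEIVED and ESTABLISHED states, and backlog is the size of this queue.
The kernel maintains two queues for a socket in LISTEN state, one for SYN RECEIVED and one for ESTABLISHED, and backlog is the sum of the two queue sizes.
The third uses the same model as the second, but backlog is the length of the ESTABLISHED queue only.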

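SYN RECEIVED and ESTABLISHED are states a connection passes through during the TCP three-way handshake. On the server side, a connection moves from LISTEN to SYN RECEIVED when a SYN arrives and the server replies with SYN/ACK, and from SYN RECEIVED to ESTABLISHED when the client's final ACK arrives. For the full picture, refer to the TCP state transition diagram.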

2: The correct explanation

So which of the three explanations is correct? The explanation below comes from this article:

http://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html

The relevant passages of the article follow:

When an application puts a socket into LISTEN state using the listen syscall, it needs to specify a backlog for that socket. The backlog is usually described as the limit for the queue of incoming connections.

Because of the 3-way handshake used by TCP, an incoming connection goes through an intermediate state SYN RECEIVED before it reaches the ESTABLISHED state and can be returned by the accept syscall to the application (see the part of the TCP state diagram reproduced above). This means that a TCP/IP stack has two options to implement the backlog queue for a socket in LISTEN state:

(Note: in other words, one option keeps both states in a single queue; the other gives each state its own queue.)

1 : The implementation uses a single queue, the size of which is determined by the backlog argument of the listen syscall. When a SYN packet is received, it sends back a SYN/ACK packet and adds the connection to the queue. When the corresponding ACK is received, the connection changes its state to ESTABLISHED and becomes eligible for handover to the application. This means that the queue can contain connections in two different state: SYN RECEIVED and ESTABLISHED. Only connections in the latter state can be returned to the application by the accept syscall.

2 : The implementation uses two queues, a SYN queue (or incomplete connection queue) and an accept queue (or complete connection queue). Connections in state SYN RECEIVED are added to the SYN queue and later moved to the accept queue when their state changes to ESTABLISHED, i.e. when the ACK packet in the 3-way handshake is received. As the name implies, the accept call is then implemented simply to consume connections from the accept queue. In this case, the backlog argument of the listen syscall determines the size of the accept queue.

Historically, BSD derived TCP implementations use the first approach. That choice implies that when the maximum backlog is reached, the system will no longer send back SYN/ACK packets in response to SYN packets. Usually the TCP implementation will simply drop the SYN packet (instead of responding with a RST packet) so that the client will retry.

On Linux, things are different, as mentioned in the man page of the listen syscall:
The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog.

This means that current Linux versions use the second option with two distinct queues: a SYN queue with a size specified by a system wide setting and an accept queue with a size specified by the application.

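In other words, current Linux kernels use the second implementation option above, which corresponds to the third interpretation listed at the start: there are two queues, a SYN queue whose size comes from a system-wide setting, and an accept queue whose size is given by the application through the backlog argument of listen().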

OK, by now the meaning of backlog should be clear. Let's verify it with an experiment:

3: Experimental verification

Test environment:

RedHat 7
Linux version 3.10.0-514.el7.x86_64

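Method:

1: The client starts several threads, each of which creates a socket and connects to the server.
2: After listen(), the server does not call accept() right away (it waits 10 seconds before each of three accept() calls), so connections are not promptly taken off the completed-connection queue.
3: Observe what happens on the server. Is the number of sockets in ESTABLISHED state equal to the backlog argument?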

We set the backlog to 5:

#define BACKLOG 5

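The default SYN queue size on my system (the value of /proc/sys/net/ipv4/tcp_max_syn_backlog) is 256, so the sizes of my two queues are:

SYN queue size: 256
Accept queue size: 5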

Here is the server program, server.c:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PORT    8888  // port number
#define BACKLOG 5     // backlog passed to listen()

void my_err(const char *msg, int line)
{
    fprintf(stderr, "line:%d ", line);
    perror(msg);
}

int main(int argc, char *argv[])
{
    socklen_t conn_len;               // accept() takes a socklen_t *, not an int *
    int sock_fd, conn_fd, i;
    struct sockaddr_in serv_addr, conn_addr;

    if ((sock_fd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
        my_err("socket", __LINE__);
        exit(1);
    }

    memset(&serv_addr, 0, sizeof(struct sockaddr_in));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(PORT);
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(sock_fd, (struct sockaddr *)&serv_addr, sizeof(struct sockaddr_in)) == -1) {
        my_err("bind", __LINE__);
        exit(1);
    }

    if (listen(sock_fd, BACKLOG) == -1) {
        my_err("listen", __LINE__);
        exit(1);
    }

    // Sleep 10 s, then accept one connection; do this three times.
    // All other connections are left sitting in the kernel's queues.
    for (i = 0; i < 3; i++) {
        sleep(10);
        printf("I will accept one\n");
        conn_len = sizeof(struct sockaddr_in);   // reset before every accept()
        conn_fd = accept(sock_fd, (struct sockaddr *)&conn_addr, &conn_len);
        if (conn_fd == -1)
            my_err("accept", __LINE__);
    }

    // Then block forever without releasing any connection
    // (pause() sleeps instead of spinning the CPU like while(1) {}).
    for (;;)
        pause();
    return 0;
}

The client program, client.c:

#define _GNU_SOURCE    // must precede the includes; exposes inet_aton() under -std=c99
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

#define PORT 8888
#define thread_num 10  // number of threads (and therefore connections) to create

struct sockaddr_in serv_addr;

// Thread entry point: pthread_create() requires the signature void *(*)(void *)
void *func(void *arg)
{
    int conn_fd;

    (void)arg;
    conn_fd = socket(AF_INET, SOCK_STREAM, 0);
    printf("conn_fd : %d\n", conn_fd);

    if (connect(conn_fd, (struct sockaddr *)&serv_addr, sizeof(struct sockaddr_in)) == -1) {
        perror("connect");
    }

    // Keep the connection open; pause() blocks without burning CPU
    for (;;)
        pause();
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t pid[thread_num];

    memset(&serv_addr, 0, sizeof(struct sockaddr_in));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(PORT);
    inet_aton("192.168.30.155", &serv_addr.sin_addr); // this IP is another host on the LAN (the server)

    // Create the threads, then wait for them (they never return)
    for (int i = 0; i < thread_num; ++i)
        pthread_create(&pid[i], NULL, func, NULL);

    for (int i = 0; i < thread_num; ++i)
        pthread_join(pid[i], NULL);   // pthread_join() expects a void ** (or NULL) for the result

    return 0;
}

Compile and run the programs, and use netstat to monitor port 8888 on the server:

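$ gcc server.c -o server
$ gcc client.c -o client -lpthread -std=c99
$ watch -n 1 "netstat -natp | grep 8888"    # on the server host, run as root
$ ./server                                  # on the server host
$ ./client                                  # on the client host

watch -n 1 re-runs the quoted command every second. The netstat options are: -n shows numeric addresses, -a shows all sockets, -t restricts output to TCP, and -p shows the PID and program name; grep 8888 then keeps only the lines for our port.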

The results are as follows.

First, the watch output:

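Because the client connects with 10 threads, there are 10 connections on the server.
The ./server entry on the first line is in LISTEN state; this is the server process's listening socket.
The ./server entry on the third line from the bottom is the connection returned by the server's first accept().
The 6 ESTABLISHED connections are one more than our BACKLOG of 5.
The remaining connections stay in SYN_RECV: even though the client's ACK (the third step of the handshake) has arrived, they cannot become ESTABLISHED because there is no room left in the backlog queue.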

About 10 seconds later, once the server has performed its second accept(), the watch output looks like this:

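Compared with before, the second accept() added one more connection owned by ./server.
One connection moved from SYN_RECV to ESTABLISHED. The reason is that accept() took a connection off the completed (backlog) queue; once that freed a slot, a connection from the SYN queue could become ESTABLISHED and enter the completed queue.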

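That settles the backlog question. What happens if the experiment is repeated with a larger backlog? I tried it: the number of ESTABLISHED connections grows as well, and it is always BACKLOG+1. Why BACKLOG+1? I haven't figured that out; explanations are welcome.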

Another interesting question: what happens when the ESTABLISHED (accept) queue is full but a connection needs to move over from the SYN queue?

See: What happens when the ESTABLISHED queue is full but a connection needs to move from the SYN queue
