Analyzing the Redis Network Communication Module Source Code (Part 1)

Introduction

Nowadays, alongside traditional relational databases, NoSQL in-memory databases built around the key-value model are extremely popular, and for many people the first in-memory database that comes to mind is Redis. Indeed, with its high performance and elegant implementation, Redis stands out among in-memory databases. The preceding chapters introduced the basic structure of a single server; in this chapter we run a practical exercise, using Redis as an example of what the server structure of a real project looks like. The angle here differs from that of the earlier chapters: there we stated conclusions first and then justified them, whereas here we pretend we know nothing about the structure of the Redis networking layer in advance and, with the help of gdb, work it out step by step in an exploratory fashion.

Downloading and Building the Redis Source Code

The latest Redis source code can be obtained from the official Redis website (https://redis.io/). I am using CentOS 7.0, so I download the source tarball with wget:

[root@localhost gdbtest]# wget http://download.redis.io/releases/redis-4.0.11.tar.gz
--2018-09-08 13:08:41--  http://download.redis.io/releases/redis-4.0.11.tar.gz
Resolving download.redis.io (download.redis.io)... 109.74.203.151
Connecting to download.redis.io (download.redis.io)|109.74.203.151|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1739656 (1.7M) [application/x-gzip]
Saving to: ‘redis-4.0.11.tar.gz’

54% [==================================================================>                                                         ] 940,876     65.6KB/s  eta 9s

Extract it:

[root@localhost gdbtest]# tar zxvf redis-4.0.11.tar.gz 

Enter the generated redis-4.0.11 directory and build it with the makefile:

[root@localhost gdbtest]# cd redis-4.0.11
[root@localhost redis-4.0.11]# make -j 4

After a successful build, several executables appear in the src directory; among them, redis-server and redis-cli are the programs we are about to debug.

We can enter the src directory and launch redis-server under gdb:

[root@localhost src]# gdb redis-server 
Reading symbols from /root/redis-4.0.9/src/redis-server...done.
(gdb) r
Starting program: /root/redis-4.0.9/src/redis-server 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
31212:C 17 Sep 11:59:50.781 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
31212:C 17 Sep 11:59:50.781 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=31212, just started
31212:C 17 Sep 11:59:50.781 # Warning: no config file specified, using the default config. In order to specify a config file use /root/redis-4.0.9/src/redis-server /path/to/redis.conf
31212:M 17 Sep 11:59:50.781 * Increased maximum number of open files to 10032 (it was originally set to 1024).
[New Thread 0x7ffff07ff700 (LWP 31216)]
[New Thread 0x7fffefffe700 (LWP 31217)]
[New Thread 0x7fffef7fd700 (LWP 31218)]
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 4.0.9 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 31212
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

31212:M 17 Sep 11:59:50.793 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
31212:M 17 Sep 11:59:50.793 # Server initialized
31212:M 17 Sep 11:59:50.793 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
31212:M 17 Sep 11:59:50.794 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
31212:M 17 Sep 11:59:50.794 * DB loaded from disk: 0.000 seconds
31212:M 17 Sep 11:59:50.794 * Ready to accept connections

The above is what redis-server prints after starting successfully.

Now open another session, go to the same src directory of the Redis source tree, and launch the Redis client redis-cli under gdb:

[root@localhost src]# gdb redis-cli
Reading symbols from /root/redis-4.0.9/src/redis-cli...done.
(gdb) r
Starting program: /root/redis-4.0.9/src/redis-cli 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
127.0.0.1:6379> 

The above is what redis-cli shows after starting successfully.

A Communication Example

Since the goal of this chapter is to study Redis's network communication module, and we are not concerned with the rest of Redis, we will use a deliberately simple communication example: redis-cli creates a key-value pair with key "hello" and value "world", and receives redis-server's response. Through this example we will investigate the Redis networking module.

127.0.0.1:6379> set hello world
OK
127.0.0.1:6379> 

Exploring the Network Communication Module on the redis-server Side

Let us first study the communication module on the redis-server side.

Initializing the Listening Socket

From the earlier chapters we know that, at the application layer, network communication essentially follows this flow (a minimal C sketch of these steps follows the list):

1. The server creates a listening socket.

2. The listening socket is bound to the desired IP address and port (the socket API bind function).

3. Listen (the socket API listen function).

4. Wait indefinitely for client connections to arrive, calling the socket API accept function to accept each one, which produces a client socket for that client.

5. Send and receive network data on the client socket, and close the socket when necessary.
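
To make these five steps concrete before diving into the Redis code, here is a minimal sketch in plain C (illustrative only, not Redis source; error handling is trimmed to the bare minimum):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* Step 1: create the listening socket */
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);

    /* Step 2: bind it to an address and port (6379, like redis-server) */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(6379);
    if (bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("bind"); return 1;
    }

    /* Step 3: listen */
    if (listen(listenfd, 511) == -1) {
        perror("listen"); return 1;
    }

    for (;;) {
        /* Step 4: accept a client connection, producing a client socket */
        int clientfd = accept(listenfd, NULL, NULL);
        if (clientfd == -1) continue;

        /* Step 5: send/receive on the client socket, then close it */
        char buf[256];
        ssize_t n = read(clientfd, buf, sizeof(buf));
        if (n > 0) write(clientfd, buf, n);   /* echo the data back */
        close(clientfd);
    }
}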

Following this flow, let us first investigate steps 1, 2, and 3. Since redis-server listens for clients on port 6379 by default, we can use that port number as a clue.

Searching the entire Redis codebase for calls to the bind function and filtering the results, we settle on the anetListen function in anet.c:

static int anetListen(char *err, int s, struct sockaddr *sa, socklen_t len, int backlog) {
    if (bind(s,sa,len) == -1) {
        anetSetError(err, "bind: %s", strerror(errno));
        close(s);
        return ANET_ERR;
    }

    if (listen(s, backlog) == -1) {
        anetSetError(err, "listen: %s", strerror(errno));
        close(s);
        return ANET_ERR;
    }
    return ANET_OK;
}

Set a breakpoint on this function with gdb's b command and rerun redis-server:

(gdb) b anetListen
Breakpoint 1 at 0x426cd0: file anet.c, line 440.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/redis-4.0.9/src/redis-server 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
31546:C 17 Sep 14:20:43.861 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
31546:C 17 Sep 14:20:43.861 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=31546, just started
31546:C 17 Sep 14:20:43.861 # Warning: no config file specified, using the default config. In order to specify a config file use /root/redis-4.0.9/src/redis-server /path/to/redis.conf
31546:M 17 Sep 14:20:43.862 * Increased maximum number of open files to 10032 (it was originally set to 1024).

Breakpoint 1, anetListen (err=0x745bb0 <server+560> "", s=10, sa=0x75dfe0, len=28, backlog=511) at anet.c:440
440     static int anetListen(char *err, int s, struct sockaddr *sa, socklen_t len, int backlog) {

When gdb stops at this function, use the bt command to inspect the call stack:

(gdb) bt
#0  anetListen (err=0x745bb0 <server+560> "", s=10, sa=0x75dfe0, len=28, backlog=511) at anet.c:440
#1  0x0000000000426e25 in _anetTcpServer (err=err@entry=0x745bb0 <server+560> "", port=port@entry=6379, bindaddr=bindaddr@entry=0x0, af=af@entry=10, backlog=511)
    at anet.c:487
#2  0x000000000042792d in anetTcp6Server (err=err@entry=0x745bb0 <server+560> "", port=port@entry=6379, bindaddr=bindaddr@entry=0x0, backlog=<optimized out>)
    at anet.c:510
#3  0x000000000042b01f in listenToPort (port=6379, fds=fds@entry=0x745ae4 <server+356>, count=count@entry=0x745b24 <server+420>) at server.c:1728
#4  0x000000000042f917 in initServer () at server.c:1852
#5  0x0000000000423803 in main (argc=<optimized out>, argv=0x7fffffffe588) at server.c:3857

From this call stack, together with the port number 6379 visible in frame #2, we can confirm that the listening socket is initialized in the main thread (from the stack, the outermost frame is the main function).

Let us look at the code at frame #1:

static int _anetTcpServer(char *err, int port, char *bindaddr, int af, int backlog)
{
    int s = -1, rv;
    char _port[6];  /* strlen("65535") */
    struct addrinfo hints, *servinfo, *p;

    snprintf(_port,6,"%d",port);
    memset(&hints,0,sizeof(hints));
    hints.ai_family = af;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;    /* No effect if bindaddr != NULL */

    if ((rv = getaddrinfo(bindaddr,_port,&hints,&servinfo)) != 0) {
        anetSetError(err, "%s", gai_strerror(rv));
        return ANET_ERR;
    }
    for (p = servinfo; p != NULL; p = p->ai_next) {
        if ((s = socket(p->ai_family,p->ai_socktype,p->ai_protocol)) == -1)
            continue;

        if (af == AF_INET6 && anetV6Only(err,s) == ANET_ERR) goto error;
        if (anetSetReuseAddr(err,s) == ANET_ERR) goto error;
        if (anetListen(err,s,p->ai_addr,p->ai_addrlen,backlog) == ANET_ERR) goto error;
        goto end;
    }
    if (p == NULL) {
        anetSetError(err, "unable to bind socket, errno: %d", errno);
        goto error;
    }

error:
    if (s != -1) close(s);
    s = ANET_ERR;
end:
    freeaddrinfo(servinfo);
    return s;
}

Switch to frame #1 and type info args to inspect the arguments passed to this function:

(gdb) f 1
#1  0x0000000000426e25 in _anetTcpServer (err=err@entry=0x745bb0 <server+560> "", port=port@entry=6379, bindaddr=bindaddr@entry=0x0, af=af@entry=10, backlog=511)
    at anet.c:487
487             if (anetListen(err,s,p->ai_addr,p->ai_addrlen,backlog) == ANET_ERR) s = ANET_ERR;
(gdb) info args
err = 0x745bb0 <server+560> ""
port = 6379
bindaddr = 0x0
af = 10
backlog = 511

Here the system API getaddrinfo is used to resolve the host's address and port information. It is chosen over gethostbyname because gethostbyname can only resolve IPv4 host information, while getaddrinfo handles both IPv4 and IPv6. Its signature is:

int getaddrinfo(const char *node, const char *service,
                const struct addrinfo *hints,
                struct addrinfo **res);

See the Linux man pages for the details of this function. A server typically sets the ai_flags field of the hints argument to AI_PASSIVE before calling getaddrinfo, since the result is meant for bind; the hostname node is usually NULL, which yields the wildcard address [::]. A client, on the other hand, generally does not set AI_PASSIVE in hints.ai_flags, and both the hostname node and the service name service (in effect, the port) should be non-NULL.
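
As an illustration of the server-side convention just described (a hypothetical sketch, not Redis code; resolve_listen_addr is a made-up name), resolving a wildcard address for bind might look like this:

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve a local wildcard address suitable for bind(), the way a server
 * typically calls getaddrinfo(): node == NULL plus AI_PASSIVE in hints. */
int resolve_listen_addr(const char *port, struct addrinfo **out) {
    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;    /* either IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_PASSIVE;   /* wildcard address, for bind() */

    int rv = getaddrinfo(NULL, port, &hints, out);
    if (rv != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
        return -1;
    }
    return 0;   /* caller iterates over *out and frees it with freeaddrinfo() */
}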

Once the address information has been resolved, it is used to create the listening socket, and the socket's reuse-address option is enabled. Then anetListen is called, which binds first and then listens. At this point redis-server can accept client connections on port 6379.
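For reference, enabling the reuse-address option comes down to a single setsockopt call; a sketch of roughly what anetSetReuseAddr does (the real function's error reporting is omitted here):

#include <sys/socket.h>

/* Roughly what anetSetReuseAddr does: allow the listening port to be
 * rebound right after a restart instead of lingering in TIME_WAIT. */
static int set_reuse_addr(int fd) {
    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == -1)
        return -1;
    return 0;
}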

Accepting Client Connections

By the same approach, to learn how redis-server accepts client connections we only need to search for the socket API accept function.

This leads us to the anetGenericAccept function in anet.c:

static int anetGenericAccept(char *err, int s, struct sockaddr *sa, socklen_t *len) {
    int fd;
    while(1) {
        fd = accept(s,sa,len);
        if (fd == -1) {
            if (errno == EINTR)
                continue;
            else {
                anetSetError(err, "accept: %s", strerror(errno));
                return ANET_ERR;
            }
        }
        break;
    }
    return fd;
}

Set a breakpoint on this function with the b command and rerun redis-server. The breakpoint is not hit even after the program is fully up, so we open a new redis-cli to simulate a fresh client connecting to redis-server. The breakpoint fires, and we inspect the call stack:

Breakpoint 2, anetGenericAccept (err=0x745bb0 <server+560> "", s=s@entry=11, sa=sa@entry=0x7fffffffe2b0, len=len@entry=0x7fffffffe2ac) at anet.c:531
531     static int anetGenericAccept(char *err, int s, struct sockaddr *sa, socklen_t *len) {
(gdb) bt
#0  anetGenericAccept (err=0x745bb0 <server+560> "", s=s@entry=11, sa=sa@entry=0x7fffffffe2b0, len=len@entry=0x7fffffffe2ac) at anet.c:531
#1  0x0000000000427a1d in anetTcpAccept (err=<optimized out>, s=s@entry=11, ip=ip@entry=0x7fffffffe370 "\317P\237[", ip_len=ip_len@entry=46, 
    port=port@entry=0x7fffffffe36c) at anet.c:552
#2  0x0000000000437fb1 in acceptTcpHandler (el=<optimized out>, fd=11, privdata=<optimized out>, mask=<optimized out>) at networking.c:689
#3  0x00000000004267f0 in aeProcessEvents (eventLoop=eventLoop@entry=0x7ffff083a0a0, flags=flags@entry=11) at ae.c:440
#4  0x0000000000426adb in aeMain (eventLoop=0x7ffff083a0a0) at ae.c:498
#5  0x00000000004238ef in main (argc=<optimized out>, argv=0x7fffffffe588) at server.c:3894

Analyzing this call stack, the flow is: inside main, the initServer function creates the listening socket, binds the address, and starts listening; then main calls the aeMain function, which starts a loop that continuously processes "events":

void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        aeProcessEvents(eventLoop, AE_ALL_EVENTS|AE_CALL_AFTER_SLEEP);
    }
}

The loop exits when eventLoop->stop becomes 1. The event-processing code is as follows:

int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;

    /* Nothing to do? return ASAP */
    if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;

    /* Note that we want call select() even if there are no
     * file events to process as long as we want to process time
     * events, in order to sleep until the next time event is ready
     * to fire. */
    if (eventLoop->maxfd != -1 ||
        ((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {
        int j;
        aeTimeEvent *shortest = NULL;
        struct timeval tv, *tvp;

        if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))
            shortest = aeSearchNearestTimer(eventLoop);
        if (shortest) {
            long now_sec, now_ms;

            aeGetTime(&now_sec, &now_ms);
            tvp = &tv;

            /* How many milliseconds we need to wait for the next
             * time event to fire? */
            long long ms =
                (shortest->when_sec - now_sec)*1000 +
                shortest->when_ms - now_ms;

            if (ms > 0) {
                tvp->tv_sec = ms/1000;
                tvp->tv_usec = (ms % 1000)*1000;
            } else {
                tvp->tv_sec = 0;
                tvp->tv_usec = 0;
            }
        } else {
            /* If we have to check for events but need to return
             * ASAP because of AE_DONT_WAIT we need to set the timeout
             * to zero */
            if (flags & AE_DONT_WAIT) {
                tv.tv_sec = tv.tv_usec = 0;
                tvp = &tv;
            } else {
                /* Otherwise we can block */
                tvp = NULL; /* wait forever */
            }
        }

        /* Call the multiplexing API, will return only on timeout or when
         * some event fires. */
        numevents = aeApiPoll(eventLoop, tvp);

        /* After sleep callback. */
        if (eventLoop->aftersleep != NULL && flags & AE_CALL_AFTER_SLEEP)
            eventLoop->aftersleep(eventLoop);

        for (j = 0; j < numevents; j++) {
            aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
            int mask = eventLoop->fired[j].mask;
            int fd = eventLoop->fired[j].fd;
            int rfired = 0;

            /* note the fe->mask & mask & ... code: maybe an already processed
             * event removed an element that fired and we still didn't
             * processed, so we check if the event is still valid. */
            if (fe->mask & mask & AE_READABLE) {
                rfired = 1;
                fe->rfileProc(eventLoop,fd,fe->clientData,mask);
            }
            if (fe->mask & mask & AE_WRITABLE) {
                if (!rfired || fe->wfileProc != fe->rfileProc)
                    fe->wfileProc(eventLoop,fd,fe->clientData,mask);
            }
            processed++;
        }
    }
    /* Check time events */
    if (flags & AE_TIME_EVENTS)
        processed += processTimeEvents(eventLoop);

    return processed; /* return the number of processed file/time events */
}

This code first checks the flags argument to see whether there are any events to process. If timer events are enabled (the AE_TIME_EVENTS flag), it searches for the timer that is due to fire soonest:

/* Search the first timer to fire.
 * This operation is useful to know how many time the select can be
 * put in sleep without to delay any event.
 * If there are no timers NULL is returned.
 *
 * Note that's O(N) since time events are unsorted.
 * Possible optimizations (not needed by Redis so far, but...):
 * 1) Insert the event in order, so that the nearest is just the head.
 *    Much better but still insertion or deletion of timers is O(N).
 * 2) Use a skiplist to have this operation as O(1) and insertion as O(log(N)).
 */
static aeTimeEvent *aeSearchNearestTimer(aeEventLoop *eventLoop)
{
    aeTimeEvent *te = eventLoop->timeEventHead;
    aeTimeEvent *nearest = NULL;

    while(te) {
        if (!nearest || te->when_sec < nearest->when_sec ||
                (te->when_sec == nearest->when_sec &&
                 te->when_ms < nearest->when_ms))
            nearest = te;
        te = te->next;
    }
    return nearest;
}

The code is well commented and easy to follow. As the author's comment explains, the set of timers is unsorted, so finding the nearest one requires walking the whole list, an O(N) operation. The comment also "hints" at future optimization directions: keep the list sorted by expiry time so the head is always the nearest timer, making the lookup O(1) (though insertion and deletion remain O(N)); or use a skiplist, as Redis does elsewhere, making the lookup O(1) and insertion O(log(N)).
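As a sketch of the first optimization suggested in the comment (illustrative only; this is not what Redis does, and insertTimeEventSorted is a made-up helper as it might appear inside ae.c), inserting timers in expiry order keeps the nearest one at the head:

/* Hypothetical: insert a timer so the list stays sorted by expiry time.
 * Finding the nearest timer becomes O(1) (just take the head), while
 * insertion degrades to O(N) in the worst case. */
static void insertTimeEventSorted(aeEventLoop *eventLoop, aeTimeEvent *te) {
    aeTimeEvent *cur = eventLoop->timeEventHead, *prev = NULL;

    while (cur && (cur->when_sec < te->when_sec ||
           (cur->when_sec == te->when_sec && cur->when_ms <= te->when_ms))) {
        prev = cur;
        cur = cur->next;
    }
    te->next = cur;
    if (prev) prev->next = te;
    else eventLoop->timeEventHead = te;   /* te is the new nearest timer */
}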

Next, the current system time is fetched (aeGetTime(&now_sec, &now_ms);) and subtracted from the expiry of the nearest timer to obtain an interval, which is passed as the parameter to the numevents = aeApiPoll(eventLoop, tvp); call. On Linux, aeApiPoll is backed by epoll; for this I/O multiplexing layer Redis uses a different system facility on each platform, e.g. kqueue on macOS/BSD and, where none of these is available, plain select as a fallback. Here we focus on the Linux implementation:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;

    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;

        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}

The signature of epoll_wait is as follows:

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

The last parameter, timeout, deserves attention. If the tvp passed in is NULL, which by the analysis above means there are no timer events, the wait time is set to -1, making epoll_wait block indefinitely until some event wakes it up; blocking like this wastes no CPU time slices. Otherwise timeout is set to the interval until the nearest timer event, so epoll_wait wakes up in time for the program to promptly handle the expired timer (covered below).
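
The three cases can be summarized in a small wrapper (a sketch mirroring how aeApiPoll converts tvp; wait_for_events is a made-up name):

#include <sys/epoll.h>
#include <sys/time.h>

/* How tvp maps onto epoll_wait's timeout argument:
 *   tvp == NULL             -> -1: block indefinitely until an event fires
 *   tvp == {0,0}            ->  0: poll and return immediately
 *   tvp == gap to a timer   -> >0: wake up just in time for that timer   */
static int wait_for_events(int epfd, struct epoll_event *events,
                           int maxevents, struct timeval *tvp) {
    int timeout_ms = tvp ? (int)(tvp->tv_sec * 1000 + tvp->tv_usec / 1000)
                         : -1;
    return epoll_wait(epfd, events, maxevents, timeout_ms);
}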

With a call like epoll_wait, the information for all fds (for network communication these are sockets), both the listening fd and ordinary client fds, is kept in the apidata field of the event-loop object aeEventLoop. When an event fires on some fd, that fd is found via apidata and recorded, together with the event type (the mask field), in the fired field of the aeEventLoop. We will finish describing this flow first, and then look at when and where the epfd used by epoll_wait is created, and how the listening fd and client fds get attached to it.

Having obtained the fds with pending events, the main loop in aeProcessEvents takes each fd recorded in the previous step out of the aeEventLoop's fired array and dispatches it according to the event type (read or write):

for (j = 0; j < numevents; j++) {
    aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
    int mask = eventLoop->fired[j].mask;
    int fd = eventLoop->fired[j].fd;
    int rfired = 0;

    /* note the fe->mask & mask & ... code: maybe an already processed
     * event removed an element that fired and we still didn't
     * processed, so we check if the event is still valid. */
    if (fe->mask & mask & AE_READABLE) {
        rfired = 1;
        fe->rfileProc(eventLoop,fd,fe->clientData,mask);
    }
    if (fe->mask & mask & AE_WRITABLE) {
        if (!rfired || fe->wfileProc != fe->rfileProc)
            fe->wfileProc(eventLoop,fd,fe->clientData,mask);
    }
    processed++;
}

The read-event handler rfileProc and write-event handler wfileProc are both function pointers. They are set up early in the program, so at this point they can simply be invoked:

typedef void aeFileProc(struct aeEventLoop *eventLoop, int fd, void *clientData, int mask);

/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;
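
To see how such a callback gets wired up, a hypothetical handler matching the aeFileProc typedef could be registered as follows (a sketch; onReadable and setupExample are made-up names, while aeCreateFileEvent is the real Redis API we will meet again below):

#include "ae.h"

/* A hypothetical read callback matching the aeFileProc typedef. */
void onReadable(struct aeEventLoop *eventLoop, int fd,
                void *clientData, int mask) {
    /* read from fd, parse the request, queue a reply ... */
}

/* After this registration, a readable event on fd makes the loop in
 * aeProcessEvents call fe->rfileProc, i.e. onReadable. */
void setupExample(aeEventLoop *el, int fd) {
    aeCreateFileEvent(el, fd, AE_READABLE, onReadable, NULL);
}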

Creating the epfd

Searching for the keyword epoll_create leads us to the function that creates the epfd, aeApiCreate in the ae_epoll.c file:

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));

    if (!state) return -1;
    state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);
    if (!state->events) {
        zfree(state);
        return -1;
    }
    state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
    if (state->epfd == -1) {
        zfree(state->events);
        zfree(state);
        return -1;
    }
    eventLoop->apidata = state;
    return 0;
}

Set a breakpoint on this function with gdb's b command, rerun redis-server with the run command, and when the breakpoint fires inspect the stack with bt. Sure enough, the epfd is also created inside the initServer function introduced above:

(gdb) bt
#0  aeCreateEventLoop (setsize=10128) at ae.c:79
#1  0x000000000042f542 in initServer () at server.c:1841
#2  0x0000000000423803 in main (argc=<optimized out>, argv=0x7fffffffe588) at server.c:3857

aeCreateEventLoop creates not only the epfd but also the aeEventLoop object needed by the whole event loop, and stores that object in the el field of a Redis global variable named server, a struct defined as follows:

// in server.c
struct redisServer server; /* Server global state */


// in server.h
struct redisServer {
    /* General */
    // some fields omitted ...
    aeEventLoop *el;
    unsigned int lruclock;      /* Clock for LRU eviction */
    // too long; remaining fields omitted ...
}

How the Listening fd and Client fds Are Attached to the epfd

In the same way: attaching an fd to an epfd requires the system API epoll_ctl, so we search the codebase for that name. In the file ae_epoll.c we find the aeApiAddEvent function:

static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee = {0}; /* avoid valgrind warning */
    /* If the fd was already monitored for some event, we need a MOD
     * operation. Otherwise we need an ADD operation. */
    int op = eventLoop->events[fd].mask == AE_NONE ?
            EPOLL_CTL_ADD : EPOLL_CTL_MOD;

    ee.events = 0;
    mask |= eventLoop->events[fd].mask; /* Merge old events */
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1;
    return 0;
}

When attaching an fd to the epfd, the code first checks in the eventLoop (of type aeEventLoop) whether some event types are already being monitored for this fd. If so, the epoll_ctl call modifies the events of an already-attached fd (EPOLL_CTL_MOD); otherwise it adds the fd to the epfd (EPOLL_CTL_ADD).
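
The ADD-versus-MOD distinction matters because epoll_ctl rejects an EPOLL_CTL_ADD for an fd that is already registered (it fails with EEXIST). A standalone illustration (watch_read_then_write is a made-up name):

#include <sys/epoll.h>

/* Sketch: register fd for reads, then widen the registration to
 * reads plus writes. The second call must be a MOD, not another ADD. */
static int watch_read_then_write(int epfd, int fd) {
    struct epoll_event ee = {0};
    ee.data.fd = fd;

    ee.events = EPOLLIN;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ee) == -1) return -1;

    ee.events = EPOLLIN | EPOLLOUT;   /* merged mask, as in aeApiAddEvent */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ee) == -1) return -1;
    return 0;
}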

Set a breakpoint on aeApiAddEvent and restart redis-server. When the breakpoint fires, the call stack is:

#0  aeCreateFileEvent (eventLoop=0x7ffff083a0a0, fd=15, mask=mask@entry=1, proc=0x437f50 <acceptTcpHandler>, clientData=clientData@entry=0x0) at ae.c:145
#1  0x000000000042f83b in initServer () at server.c:1927
#2  0x0000000000423803 in main (argc=<optimized out>, argv=0x7fffffffe588) at server.c:3857

Again we are inside initServer. Combining this with the creation of the listening fd analyzed above and stripping away unrelated code, the backbone of this function reduces to the following pseudocode:

void initServer(void) {

    // record the process ID
    server.pid = getpid();

    // create the program's aeEventLoop object and the epfd
    server.el = aeCreateEventLoop(server.maxclients+CONFIG_FDSET_INCR);

    // create the listening fd
    listenToPort(server.port,server.ipfd,&server.ipfd_count) == C_ERR

    // set the listening fd to non-blocking
    anetNonBlock(NULL,server.sofd);

    // create the Redis timer, used to run the periodic cron tasks
    /* Create the timer callback, this is our way to process many background
     * operations incrementally, like clients timeout, eviction of unaccessed
     * expired keys and so forth. */
    aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR

    // attach the listening fd to the epfd
    /* Create an event handler for accepting new connections in TCP and Unix
     * domain sockets. */
    aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE, acceptTcpHandler,NULL) == AE_ERR

    // create a pipe, used to wake up the event loop blocked in epoll_wait when needed
    /* Register a readable event for the pipe used to awake the event loop
     * when a blocked client in a module needs attention. */
    aeCreateFileEvent(server.el, server.module_blocked_pipe[0], AE_READABLE, moduleBlockedClientPipeReadable,NULL) == AE_ERR
}

Note: the "backbone" here means the backbone of the network communication logic we care about; it does not imply that the other code in this function is unimportant.
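
The pipe registered at the end of the pseudocode deserves a short digression: it is the classic self-pipe trick for waking a thread blocked in epoll_wait. A minimal sketch of the idea (illustrative, not the actual Redis module code; both function names are made up):

#include <sys/epoll.h>
#include <unistd.h>

/* Self-pipe trick: pipefd[0] is registered with the epfd for reads.
 * Another thread writes one byte to pipefd[1]; epoll_wait wakes up,
 * and the event loop can drain the byte and handle the queued work. */
static int setup_wakeup_pipe(int epfd, int pipefd[2]) {
    if (pipe(pipefd) == -1) return -1;

    struct epoll_event ee = {0};
    ee.events = EPOLLIN;
    ee.data.fd = pipefd[0];
    return epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ee);
}

/* Called from another thread to wake the event loop. */
static void wakeup_event_loop(int pipe_writefd) {
    char c = 'x';
    (void)write(pipe_writefd, &c, 1);
}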

How can we verify that the fd attached to the epfd at this breakpoint is the listening fd? Easily: when the listening fd is created, record its value with gdb. In one of my runs, for example, the listening fd was 15, as shown in the screenshot below (the debugger used here is cgdb):

Then run the program on to the place where the fd is attached and confirm the fd value being attached to the epfd:

The fd here is also 15, so the fd being attached is indeed the listening fd. Note that when the listening fd is attached, only readable events are monitored, and the event callback is set to acceptTcpHandler. For a listening fd we generally only care about readable events, since a readable event on it indicates that a new connection has arrived.

aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE, acceptTcpHandler,NULL) == AE_ERR

The acceptTcpHandler function is defined as follows (in networking.c):


void acceptTcpHandler(aeEventLoop *el, int fd, void *privdata, int mask) {
    int cport, cfd, max = MAX_ACCEPTS_PER_CALL;
    char cip[NET_IP_STR_LEN];
    UNUSED(el);
    UNUSED(mask);
    UNUSED(privdata);

    while(max--) {
        cfd = anetTcpAccept(server.neterr, fd, cip, sizeof(cip), &cport);
        if (cfd == ANET_ERR) {
            if (errno != EWOULDBLOCK)
                serverLog(LL_WARNING,
                    "Accepting client connection: %s", server.neterr);
            return;
        }
        serverLog(LL_VERBOSE,"Accepted %s:%d", cip, cport);
        acceptCommonHandler(cfd,0,cip);
    }
}

What anetTcpAccept calls is precisely the anetGenericAccept function we saw above:

int anetTcpAccept(char *err, int s, char *ip, size_t ip_len, int *port) {
    int fd;
    struct sockaddr_storage sa;
    socklen_t salen = sizeof(sa);
    if ((fd = anetGenericAccept(err,s,(struct sockaddr*)&sa,&salen)) == -1)
        return ANET_ERR;

    if (sa.ss_family == AF_INET) {
        struct sockaddr_in *s = (struct sockaddr_in *)&sa;
        if (ip) inet_ntop(AF_INET,(void*)&(s->sin_addr),ip,ip_len);
        if (port) *port = ntohs(s->sin_port);
    } else {
        struct sockaddr_in6 *s = (struct sockaddr_in6 *)&sa;
        if (ip) inet_ntop(AF_INET6,(void*)&(s->sin6_addr),ip,ip_len);
        if (port) *port = ntohs(s->sin6_port);
    }
    return fd;
}

At this point the whole flow finally links up. Let us set a breakpoint on acceptTcpHandler, rerun redis-server, and start another redis-cli to connect to it, to see whether the breakpoint fires; if it does, our analysis is correct.

It does indeed fire.

After acceptTcpHandler successfully accepts a new connection and obtains the client fd, it calls the acceptCommonHandler function, which in turn calls createClient. createClient first makes the client fd non-blocking, then attaches it to the epfd and records it on the program's aeEventLoop object. Note that, like the listening fd, the client fd is attached with only readable events monitored. Stripping out the unrelated code, the parts we care about look like this (in networking.c):

client *createClient(int fd) {
    // (allocation of the client object c is omitted from this excerpt)
    // make the client fd non-blocking
    anetNonBlock(NULL,fd);
    // enable the TCP_NODELAY option
    anetEnableTcpNoDelay(NULL,fd);
    // depending on configuration, enable TCP keepalive
    if (server.tcpkeepalive)
        anetKeepAlive(NULL,fd,server.tcpkeepalive);
    // attach the client fd to the epfd and record it on the aeEventLoop;
    // the monitored event is AE_READABLE and the callback is readQueryFromClient
    aeCreateFileEvent(server.el,fd,AE_READABLE, readQueryFromClient, c) == AE_ERR

    return c;
}
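
As an aside, making an fd non-blocking, which is what anetNonBlock does, boils down to an fcntl call; roughly (a sketch, omitting the real function's error reporting):

#include <fcntl.h>

/* Roughly what anetNonBlock does: set O_NONBLOCK so that reads and writes
 * on the fd can never block the single-threaded event loop. */
static int set_nonblock(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}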
