Redis source code analysis ---- how epoll is used

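In my projects, everything that touches the network layer uses epoll. A few years ago I noticed that Redis's epoll wrapper is very compact and easy to use: its interface is simple and its implementation is efficient. So I extracted it and used it directly.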

Today I opened that file again to look at its implementation details more closely.

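First, a quick introduction to epoll. It is the Linux kernel's efficient mechanism for monitoring large numbers of file descriptors, and it is not limited to socket fds (pipes, for example, work too).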

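epoll_wait blocks until at least one registered descriptor has an event or the timeout expires, and then reports the descriptors that are ready. epoll has two modes: 1. level-triggered (LT) 2. edge-triggered (ET)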

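In short: level-triggered is the default mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on it. If you do nothing, the kernel keeps notifying you, so this mode is less error-prone. The traditional select/poll calls follow this model.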

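Edge-triggered mode, by contrast, notifies you only once when an event (such as new data arriving) occurs, and does not notify you again until the next event. Using it is therefore generally more complicated: the descriptor should be non-blocking, and you must keep reading until it is completely empty (read() fails with EAGAIN) to be sure no data is left sitting unnoticed in the buffer.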
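The draining rule above can be sketched as follows. This is a minimal sketch, not Redis code: set_nonblocking and drain_fd are hypothetical helper names, and the sketch assumes the descriptor was registered with EPOLLET. Non-blocking mode matters because the final read() that finds the buffer empty must fail with EAGAIN rather than block.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode, which
 * edge-triggered epoll needs so the final read() fails with EAGAIN
 * instead of blocking. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: after an EPOLLET notification, read until the
 * kernel buffer is empty, because ET will not report leftover data again.
 * Returns the number of bytes consumed, or -1 on a real I/O error. */
ssize_t drain_fd(int fd) {
    char chunk[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;            /* a real handler would process chunk[0..n) */
        } else if (n == 0) {
            return total;          /* EOF: the writer closed its end */
        } else if (errno == EINTR) {
            continue;              /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;          /* buffer drained: wait for the next edge */
        } else {
            return -1;             /* genuine error */
        }
    }
}
```

Because the loop only stops on EAGAIN, EOF or an error, 10000 bytes sitting in a pipe are consumed in one call (three reads of up to 4096 bytes), and nothing is left for a notification that will never come.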

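epoll provides three interfaces.

First, create an epoll handle with epoll_create(int size). Since Linux 2.6.8 the size argument is only a hint and is ignored, but it must be greater than zero.

Then, on each iteration of your network main loop, call epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout) to find out which of the registered descriptors have become readable or writable. timeout is in milliseconds: -1 blocks indefinitely and 0 returns immediately.

epoll_ctl(int epfd, int op, int fd, struct epoll_event *event) adds, modifies, or removes a descriptor to watch and the events of interest (op is EPOLL_CTL_ADD, EPOLL_CTL_MOD, or EPOLL_CTL_DEL).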
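Putting the three calls together, a minimal level-triggered example that watches the read end of a pipe might look like this. The function name wait_for_pipe_readable and the fixed 16-entry result array are illustrative assumptions; a long-running program would create the epoll handle once and reuse it across loop iterations instead of creating one per call.

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical example: create, register, wait, clean up.
 * Returns the number of ready descriptors epoll_wait reported
 * (0 on timeout), or -1 on error. */
int wait_for_pipe_readable(int read_fd, int timeout_ms) {
    int epfd = epoll_create(1024);          /* size is only a hint; must be > 0 */
    if (epfd == -1) return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;                    /* interested in readability (LT by default) */
    ev.data.fd = read_fd;                   /* handed back to us by epoll_wait */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, read_fd, &ev) == -1) {
        close(epfd);
        return -1;
    }

    struct epoll_event ready[16];
    int n = epoll_wait(epfd, ready, 16, timeout_ms);
    for (int i = 0; i < n; i++)
        printf("fd %d ready, events=0x%x\n",
               ready[i].data.fd, (unsigned)ready[i].events);

    epoll_ctl(epfd, EPOLL_CTL_DEL, read_fd, NULL);  /* stop watching the fd */
    close(epfd);
    return n;
}
```

With an empty pipe and a timeout of 0, epoll_wait returns 0 immediately; after one byte is written, the same call reports the read end as ready.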

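Once we know how to use these three functions, what Redis's ae library does is wrap them in a friendly way and provide convenient interfaces, so that we only need to focus on processing the data.

First, let's look at ae's event loop structure, aeEventLoop: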

/* State of an event based program */
typedef struct aeEventLoop {
    int maxfd;   /* highest file descriptor currently registered */
    int setsize; /* max number of file descriptors tracked */
    long long timeEventNextId;
    time_t lastTime;     /* Used to detect system clock skew */
    aeFileEvent *events; /* Registered events */
    aeFiredEvent *fired; /* Fired events */
    aeTimeEvent *timeEventHead;
    int stop;
    void *apidata; /* This is used for polling API specific data */
    aeBeforeSleepProc *beforesleep;
} aeEventLoop;

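Let's focus only on the epoll-related parts first. maxfd is the highest file descriptor currently registered; it is not a capacity.

int setsize; /* max number of file descriptors tracked */

setsize is the capacity: the number of entries allocated for the events and fired arrays, so only descriptors smaller than setsize can be registered. The events member holds the descriptors we register with epoll, together with the callbacks to run when events arrive on them. Its structure makes this clear: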

/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;

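mask records the events we care about for this descriptor, as a bitmask of AE_READABLE and AE_WRITABLE. rfileProc is the callback invoked when the descriptor becomes readable, and wfileProc the one invoked when it becomes writable. clientData is passed to the callbacks as an argument.

The fd itself is used as the index into the aeFileEvent array, so when an fd has an event we can jump straight to its entry and call the matching callback.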

Redis registers an fd through int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask, aeFileProc *proc, void *clientData).
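To make the fd-as-index idea concrete, here is a simplified, hypothetical analogue of what aeCreateFileEvent does. The mini_* names and MINI_* flags are made up for this sketch, which reflects my reading of the Redis epoll backend rather than its exact code: the fd is bounds-checked against setsize and used directly as the array index, and epoll_ctl is called with EPOLL_CTL_ADD the first time an fd is registered and with EPOLL_CTL_MOD when more events are added to an fd that is already watched.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/epoll.h>

#define MINI_NONE     0
#define MINI_READABLE 1
#define MINI_WRITABLE 2

typedef void mini_proc(int fd, void *client_data, int mask);

typedef struct {
    int mask;                /* MINI_READABLE | MINI_WRITABLE */
    mini_proc *rproc, *wproc;
    void *client_data;
} mini_file_event;

typedef struct {
    int epfd;
    int setsize;
    int maxfd;
    mini_file_event *events; /* indexed directly by fd, setsize entries */
} mini_loop;

/* Hypothetical setup: zeroed entries mean "not registered" (MINI_NONE). */
int mini_loop_init(mini_loop *l, int setsize) {
    l->events = calloc((size_t)setsize, sizeof *l->events);
    if (!l->events) return -1;
    l->epfd = epoll_create(1024);
    if (l->epfd == -1) { free(l->events); return -1; }
    l->setsize = setsize;
    l->maxfd = -1;
    return 0;
}

/* Hypothetical, simplified analogue of aeCreateFileEvent. */
int mini_create_file_event(mini_loop *l, int fd, int mask,
                           mini_proc *proc, void *cd) {
    if (fd < 0 || fd >= l->setsize) { errno = ERANGE; return -1; }
    mini_file_event *fe = &l->events[fd];          /* fd is the index */
    int op = fe->mask == MINI_NONE ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
    int merged = fe->mask | mask;                  /* keep earlier interests */

    struct epoll_event ee = {0};
    if (merged & MINI_READABLE) ee.events |= EPOLLIN;
    if (merged & MINI_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    if (epoll_ctl(l->epfd, op, fd, &ee) == -1) return -1;

    fe->mask = merged;
    if (mask & MINI_READABLE) fe->rproc = proc;
    if (mask & MINI_WRITABLE) fe->wproc = proc;
    fe->client_data = cd;
    if (fd > l->maxfd) l->maxfd = fd;              /* track highest fd */
    return 0;
}
```

Registering readability and then writability on the same fd leaves one array entry with both bits set and one epoll registration, which is exactly why the lookup later can be a plain array index.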

The fired member of the event loop temporarily stores the descriptors that epoll_wait reported as having events.

Its structure is:

/* A fired event */
typedef struct aeFiredEvent {
    int fd;
    int mask;
} aeFiredEvent;

With this structure, we can use fd to find the corresponding aeFileEvent element in the events array.

So when does fired get filled in? For that we need to look at the call to epoll_wait:

    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);

First look at the second argument, struct epoll_event *events.

This is epoll's own structure. What does it look like?

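// Data associated with the file descriptor that triggered the event
// (its meaning depends on how you choose to use it)
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

// Events of interest (passed to epoll_ctl) and events that fired (returned by epoll_wait)
struct epoll_event {
    __uint32_t events;  /* Epoll events */
    epoll_data_t data;  /* User data variable */
};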

Redis keeps an array of these structures in

typedef struct aeApiState {
    int epfd;
    struct epoll_event *events;
} aeApiState;

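where epfd is the handle returned by epoll_create, and events receives the results that epoll_wait writes through its second argument. The array holds setsize entries, the setsize mentioned earlier.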
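Setting up this state amounts to allocating the output buffer and creating the epoll handle. The sketch below is modeled on that idea but is not the Redis source: api_state, api_state_create and api_state_free are hypothetical names, plain malloc stands in for Redis's own allocator wrappers, and the setsize field is added here only for convenience (in Redis the size lives in aeEventLoop).

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

typedef struct {
    int epfd;                   /* handle from epoll_create */
    struct epoll_event *events; /* epoll_wait output buffer */
    int setsize;                /* capacity of events (hypothetical field) */
} api_state;

/* Hypothetical setup: allocate the result buffer, then the epoll handle,
 * undoing earlier allocations if a later step fails. */
api_state *api_state_create(int setsize) {
    api_state *st = malloc(sizeof *st);
    if (!st) return NULL;
    st->events = malloc(sizeof(struct epoll_event) * (size_t)setsize);
    if (!st->events) { free(st); return NULL; }
    st->epfd = epoll_create(1024);   /* size hint only; must be > 0 */
    if (st->epfd == -1) { free(st->events); free(st); return NULL; }
    st->setsize = setsize;
    return st;
}

/* Hypothetical teardown: close the handle and release both allocations. */
void api_state_free(api_state *st) {
    close(st->epfd);
    free(st->events);
    free(st);
}
```

Sizing the buffer to setsize means a single epoll_wait call can report every registered descriptor at once, since no more than setsize fds can be registered.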

Once epoll_wait returns, the ready epoll_event entries have been written into state->events, and we need to act on those fds.

Here is the code:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;

    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;

        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}

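As you can see, epoll_wait returns the number of events that fired in this call (or -1 on error). The loop walks through them, translates each epoll event set into an AE mask, and stores the fd and mask in fired.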
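The two translations inside aeApiPoll, the timeout conversion and the event-mask mapping, can be pulled out into small standalone functions to see exactly what they compute. The function names are hypothetical, and the AE_* values are assumed to match ae.h (AE_NONE 0, AE_READABLE 1, AE_WRITABLE 2).

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/time.h>

#define AE_NONE     0
#define AE_READABLE 1
#define AE_WRITABLE 2

/* Hypothetical helper: same expression as the epoll_wait call above.
 * NULL means "block forever" (-1); otherwise convert to milliseconds,
 * truncating any sub-millisecond remainder. */
int timeval_to_epoll_timeout(const struct timeval *tvp) {
    if (tvp == NULL) return -1;
    return (int)(tvp->tv_sec * 1000 + tvp->tv_usec / 1000);
}

/* Hypothetical helper: same mapping as the loop above. Errors and
 * hang-ups are reported as writable, so the write handler notices them. */
int epoll_to_ae_mask(uint32_t events) {
    int mask = AE_NONE;
    if (events & EPOLLIN)  mask |= AE_READABLE;
    if (events & EPOLLOUT) mask |= AE_WRITABLE;
    if (events & EPOLLERR) mask |= AE_WRITABLE;
    if (events & EPOLLHUP) mask |= AE_WRITABLE;
    return mask;
}
```

A timeval of 2 s 500000 us becomes 2500 ms, a zeroed timeval becomes 0 (epoll_wait returns immediately), and EPOLLIN | EPOLLHUP maps to AE_READABLE | AE_WRITABLE.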

Next, the fired array is traversed (this loop is from aeProcessEvents):

    numevents = aeApiPoll(eventLoop, tvp);
    for (j = 0; j < numevents; j++) {
        aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
        int mask = eventLoop->fired[j].mask;
        int fd = eventLoop->fired[j].fd;
        int rfired = 0;

        /* note the fe->mask & mask & ... code: maybe an already processed
         * event removed an element that fired and we still didn't
         * processed, so we check if the event is still valid. */
        if (fe->mask & mask & AE_READABLE) {
            rfired = 1;
            fe->rfileProc(eventLoop,fd,fe->clientData,mask);
        }
        if (fe->mask & mask & AE_WRITABLE) {
            /* don't call the same handler twice for one event */
            if (!rfired || fe->wfileProc != fe->rfileProc)
                fe->wfileProc(eventLoop,fd,fe->clientData,mask);
        }
        processed++; /* counter declared earlier in aeProcessEvents */
    }

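That completes one round of handling events on the registered descriptors.

So why introduce the fired array as an intermediate store of fds, at the cost of an extra loop? I believe it gives the different polling backends (Redis ships ae_epoll.c, ae_kqueue.c and ae_select.c, among others) a common output format: each backend's aeApiPoll fills fired in the same way, so aeProcessEvents does not need to know which API is in use.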
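The payoff of the fired array is that the dispatch step never touches epoll, kqueue or select directly. The sketch below isolates that second phase and also demonstrates why the loop re-checks fe->mask: the handler for fd 3 unregisters fd 4 before fd 4's already-reported event is processed, so the stale event is skipped. All type and function names here are hypothetical simplifications of the Redis structures shown above.

```c
#include <stddef.h>

#define AE_NONE     0
#define AE_READABLE 1
#define AE_WRITABLE 2

typedef void file_proc(int fd, void *client_data, int mask);

typedef struct { int mask; file_proc *rproc, *wproc; void *client_data; } file_event;
typedef struct { int fd; int mask; } fired_event;

/* Hypothetical phase 2: walk the backend-neutral fired[] array and
 * dispatch. Returns how many fired entries were examined. */
int dispatch_fired(file_event *events, const fired_event *fired, int numevents) {
    int processed = 0;
    for (int j = 0; j < numevents; j++) {
        int fd = fired[j].fd;
        int mask = fired[j].mask;
        file_event *fe = &events[fd];            /* fd is the array index */
        int rfired = 0;
        /* fe->mask is re-checked: an earlier handler may have removed fd. */
        if (fe->mask & mask & AE_READABLE) {
            rfired = 1;
            fe->rproc(fd, fe->client_data, mask);
        }
        if (fe->mask & mask & AE_WRITABLE) {
            if (!rfired || fe->wproc != fe->rproc)   /* avoid double call */
                fe->wproc(fd, fe->client_data, mask);
        }
        processed++;
    }
    return processed;
}

/* Demo state: fd 3's handler unregisters fd 4 mid-dispatch. */
static file_event demo_table[8];
static int demo_calls;

static void count_call(int fd, void *cd, int mask) {
    (void)fd; (void)cd; (void)mask;
    demo_calls++;
}

static void drop_fd4(int fd, void *cd, int mask) {
    count_call(fd, cd, mask);
    demo_table[4].mask = AE_NONE;   /* comparable to aeDeleteFileEvent on fd 4 */
}

/* Both fds were reported readable, but only fd 3's handler runs. */
int demo_calls_after_dispatch(void) {
    demo_calls = 0;
    demo_table[3] = (file_event){AE_READABLE, drop_fd4, NULL, NULL};
    demo_table[4] = (file_event){AE_READABLE, count_call, NULL, NULL};
    fired_event fired[2] = {{3, AE_READABLE}, {4, AE_READABLE}};
    dispatch_fired(demo_table, fired, 2);
    return demo_calls;
}
```

Without the fe->mask check, fd 4's handler would run even though fd 4 was no longer registered, which in a real server could mean touching a client that has just been freed.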

Pretty simple, isn't it?

That is why I use it directly in my own projects; it is convenient and easy to work with.

http://blog.chinaunix.net/uid-24517549-id-4051156.html This article explains how to use epoll in great detail.
