Overview of the Linux epoll API

Have you ever wondered what happens in kernel space when user-space code calls the three Linux epoll functions
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
? Luckily, I found an article online that covers exactly this part.

Source: "EPOLL_CTL_DISABLE and multithreaded applications" (by Michael Kerrisk, October 17, 2012) (link)

An excerpt follows:

The (Linux-specific) epoll API allows an application to monitor multiple file descriptors in order to determine which of the descriptors are ready to perform I/O. The API was designed as a more efficient replacement for the traditional select() and poll() system calls. Roughly speaking, the performance of those older APIs scales linearly with the number of file descriptors being monitored. That behavior makes select() and poll() poorly suited for modern network applications that may handle thousands of file descriptors simultaneously.

The poor performance of select() and poll() is an inescapable consequence of their design. For each monitoring operation, both system calls require the application to give the kernel a complete list of all of the file descriptors that are of interest. And on each call, the kernel must re-examine the state of all of those descriptors and then pass a data structure back to the application that describes the readiness of the descriptors.
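To make that cost concrete, here is a minimal sketch of my own (it is not from the article). The names `count_readable_with_poll`, `poll_demo`, and `MAX_FDS` are made up for illustration. It assumes we only care about readability (`POLLIN`) and shows the two linear steps the paragraph above describes: the full descriptor list goes to the kernel on every call, and the application walks the whole array afterwards to find out which entries are ready.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 64   /* hypothetical cap, chosen only for this sketch */

/* Returns how many of the nfds descriptors are readable, or -1 on error. */
int count_readable_with_poll(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[MAX_FDS];
    int readable = 0;

    assert(nfds >= 0 && nfds <= MAX_FDS);

    /* The complete interest set is rebuilt and handed over on every call. */
    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)nfds, timeout_ms) < 0)
        return -1;

    /* The caller has to scan every entry to find the ready ones. */
    for (int i = 0; i < nfds; i++)
        if (pfds[i].revents & POLLIN)
            readable++;

    return readable;
}

/* Two pipes, data written to only one: returns what poll() reports. */
int poll_demo(void)
{
    int quiet[2], busy[2], n = -1;

    if (pipe(quiet) < 0)
        return -1;
    if (pipe(busy) == 0) {
        int fds[2] = { quiet[0], busy[0] };
        if (write(busy[1], "x", 1) == 1)
            n = count_readable_with_poll(fds, 2, 0);  /* timeout 0: do not block */
        close(busy[0]);
        close(busy[1]);
    }
    close(quiet[0]);
    close(quiet[1]);
    return n;
}
```

Calling `poll_demo()` from a `main()` should report one readable descriptor. Both loops in `count_readable_with_poll` run over all `nfds` entries no matter how few of them are ready.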

The underlying problem of the older APIs is that they don't allow an application to inform the kernel about its ongoing interest in a (typically unchanging) set of file descriptors. If the kernel had that information, then, as each file descriptor became ready, it could record the fact in preparation for the next request by the application for the set of ready file descriptors. The epoll API allows exactly that approach, by splitting the monitoring API up across three system calls:

  • epoll_create() creates an internal kernel data structure ("an epoll instance") that is used to record the set of file descriptors that the application is interested in monitoring. The call returns a file descriptor that is used in the remaining epoll APIs.

  • epoll_ctl() allows the application to inform the kernel about the set of file descriptors it would like to monitor by adding (EPOLL_CTL_ADD) and removing (EPOLL_CTL_DEL) file descriptors from the interest list of the epoll instance. epoll_ctl() can also modify (EPOLL_CTL_MOD) the set of events that are to be monitored for a file descriptor that is already in the interest list. Once a file descriptor has been recorded in the interest list, the kernel tracks I/O events for the file descriptor (e.g., the arrival of new input); if the event causes the file descriptor to become ready, the kernel places the descriptor on the ready list of the epoll instance, in preparation for the next call to epoll_wait().

  • epoll_wait() requests the kernel to return one or more ready file descriptors. The kernel satisfies this request by simply fetching items from the ready list (the call can block if there are no descriptors that are yet ready). The application uses epoll_wait() each time it wants to check for changes in the readiness of file descriptors. What is notable about epoll_wait() is that the application does not need to pass in a list of file descriptors on each call: the kernel already has that information via preceding calls to epoll_ctl(). In addition, there is no need to rescan the complete set of file descriptors to see which are ready; the kernel has already been recording that information on an ongoing basis because it knows which file descriptors the application is interested in.
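The three calls described above can be put together roughly as follows. This is a sketch of my own, not code from the article. `watch_for_input`, `epoll_demo`, and `struct demo_result` are hypothetical names, and two pipes stand in for real network sockets. It registers both pipes with `EPOLL_CTL_ADD`, makes one of them readable, waits, then removes that descriptor with `EPOLL_CTL_DEL` and waits again.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical result record, used only by this sketch. */
struct demo_result {
    int reported;      /* events returned by the first epoll_wait() */
    int reported_fd;   /* descriptor carried in the first event, or -1 */
    int written_fd;    /* read end of the pipe that was written to */
    int after_delete;  /* events returned after EPOLL_CTL_DEL */
};

/* Add fd to the interest list of epfd, asking for "readable" events. */
int watch_for_input(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN;
    ev.data.fd = fd;   /* handed back unchanged by epoll_wait() */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Returns 0 on success and fills *res, or -1 if any step failed. */
int epoll_demo(struct demo_result *res)
{
    int quiet[2], busy[2], epfd, rc = -1;
    struct epoll_event events[8];

    assert(res != NULL);

    if (pipe(quiet) < 0)
        return -1;
    if (pipe(busy) < 0) {
        close(quiet[0]);
        close(quiet[1]);
        return -1;
    }

    epfd = epoll_create(1);   /* size is only a hint, but must be > 0 */
    if (epfd >= 0 &&
        watch_for_input(epfd, quiet[0]) == 0 &&
        watch_for_input(epfd, busy[0]) == 0 &&
        write(busy[1], "x", 1) == 1) {
        res->written_fd = busy[0];

        /* No descriptor list is passed here; timeout 0 means "do not block". */
        res->reported = epoll_wait(epfd, events, 8, 0);
        res->reported_fd = res->reported > 0 ? events[0].data.fd : -1;

        /* Drop the busy pipe from the interest list and ask again. */
        if (epoll_ctl(epfd, EPOLL_CTL_DEL, busy[0], NULL) == 0) {
            res->after_delete = epoll_wait(epfd, events, 8, 0);
            rc = 0;
        }
    }

    if (epfd >= 0)
        close(epfd);
    close(quiet[0]);
    close(quiet[1]);
    close(busy[0]);
    close(busy[1]);
    return rc;
}
```

On a successful run the first `epoll_wait()` should report exactly one event, carrying the descriptor of the pipe that was written to, and the second should report none, because the only readable descriptor is no longer in the interest list. A real server would normally call `epoll_wait()` in a loop with a blocking timeout instead of 0.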

Schematically, the epoll API operates as shown in the following diagram:

[Overview of the epoll API]

Because the kernel is able to maintain internal state about the set of file descriptors in which the application is interested, epoll_wait() is much more efficient than select() and poll(). Roughly speaking, its performance scales according to the number of ready file descriptors, rather than the total number of file descriptors being monitored.
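A small experiment shows what the application sees of that difference. Again this is only a sketch of my own, with hypothetical names (`ready_among_many`, `WATCHED`) and pipes in place of sockets. It says nothing about how the kernel implements the ready list, and it measures no timings. It only shows that the array filled in by `epoll_wait()` holds as many entries as there are ready descriptors, not as many as are being watched.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define WATCHED 32   /* hypothetical number of monitored pipes */

/*
 * Watches WATCHED pipes, makes `active` of them readable, and returns the
 * number of events reported by one epoll_wait() call, or -1 on error.
 */
int ready_among_many(int active)
{
    int pipes[WATCHED][2];
    struct epoll_event events[WATCHED];
    int created = 0, n = -1;
    int epfd;

    assert(active >= 0 && active <= WATCHED);

    epfd = epoll_create(1);
    if (epfd < 0)
        return -1;

    /* Register the read end of every pipe once, up front. */
    for (; created < WATCHED; created++) {
        struct epoll_event ev = { 0 };

        if (pipe(pipes[created]) < 0)
            break;
        ev.events = EPOLLIN;
        ev.data.fd = pipes[created][0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipes[created][0], &ev) < 0) {
            close(pipes[created][0]);
            close(pipes[created][1]);
            break;
        }
    }

    if (created == WATCHED) {
        int ok = 1;

        /* Make only the first `active` pipes readable. */
        for (int i = 0; i < active; i++)
            if (write(pipes[i][1], "x", 1) != 1)
                ok = 0;

        /* Only ready descriptors come back; timeout 0 means "do not block". */
        if (ok)
            n = epoll_wait(epfd, events, WATCHED, 0);
    }

    for (int i = 0; i < created; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    close(epfd);
    return n;
}
```

With 32 pipes watched, `ready_among_many(3)` should return 3, so the loop that handles the returned events runs three times rather than 32.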
