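NIO and epoll
I had never built up a systematic understanding of NIO and epoll. I recently read through parts of OpenJDK, so here are some brief notes.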
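- Linux has supported epoll since kernel 2.6
- Windows supports select but not epoll
- NIO is implemented differently on each platform, including SunOS, Linux, and Windows
- select costs O(N) in the number of watched fds
- select has a maximum fd limit, 1024 by default
- That limit is the compile-time constant FD_SETSIZE exposed through sys/select.h; changing it means editing the header and recompiling
- epoll is event-driven and has no fixed fd limit of its own; a wait returns only the ready fds, so its cost does not grow with the number of watched fds (commonly described as O(1)) and no scan over all fds is needed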
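To make the O(N) versus "ready fds only" contrast in the list above concrete, here is a toy model in Java. It is only a sketch: it does not call select(2) or epoll_wait(2), and the class and method names (ScanCostDemo, selectStyleChecks, epollStyleChecks) are made up for illustration. It just counts how many descriptors each style has to look at.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: it counts how many descriptors each style must inspect.
// It does not call select(2) or epoll_wait(2).
class ScanCostDemo {

    // select-style: the caller must test every watched fd to find the ready ones.
    static int selectStyleChecks(boolean[] readyFlags) {
        int checks = 0;
        for (boolean ready : readyFlags) {
            checks++;               // one FD_ISSET-like test per watched fd
            if (ready) {
                // a real server would handle the fd here
            }
        }
        return checks;
    }

    // epoll-style: the caller is handed only the fds that are ready.
    static int epollStyleChecks(List<Integer> readyList) {
        int checks = 0;
        for (int fd : readyList) {
            checks++;               // one visit per ready fd, nothing else
        }
        return checks;
    }

    public static void main(String[] args) {
        int watched = 10_000;
        boolean[] flags = new boolean[watched];
        List<Integer> readyList = new ArrayList<>();
        for (int fd : new int[] {7, 42, 9_000}) {   // pretend three fds are ready
            flags[fd] = true;
            readyList.add(fd);
        }
        System.out.println("select-style checks: " + selectStyleChecks(flags));
        System.out.println("epoll-style checks:  " + epollStyleChecks(readyList));
    }
}
```

With 10,000 watched fds and three ready ones, the first count is 10,000 and the second is 3, which is the gap the bullet points describe.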
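I am not very familiar with NIO, so I wrote a TimeServer based on the book 《netty權威指南》 (Netty: The Definitive Guide) and use that code as the starting point for analysing how NIO is implemented.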
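import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Date;
import java.util.Iterator;
import java.util.Set;

public class NioTimeServer {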
public static void main(String[] args) {
int port = 8080;
MultiplexerTimeServer timeServer = new MultiplexerTimeServer(port);
new Thread(timeServer).start();
}
static final class MultiplexerTimeServer implements Runnable {
private Selector selector;
private ServerSocketChannel servChannel;
private volatile boolean stop;
public MultiplexerTimeServer(int port) {
try {
selector = Selector.open();
servChannel = ServerSocketChannel.open();
servChannel.configureBlocking(false);
servChannel.socket().bind(new InetSocketAddress(port), 1024);
servChannel.register(selector, SelectionKey.OP_ACCEPT);
} catch (IOException e) {
e.printStackTrace();
System.exit(1);
}
}
public void stop() {
this.stop = true;
}
@Override
public void run() {
while (!stop) {
try {
selector.select(1000);
Set<SelectionKey> selectedKeys = selector.selectedKeys();
Iterator<SelectionKey> it = selectedKeys.iterator();
SelectionKey key = null;
while (it.hasNext()) {
key = it.next();
it.remove();
try {
handleInput(key);
} catch (Exception e) {
if (key != null) {
key.cancel();
if (key.channel() != null)
key.channel().close();
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
private void handleInput(SelectionKey key) throws IOException {
if (key.isValid()) {
if (key.isAcceptable()) {
ServerSocketChannel ssc = (ServerSocketChannel) key.channel();
SocketChannel sc = ssc.accept();
// In non-blocking mode accept() returns null when no connection is pending
if (sc != null) {
sc.configureBlocking(false);
sc.register(selector, SelectionKey.OP_READ);
}
}
if (key.isReadable()) {
SocketChannel sc = (SocketChannel) key.channel();
ByteBuffer readBuf = ByteBuffer.allocate(1024);
int readBytes = sc.read(readBuf);
if (readBytes > 0) {
readBuf.flip();
byte[] bytes = new byte[readBuf.remaining()];
readBuf.get(bytes);
String body = new String(bytes, "UTF-8");
System.out.println("The time server receive order :" + body);
String currentTime = "QUERY TIME ORDER".equalsIgnoreCase(body)
? new Date(System.currentTimeMillis()).toString() : "BAD ORDER";
doWrite(sc, currentTime);
} else if (readBytes < 0) {
key.cancel();
sc.close();
}
}
}
}
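/**
* Writes the response back to the client.
* @param sc the client channel to write to
* @param response the text to send
* @throws IOException if the write fails
*/
private void doWrite(SocketChannel sc, String response) throws IOException {
if (response != null && response.trim().length() > 0) {
// Encode with the same charset that handleInput uses to decode
byte[] bytes = response.getBytes("UTF-8");
ByteBuffer writeBuf = ByteBuffer.allocate(bytes.length);
writeBuf.put(bytes);
writeBuf.flip();
sc.write(writeBuf);
}
}
}
}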
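The rough flow is:
1. Create a ServerSocketChannel, set it to non-blocking mode, bind it to the listening port, and register the channel with the selector (declaring the key it is interested in).
2. Use one thread to poll the selector: call its select method to get all ready keys. Each key is tied to a channel, and the state of the key decides what to do next.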
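The two steps above can be watched in isolation with a much smaller program. The following is a minimal sketch, not part of the TimeServer: the class name SelectReadinessDemo is made up, and it binds to port 0 on the loopback address (an assumption chosen so it cannot clash with port 8080). It registers a server channel for OP_ACCEPT, calls select once with nothing pending, then connects a client and calls select again.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class SelectReadinessDemo {

    // Returns {keys ready before any client connects, keys ready after one connects}.
    static int[] run() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Step 1: non-blocking channel, bound and registered for accept events.
            server.configureBlocking(false);
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.register(selector, SelectionKey.OP_ACCEPT);

            // Step 2: poll the selector. Nothing is pending yet, so this times out.
            int before = selector.select(100);

            // A blocking connect from a client makes the server channel acceptable.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                int after = selector.select(1000);
                for (SelectionKey key : selector.selectedKeys()) {
                    System.out.println("acceptable: " + key.isAcceptable());
                }
                selector.selectedKeys().clear();
                return new int[] {before, after};
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = run();
        System.out.println("ready before connect: " + counts[0]);
        System.out.println("ready after connect:  " + counts[1]);
    }
}
```

The return value of select is the number of keys whose ready set changed, which is why the TimeServer ignores it and simply iterates over selectedKeys().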
The only place we really care about is selector.select(1000). First, how the selector is obtained:
public static Selector open() throws IOException {
return SelectorProvider.provider().openSelector();
}
This uses SelectorProvider to create the Selector. Here is how the default SelectorProvider instance is obtained:
public static SelectorProvider provider() {
synchronized (lock) {
if (provider != null)
return provider;
return AccessController.doPrivileged(
new PrivilegedAction<SelectorProvider>() {
public SelectorProvider run() {
if (loadProviderFromProperty())
return provider;
if (loadProviderAsService())
return provider;
provider = sun.nio.ch.DefaultSelectorProvider.create();
return provider;
}
});
}
}
The line that matters is:
provider = sun.nio.ch.DefaultSelectorProvider.create();
This uses DefaultSelectorProvider. Its create() method:
public static SelectorProvider create() {
String osname = AccessController.doPrivileged(
new GetPropertyAction("os.name"));
if ("SunOS".equals(osname)) {
return new sun.nio.ch.DevPollSelectorProvider();
}
// use EPollSelectorProvider for Linux kernels >= 2.6
if ("Linux".equals(osname)) {
String osversion = AccessController.doPrivileged(
new GetPropertyAction("os.version"));
String[] vers = osversion.split("\\.", 0);
if (vers.length >= 2) {
try {
int major = Integer.parseInt(vers[0]);
int minor = Integer.parseInt(vers[1]);
if (major > 2 || (major == 2 && minor >= 6)) {
return new sun.nio.ch.EPollSelectorProvider();
}
} catch (NumberFormatException x) {
// format not recognized
}
}
}
return new sun.nio.ch.PollSelectorProvider();
}
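This is the key point: create() returns a different provider depending on the operating system. SunOS (that is, Solaris) gets DevPollSelectorProvider, Linux with a 2.6 or later kernel gets EPollSelectorProvider, and every other system that reaches this method gets PollSelectorProvider. Note that this create() belongs to the Unix side of the OpenJDK source tree. The Windows build carries its own DefaultSelectorProvider, which returns WindowsSelectorProvider; that implementation is select-based, since Windows has no epoll (see the footnote).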
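The version test inside create() is easy to try on its own. The sketch below copies just that parsing logic into a standalone helper; the class and method names (EpollVersionCheck, kernelSupportsEpoll) and the sample version strings are made up for illustration and are not part of OpenJDK.

```java
class EpollVersionCheck {

    // Mirrors the check in create(): true when "major.minor" is 2.6 or later.
    static boolean kernelSupportsEpoll(String osVersion) {
        String[] vers = osVersion.split("\\.", 0);
        if (vers.length < 2) {
            return false;
        }
        try {
            int major = Integer.parseInt(vers[0]);
            int minor = Integer.parseInt(vers[1]);
            return major > 2 || (major == 2 && minor >= 6);
        } catch (NumberFormatException e) {
            // Format not recognized: create() would fall through to the poll provider
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"2.4.20", "2.6.32-431.el6.x86_64", "5.15.0-91-generic"};
        for (String v : samples) {
            System.out.println(v + " -> " + kernelSupportsEpoll(v));
        }
        // The value the JVM itself would feed into the check on this machine
        String actual = System.getProperty("os.version");
        System.out.println("this machine (" + actual + ") -> " + kernelSupportsEpoll(actual));
    }
}
```

Only the first two dot-separated fields are parsed, so suffixes such as "-431.el6.x86_64" in later fields do not affect the result.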
Next, EPollSelectorProvider:
public class EPollSelectorProvider
extends SelectorProviderImpl
{
public AbstractSelector openSelector() throws IOException {
return new EPollSelectorImpl(this);
}
public Channel inheritedChannel() throws IOException {
return InheritedChannel.getChannel();
}
}
EPollSelectorImpl is used here, so this is where the epoll side of NIO is implemented.
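A quick way to confirm which provider and selector implementation a given JVM actually picked is to print their class names. This is a small sketch using only public API; the class name WhichSelector is made up. The output depends on the platform and JDK version, so no particular class name is guaranteed.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.nio.channels.spi.SelectorProvider;

class WhichSelector {

    // Opens a selector through the default provider and reports its concrete class.
    static String selectorClassName() throws IOException {
        try (Selector selector = Selector.open()) {
            return selector.getClass().getName();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("os.name    = " + System.getProperty("os.name"));
        System.out.println("os.version = " + System.getProperty("os.version"));
        System.out.println("provider   = " + SelectorProvider.provider().getClass().getName());
        System.out.println("selector   = " + selectorClassName());
    }
}
```

The provider can also be overridden with the system property java.nio.channels.spi.SelectorProvider, which is the first thing provider() consults in the code quoted earlier.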
The select implementation in EPollSelectorImpl is as follows:
protected int doSelect(long timeout)
throws IOException
{
if (closed)
throw new ClosedSelectorException();
processDeregisterQueue();
try {
begin();
pollWrapper.poll(timeout);
} finally {
end();
}
processDeregisterQueue();
int numKeysUpdated = updateSelectedKeys();
if (pollWrapper.interrupted()) {
// Clear the wakeup pipe
pollWrapper.putEventOps(pollWrapper.interruptedIndex(), 0);
synchronized (interruptLock) {
pollWrapper.clearInterrupted();
IOUtil.drain(fd0);
interruptTriggered = false;
}
}
return numKeysUpdated;
}
The line to look at is:
pollWrapper.poll(timeout);
where pollWrapper is:
// The poll object
EPollArrayWrapper pollWrapper;
About EPollArrayWrapper:
/**
* Manipulates a native array of epoll_event structs on Linux:
*
* typedef union epoll_data {
* void *ptr;
* int fd;
* __uint32_t u32;
* __uint64_t u64;
* } epoll_data_t;
*
* struct epoll_event {
* __uint32_t events;
* epoll_data_t data;
* };
*
* The system call to wait for I/O events is epoll_wait(2). It populates an
* array of epoll_event structures that are passed to the call. The data
* member of the epoll_event structure contains the same data as was set
* when the file descriptor was registered to epoll via epoll_ctl(2). In
* this implementation we set data.fd to be the file descriptor that we
* register. That way, we have the file descriptor available when we
* process the events.
*
* All file descriptors registered with epoll have the POLLHUP and POLLERR
* events enabled even when registered with an event set of 0. To ensure
* that epoll_wait doesn't poll an idle file descriptor when the underlying
* connection is closed or reset then its registration is deleted from
* epoll (it will be re-added again if the event set is changed)
*/
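This is the class comment; it describes the epoll data structures.
This class is OpenJDK's wrapper around epoll, so it naturally declares the epoll-related JNI methods: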
private native int epollCreate();
private native void epollCtl(int epfd, int opcode, int fd, int events);
private native int epollWait(long pollAddress, int numfds, long timeout,
int epfd) throws IOException;
private static native int sizeofEPollEvent();
private static native int offsetofData();
private static native int fdLimit();
private static native void interrupt(int fd);
private static native void init();
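The sizeofEPollEvent() and offsetofData() natives exist because the Java side reads epoll_event entries straight out of a native array by byte offset. The sketch below imitates that with a direct ByteBuffer. The constants are assumptions: 12 bytes per entry and a data offset of 4 are the values for x86-64 Linux, where the struct is packed, and other architectures may differ. The class and its putEvent helper are made up for illustration; getEventOps and getDescriptor here are stand-ins, not the real EPollArrayWrapper methods.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EpollEventLayoutDemo {
    // Assumed values for x86-64 Linux. The JDK obtains the real ones from
    // the native side via sizeofEPollEvent() and offsetofData().
    static final int SIZE_EPOLLEVENT = 12;
    static final int EVENT_OFFSET = 0;   // __uint32_t events
    static final int DATA_OFFSET = 4;    // epoll_data_t data; data.fd is its first 4 bytes

    // Stand-in for what the kernel does when it fills entry i.
    static void putEvent(ByteBuffer array, int i, int events, int fd) {
        int base = i * SIZE_EPOLLEVENT;
        array.putInt(base + EVENT_OFFSET, events);
        array.putInt(base + DATA_OFFSET, fd);
    }

    // Reads the event mask of entry i.
    static int getEventOps(ByteBuffer array, int i) {
        return array.getInt(i * SIZE_EPOLLEVENT + EVENT_OFFSET);
    }

    // Reads the file descriptor stored in data.fd of entry i.
    static int getDescriptor(ByteBuffer array, int i) {
        return array.getInt(i * SIZE_EPOLLEVENT + DATA_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer array = ByteBuffer.allocateDirect(4 * SIZE_EPOLLEVENT)
                                     .order(ByteOrder.nativeOrder());
        putEvent(array, 0, 0x001, 5);   // 0x001 is EPOLLIN
        putEvent(array, 1, 0x004, 9);   // 0x004 is EPOLLOUT
        for (int i = 0; i < 2; i++) {
            System.out.println("entry " + i + ": fd=" + getDescriptor(array, i)
                    + " events=0x" + Integer.toHexString(getEventOps(array, i)));
        }
    }
}
```

Storing the fd in data.fd at registration time is what lets the reader of the array recover which descriptor each event belongs to, as the class comment explains.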
The key part is the poll method:
int poll(long timeout) throws IOException {
updateRegistrations();
updated = epollWait(pollArrayAddress, NUM_EPOLLEVENTS, timeout, epfd);
for (int i=0; i<updated; i++) {
if (getDescriptor(i) == incomingInterruptFD) {
interruptedIndex = i;
interrupted = true;
break;
}
}
return updated;
}
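poll first calls updateRegistrations(), which uses epollCtl (the epoll_ctl system call) to push pending fd registrations into the epoll instance. It then calls epollWait (epoll_wait), where the thread blocks until the timeout expires or some fd becomes ready. The return value is the number of ready fds, not a set of fds; the events themselves are written into the native array at pollArrayAddress. A return of 0 means no events were ready. At the system-call level epoll_wait returns -1 on error, which surfaces in Java as the IOException declared on epollWait. The loop shown here only looks for the interrupt fd; the ready keys are processed afterwards in updateSelectedKeys().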
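The incomingInterruptFD check in poll, together with the "Clear the wakeup pipe" branch in doSelect, appears to be the machinery behind Selector.wakeup(). The sketch below only exercises the public API, so it says nothing certain about the internals; the class name WakeupDemo is made up. It calls wakeup() before select, which the Selector documentation says makes the next select return immediately.

```java
import java.io.IOException;
import java.nio.channels.Selector;

class WakeupDemo {

    // Returns {value returned by select, elapsed milliseconds}.
    static long[] selectAfterWakeup(long timeoutMillis) throws IOException {
        try (Selector selector = Selector.open()) {
            // No select is in progress, so the next select returns at once.
            selector.wakeup();
            long start = System.nanoTime();
            int ready = selector.select(timeoutMillis);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            return new long[] {ready, elapsedMillis};
        }
    }

    public static void main(String[] args) throws IOException {
        long[] result = selectAfterWakeup(5000);
        System.out.println("select returned " + result[0]
                + " after about " + result[1] + " ms (timeout was 5000 ms)");
    }
}
```

In the TimeServer, calling selector.wakeup() from stop() would let the loop notice the stop flag without waiting for the one-second select timeout.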
For more on epoll, see http://www.ulduzsoft.com/2014/01/select-poll-epoll-practical-difference-for-system-architects/ .
Footnote
The syscall select is available in Windows but select processing is O(n) in the number of file descriptors unlike the modern constant-time multiplexers like epoll which makes select unacceptable for high-concurrency servers. This document will describe how high-concurrency programs are designed in Windows.
Instead of epoll or kqueue, Windows has its own I/O multiplexer called I/O completion ports (IOCPs). IOCPs are the objects used to poll overlapped I/O for completion. IOCP polling is constant time (REF?).
In short: Windows supports the select system call (time complexity O(N)) but not epoll; the multiplexer native to Windows is IOCP (I/O completion ports).