Netty Study Notes (3): An Introduction to EventLoopGroup

Every Netty application has to define an EventLoopGroup, which is essentially a thread pool.

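As mentioned earlier, a client needs only one EventLoopGroup, while a server needs two: bossGroup and workerGroup. This follows from Netty's threading model, the main/sub (master-slave) Reactor multithreaded model, which uses two thread pools: one (bossGroup) listens on the port and accepts new connections, and the other (workerGroup) handles the data reads and writes and the business logic of each connection.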

In the code below, some try...catch blocks and non-essential code have been removed so that only the main flow remains.

EventLoopGroup Initialization

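From its class diagram we can see that EventLoopGroup implements ScheduledExecutorService (it inherits the interface through EventExecutorGroup), so it is essentially a thread pool with scheduling support.
NioEventLoopGroup has many overloaded constructors, all of which eventually call the following one: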

public NioEventLoopGroup(int nThreads, ThreadFactory threadFactory,
        final SelectorProvider selectorProvider, final SelectStrategyFactory selectStrategyFactory) {
        super(nThreads, threadFactory, selectorProvider, selectStrategyFactory, RejectedExecutionHandlers.reject());
    }

This calls the constructor of its parent class, MultithreadEventLoopGroup:

private static final int DEFAULT_EVENT_LOOP_THREADS;

    static {
        DEFAULT_EVENT_LOOP_THREADS = Math.max(1, SystemPropertyUtil.getInt(
                "io.netty.eventLoopThreads", Runtime.getRuntime().availableProcessors() * 2));
    } 
protected MultithreadEventLoopGroup(int nThreads, ThreadFactory threadFactory, Object... args) {
        super(nThreads == 0 ? DEFAULT_EVENT_LOOP_THREADS : nThreads, threadFactory, args);
    }

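Here nThreads is checked: if it is 0, the default thread count is used, which is the number of processor cores * 2 unless the io.netty.eventLoopThreads system property overrides it. None of my demos specify a thread count, so the resulting EventLoopGroup ends up with processor cores * 2 threads.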
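As a rough illustration of that defaulting rule, here is a minimal JDK-only sketch. The class and method names are hypothetical, and Integer.getInteger stands in for Netty's SystemPropertyUtil.getInt, so this is an approximation rather than Netty's actual code.

```java
final class DefaultThreadsSketch {
    // Assumption: Integer.getInteger plays the role of SystemPropertyUtil.getInt.
    static final int DEFAULT_THREADS = Math.max(1,
            Integer.getInteger("io.netty.eventLoopThreads",
                    Runtime.getRuntime().availableProcessors() * 2));

    // 0 means "not specified", so fall back to the default.
    static int resolve(int nThreads) {
        return nThreads == 0 ? DEFAULT_THREADS : nThreads;
    }

    public static void main(String[] args) {
        System.out.println("nThreads=0 -> " + resolve(0));
        System.out.println("nThreads=4 -> " + resolve(4));
    }
}
```

Running it with -Dio.netty.eventLoopThreads=3 should make resolve(0) return 3, which is a handy way to see the override at work.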

Tracing further, we eventually reach the following constructor of MultithreadEventExecutorGroup:

 protected MultithreadEventExecutorGroup(int nThreads, Executor executor,
                                            EventExecutorChooserFactory chooserFactory, Object... args) {
        if (executor == null) {
            executor = new ThreadPerTaskExecutor(newDefaultThreadFactory());
        }
        children = new EventExecutor[nThreads];
        for (int i = 0; i < nThreads; i ++) {
            boolean success = false;
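            try {
                children[i] = newChild(executor, args);
                success = true;
            } catch (Exception e) {
                throw new IllegalStateException("failed to create a child event loop", e);
            } finally {
                // on failure (!success), the children created so far are shut down (omitted)
            }
        }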

        chooser = chooserFactory.newChooser(children);
    }

The code above first creates an executor, then allocates an EventExecutor array of length nThreads, initializes each element by calling newChild, and finally creates a chooser by calling newChooser.

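Let's first look at how the executor is created. It is simply an Executor instance that, for every command passed to execute, creates a new thread and starts it to run the command. The thread name is poolName + '-' + poolId.incrementAndGet() + '-' + nextId.incrementAndGet().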

public final class ThreadPerTaskExecutor implements Executor {
    private final ThreadFactory threadFactory;

    public ThreadPerTaskExecutor(ThreadFactory threadFactory) {
        this.threadFactory = threadFactory;
    }

    @Override
    public void execute(Runnable command) {
        threadFactory.newThread(command).start();
    }
}
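To make the naming scheme concrete, here is a small JDK-only sketch of a thread factory that builds names the same way and is then used by a thread-per-task Executor. NamedThreadFactorySketch is a hypothetical stand-in, not Netty's DefaultThreadFactory.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedThreadFactorySketch implements ThreadFactory {
    // One counter shared by all factories, one counter per factory.
    private static final AtomicInteger POOL_ID = new AtomicInteger();
    private final AtomicInteger nextId = new AtomicInteger();
    private final String prefix;

    NamedThreadFactorySketch(String poolName) {
        prefix = poolName + '-' + POOL_ID.incrementAndGet() + '-';
    }

    String prefix() {
        return prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        return new Thread(r, prefix + nextId.incrementAndGet());
    }

    public static void main(String[] args) {
        NamedThreadFactorySketch factory = new NamedThreadFactorySketch("demo");
        // Same idea as ThreadPerTaskExecutor: one new thread per command.
        Executor perTask = command -> factory.newThread(command).start();
        perTask.execute(() -> System.out.println(Thread.currentThread().getName()));
        perTask.execute(() -> System.out.println(Thread.currentThread().getName()));
    }
}
```

Each call to execute starts a brand-new thread, which is why Netty only calls it once per event loop.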

The newChild method here simply instantiates and returns a NioEventLoop object, so every element of the EventLoopGroup is a NioEventLoop. The source is as follows:

 @Override
    protected EventLoop newChild(Executor executor, Object... args) throws Exception {
        return new NioEventLoop(this, executor, (SelectorProvider) args[0],
            ((SelectStrategyFactory) args[1]).newSelectStrategy(), (RejectedExecutionHandler) args[2]);
    }

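Looking at the class diagram of NioEventLoop, note that NioEventLoop extends SingleThreadEventExecutor (through SingleThreadEventLoop); the Executor argument ends up stored in that class's executor field.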

Next, let's look at the implementation of newChooser: if executors.length (that is, nThreads) is a power of two, PowerOfTowEventExecutorChooser is used to pick executors; otherwise the generic chooser is used.

   public EventExecutorChooser newChooser(EventExecutor[] executors) {
        if (isPowerOfTwo(executors.length)) {
            return new PowerOfTowEventExecutorChooser(executors);
        } else {
            return new GenericEventExecutorChooser(executors);
        }
    }

 private static boolean isPowerOfTwo(int val) {
        return (val & -val) == val;
    }

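The two choosers differ in next(), the method that returns the next EventExecutor. The generic chooser increments idx and takes it modulo nThreads.
PowerOfTow implements the same round-robin logic but with a bitwise AND, which is faster.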

private static final class PowerOfTowEventExecutorChooser implements EventExecutorChooser {
        private final AtomicInteger idx = new AtomicInteger();
        private final EventExecutor[] executors;
        PowerOfTowEventExecutorChooser(EventExecutor[] executors) {
            this.executors = executors;
        }
        @Override
        public EventExecutor next() {
            return executors[idx.getAndIncrement() & executors.length - 1];
        }
    }

    private static final class GenericEventExecutorChooser implements EventExecutorChooser {
        private final AtomicInteger idx = new AtomicInteger();
        private final EventExecutor[] executors;
        GenericEventExecutorChooser(EventExecutor[] executors) {
            this.executors = executors;
        }
        @Override
        public EventExecutor next() {
            return executors[Math.abs(idx.getAndIncrement() % executors.length)];
        }
    }
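The equivalence between the two choosers can be checked with a few lines of plain Java. This is a sketch with hypothetical helper names; only isPowerOfTwo and the two index expressions are taken from the code above.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ChooserSketch {
    static boolean isPowerOfTwo(int val) {
        return (val & -val) == val;
    }

    // Index used by the power-of-two chooser.
    static int maskIndex(int idx, int length) {
        return idx & (length - 1);
    }

    // Index used by the generic chooser.
    static int moduloIndex(int idx, int length) {
        return Math.abs(idx % length);
    }

    public static void main(String[] args) {
        String[] executors = {"loop-0", "loop-1", "loop-2", "loop-3"};
        AtomicInteger idx = new AtomicInteger();
        // Round-robin over four entries using the mask.
        for (int i = 0; i < 6; i++) {
            System.out.println(executors[maskIndex(idx.getAndIncrement(), executors.length)]);
        }
    }
}
```

For non-negative counters and a power-of-two length, both expressions yield the same index; the mask simply avoids the division.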

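To summarize EventLoopGroup initialization:

MultithreadEventExecutorGroup, a superclass of NioEventLoopGroup, maintains an array of EventExecutor (children) whose size is nThreads.
If nThreads is not specified when NioEventLoopGroup is instantiated, it defaults to processor cores * 2.
MultithreadEventExecutorGroup initializes the children array through the abstract method newChild(); each element is a NioEventLoop.
A different chooser is selected depending on nThreads.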

EventLoopGroup Execution

When ServerBootstrap is initialized, serverBootstrap.group(bossGroup, workerGroup) is called to set the two EventLoopGroups. Stepping into it, we see:

 public ServerBootstrap group(EventLoopGroup parentGroup, EventLoopGroup childGroup) {
        super.group(parentGroup);
        if (childGroup == null) {
            throw new NullPointerException("childGroup");
        }
        if (this.childGroup != null) {
            throw new IllegalStateException("childGroup set already");
        }
        this.childGroup = childGroup;
        return this;
    }

This method initializes two fields: one in super.group(parentGroup) and the other through this.childGroup = childGroup. bossGroup is stored in the group field of AbstractBootstrap, and workerGroup in the childGroup field of ServerBootstrap.

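Next, the application's startup code calls serverBootstrap.bind() to listen on a local port.
The bind method calls execute() on the channel's eventLoop(), which eventually enters SingleThreadEventExecutor's execute() method: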

    private static void doBind0(
            final ChannelFuture regFuture, final Channel channel,
            final SocketAddress localAddress, final ChannelPromise promise) {
        channel.eventLoop().execute(new Runnable() {
            @Override
            public void run() {
                if (regFuture.isSuccess()) {
                    channel.bind(localAddress, promise).addListener(ChannelFutureListener.CLOSE_ON_FAILURE);
                } else {
                    promise.setFailure(regFuture.cause());
                }
            }
        });
    }

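For each task added, SingleThreadEventExecutor checks whether the calling thread is its own thread. If this is the first task, or the caller is some other thread, inEventLoop() returns false, so startThread() is called first to start this SingleThreadEventExecutor's thread and then the task is added to the task queue (a LinkedBlockingQueue by default). Otherwise the task is added to the task queue directly.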

    private final Queue<Runnable> taskQueue;

    public void execute(Runnable task) {
        if (task == null) {
            throw new NullPointerException("task");
        }
        boolean inEventLoop = inEventLoop();
        if (inEventLoop) {
            addTask(task);
        } else {
            startThread();
            addTask(task);
            if (isShutdown() && removeTask(task)) {
                reject();
            }
        }
        // a newly added task triggers a wakeup (when the condition below holds)
        if (!addTaskWakesUp && wakesUpForTask(task)) {
            wakeup(inEventLoop);
        }
    }

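Put simply, inEventLoop() checks whether the current thread is the reactor thread. This serves two purposes:

1. Tasks run only on the reactor thread, which guarantees single-threaded execution.

2. The first check starts the reactor thread for us.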

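startThread() uses a state flag to check whether the reactor thread has already been started; if not, it calls doStartThread() to start it.
When SingleThreadEventExecutor runs doStartThread(), it calls the executor's execute method, wrapping the call to the run method of NioEventLoop (a subclass of SingleThreadEventExecutor) in a Runnable for the executor to run (the thread that runs it is also saved in SingleThreadEventExecutor's thread field). The executor here is the ThreadPerTaskExecutor discussed earlier: its execute method creates a FastThreadLocalThread for each Runnable passed in and calls its start method.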

 private void startThread() {
        // check whether this EventLoop's thread has been started
        if (STATE_UPDATER.get(this) == ST_NOT_STARTED) {
            // CAS so that only one caller actually starts the thread
            if (STATE_UPDATER.compareAndSet(this, ST_NOT_STARTED, ST_STARTED)) {
                doStartThread();
            }
        }
    }

private void doStartThread() {
        assert thread == null;
        executor.execute(new Runnable() {
            @Override
            public void run() {
                thread = Thread.currentThread();
                ...
                boolean success = false;
                updateLastExecutionTime();
                try {
                    SingleThreadEventExecutor.this.run();
                    success = true;
                } catch (Throwable t) {
                    logger.warn("Unexpected exception from an event executor: ", t);
                } 
                ...
            }
        });
    }
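The lazy-start pattern described above can be reproduced with the JDK alone. The following is a simplified sketch with hypothetical names (LazyLoopSketch, demo); it uses an AtomicBoolean instead of Netty's state updater and omits shutdown handling entirely.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class LazyLoopSketch {
    private final BlockingQueue<Runnable> taskQueue = new LinkedBlockingQueue<>();
    private final AtomicBoolean started = new AtomicBoolean();
    private volatile Thread thread;

    boolean inEventLoop() {
        return Thread.currentThread() == thread;
    }

    void execute(Runnable task) {
        if (!inEventLoop()) {
            startThread(); // no-op once the loop thread has been started
        }
        taskQueue.add(task);
    }

    private void startThread() {
        // CAS: only the first caller creates the loop thread.
        if (started.compareAndSet(false, true)) {
            Thread t = new Thread(this::run, "lazy-loop");
            t.setDaemon(true);
            t.start();
        }
    }

    private void run() {
        thread = Thread.currentThread();
        try {
            for (;;) {
                taskQueue.take().run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submits two tasks from the caller and reports whether both ran on one loop thread.
    static boolean demo() {
        LazyLoopSketch loop = new LazyLoopSketch();
        Thread[] seen = new Thread[2];
        CountDownLatch done = new CountDownLatch(2);
        for (int i = 0; i < 2; i++) {
            int slot = i;
            loop.execute(() -> {
                seen[slot] = Thread.currentThread();
                done.countDown();
            });
        }
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return seen[0] == seen[1] && seen[0] != Thread.currentThread() && !loop.inEventLoop();
    }
}
```

However many threads call execute, only one loop thread is ever created, and every task runs on it.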

From the analysis above, the method that ultimately does the work is NioEventLoop's run method, so let's see what it actually does:

@Override
    protected void run() {
        for (;;) {
            try {           
                switch (selectStrategy.calculateStrategy(selectNowSupplier, hasTasks())) {
                    case SelectStrategy.CONTINUE:
                        continue;
                    case SelectStrategy.SELECT:
                        // poll with select: set wakenUp to false and pass in its previous value
                        select(wakenUp.getAndSet(false));
                        if (wakenUp.get()) {
                            selector.wakeup();
                        }
                    default:
                        // fallthrough
                }
                // non-essential code removed
                processSelectedKeys();
                runAllTasks();                
            } catch (Throwable t) {
                handleLoopException(t);
            }
            // Always handle shutdown even if the loop processing threw an exception.
           ...
        }
    }

First, look at the strategy selection:

@Override
    public int calculateStrategy(IntSupplier selectSupplier, boolean hasTasks) throws Exception {
        return hasTasks ? selectSupplier.get() : SelectStrategy.SELECT;
    }

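If the task queue is empty, the SELECT strategy is returned; otherwise selectSupplier.get() is executed, which in effect performs one non-blocking selectNow and returns its result.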
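A tiny stand-alone version of this decision, for illustration only. The class name is hypothetical, the JDK's IntSupplier replaces Netty's own IntSupplier, and the value -1 for SELECT is an assumption about Netty's constant.

```java
import java.util.function.IntSupplier;

final class SelectStrategySketch {
    // Assumption: mirrors SelectStrategy.SELECT, a negative marker value.
    static final int SELECT = -1;

    static int calculateStrategy(IntSupplier selectNow, boolean hasTasks) {
        // Pending tasks: do not block, just report how many keys are ready right now.
        return hasTasks ? selectNow.getAsInt() : SELECT;
    }

    public static void main(String[] args) {
        System.out.println(calculateStrategy(() -> 3, true));
        System.out.println(calculateStrategy(() -> 3, false));
    }
}
```

Because a ready-key count is never negative, the run loop can tell "block in select" apart from "keys were already polled" by the sign of the result.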

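As you can see, the code above is an infinite loop that mainly does three things:

Poll the I/O events of all channels registered with this reactor thread's selector.
Process the selected keys: processSelectedKeys();
Run the task queue: runAllTasks();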
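The three steps can be tried out with plain JDK NIO. The sketch below is a deliberately tiny, bounded loop over a Pipe; the class and method names are hypothetical and it is not how NioEventLoop is implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

final class MiniReactorSketch {
    // Runs a few loop iterations and returns what happened.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Queue<Runnable> tasks = new ArrayDeque<>();
        tasks.add(() -> log.add("task"));
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(new byte[] {42}));

                for (int i = 0; i < 10 && log.size() < 2; i++) {
                    // 1. poll: non-blocking if tasks are pending, otherwise block with a timeout
                    if (tasks.isEmpty()) {
                        selector.select(1000);
                    } else {
                        selector.selectNow();
                    }
                    // 2. process the selected keys
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        ByteBuffer buf = ByteBuffer.allocate(1);
                        if (key.isReadable() && pipe.source().read(buf) > 0) {
                            log.add("read " + buf.get(0));
                        }
                    }
                    // 3. run all queued tasks
                    for (Runnable t; (t = tasks.poll()) != null; ) {
                        t.run();
                    }
                }
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }
}
```

The order of the two log entries may vary between platforms, since the first non-blocking poll may or may not already see the byte written to the pipe.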

Polling with select

 private void select(boolean oldWakenUp) throws IOException {
        Selector selector = this.selector;
            int selectCnt = 0;
            long currentTimeNanos = System.nanoTime();
            long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);
            for (;;) {
                long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
                // first exit condition
                if (timeoutMillis <= 0) {
                    if (selectCnt == 0) {
                        selector.selectNow();
                        selectCnt = 1;
                    }
                    break;
                }

                // If a task was submitted when wakenUp value was true, the task didn't get a chance to call
                // Selector#wakeup. So we need to check task queue again before executing select operation.
                // If we don't, the task might be pended until select operation was timed out.
                // It might be pended until idle timeout if IdleStateHandler existed in pipeline.
                // second exit condition
                if (hasTasks() && wakenUp.compareAndSet(false, true)) {
                    selector.selectNow();
                    selectCnt = 1;
                    break;
                }

                int selectedKeys = selector.select(timeoutMillis);
                selectCnt ++;

                // third exit condition
                if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks() || hasScheduledTasks()) {
                    // - Selected something,
                    // - waken up by user, or
                    // - the task queue has a pending task.
                    // - a scheduled task is ready for processing
                    break;
                }
              
                ...
    }

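Clearly, select is an infinite loop here, and it has three exit conditions:

The deadline is near (less than 0.5 ms away): break out of the loop, first calling selectNow once if no select has been performed yet.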

 long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
 timeoutMillis <= 0;

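The task queue has tasks to run: exit (so that a blocking select does not delay them), also calling selectNow once before exiting.
After the blocking selector.select(timeoutMillis) returns, exit if any of the conditions above holds (selectedKeys is non-zero, the task queue has tasks, and so on).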
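The first exit condition depends on how the remaining time is rounded to milliseconds. A short sketch of just that arithmetic, with a hypothetical helper that takes the remaining nanoseconds (selectDeadLineNanos - currentTimeNanos) as its argument:

```java
final class SelectTimeoutSketch {
    // Adding 0.5 ms before the integer division rounds to the nearest millisecond.
    static long timeoutMillis(long remainingNanos) {
        return (remainingNanos + 500_000L) / 1_000_000L;
    }

    public static void main(String[] args) {
        long[] samples = {499_999L, 500_000L, 1_499_999L, 1_500_000L};
        for (long remaining : samples) {
            System.out.println(remaining + " ns -> " + timeoutMillis(remaining) + " ms");
        }
    }
}
```

So a remaining time below 0.5 ms rounds down to 0 and triggers the early exit, while 0.5 ms or more yields a timeout of at least 1 ms.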

As mentioned earlier, when SingleThreadEventExecutor's execute(Runnable task) adds a task, it calls wakeup, which dispatches to the wakeup method overridden by NioEventLoop:

@Override
public void execute(Runnable task) {
    // addTaskWakesUp defaults to false; if the task is added by an external thread, inEventLoop is false
    if (!addTaskWakesUp && wakesUpForTask(task)) {
        wakeup(inEventLoop);
    }
}

When inEventLoop is false and the CAS on wakenUp succeeds (from false to true, which keeps this thread-safe), selector.wakeup() is called to wake up the blocked select:

 @Override
    protected void wakeup(boolean inEventLoop) {
        if (!inEventLoop && wakenUp.compareAndSet(false, true)) {
            selector.wakeup();
        }
    }
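The effect of selector.wakeup() is easy to observe with the JDK alone. In this sketch (hypothetical class and method names), an external thread wakes a selector that would otherwise block for ten seconds, using the same CAS guard on a wakenUp flag.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class WakeupSketch {
    // Returns how long select() actually blocked, in milliseconds.
    static long blockedMillis() {
        AtomicBoolean wakenUp = new AtomicBoolean();
        try (Selector selector = Selector.open()) {
            Thread external = new Thread(() -> {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // Only the caller that flips the flag pays for the wakeup call.
                if (wakenUp.compareAndSet(false, true)) {
                    selector.wakeup();
                }
            });
            external.start();
            long start = System.nanoTime();
            selector.select(10_000); // no channels registered: blocks until woken or timed out
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            external.join();
            return elapsed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("select blocked for about " + blockedMillis() + " ms");
    }
}
```

The CAS matters because wakeup() is comparatively expensive; the flag ensures that several concurrent submitters trigger it at most once per select cycle.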

How Netty Works Around the JDK Empty-Polling Bug

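The bug occurs when the Selector's poll returns an empty result even though there was no wakeup and no new message to handle. The loop then spins on empty polls, CPU usage reaches 100%, and the NIO server becomes unavailable. Netty sidesteps this empty-polling problem in a clever way: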

private void select(boolean oldWakenUp) throws IOException {
    long currentTimeNanos = System.nanoTime();
    for (;;) {
        ...
        int selectedKeys = selector.select(timeoutMillis);
        selectCnt ++;
        // work around the JDK NIO bug
        long time = System.nanoTime();
        if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
            selectCnt = 1;
        } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 && selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
            rebuildSelector();
            selector = this.selector;
            selector.selectNow();
            selectCnt = 1;
            break;
        }
        currentTimeNanos = time; 
    ...
 }
}

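As the code shows, every poll of the Selector is counted with selectCnt++. The timestamps at the start and end of a poll are stored in currentTimeNanos and time, and their difference is the time that poll took.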

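If the elapsed time is greater than or equal to timeoutMillis (the poll timeout), the poll was a valid one and selectCnt is reset. Otherwise the blocking call did not block for that long and may have hit the JDK empty-polling bug. Once the number of such polls reaches a threshold (512 by default), the selector is rebuilt.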
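The counting logic can be isolated from the selector and exercised with made-up timestamps. A minimal sketch, assuming a hypothetical SpinDetectorSketch class whose afterSelect method mirrors the two branches above:

```java
import java.util.concurrent.TimeUnit;

final class SpinDetectorSketch {
    private final int threshold;
    private int selectCnt;

    SpinDetectorSketch(int threshold) {
        this.threshold = threshold;
    }

    // Call after each select(timeoutMillis); returns true when the selector should be rebuilt.
    boolean afterSelect(long startNanos, long endNanos, long timeoutMillis) {
        selectCnt++;
        if (endNanos - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= startNanos) {
            // select really blocked for the whole timeout: a normal poll
            selectCnt = 1;
            return false;
        }
        if (threshold > 0 && selectCnt >= threshold) {
            // too many premature returns in a row: assume the empty-polling bug
            selectCnt = 1;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SpinDetectorSketch detector = new SpinDetectorSketch(512);
        int polls = 0;
        // Simulate selects that return immediately although the timeout is 1000 ms.
        while (!detector.afterSelect(0L, 10L, 1000L)) {
            polls++;
        }
        System.out.println("rebuild requested after " + (polls + 1) + " premature returns");
    }
}
```

A single poll that blocks for the full timeout resets the counter, so only an uninterrupted run of premature returns leads to a rebuild.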

 public void rebuildSelector() {
        final Selector oldSelector = selector;
        final Selector newSelector;
        newSelector = openSelector();
        int nChannels = 0;
        for (;;) {
            try {
                for (SelectionKey key: oldSelector.keys()) {
                    Object a = key.attachment();
                    if (!key.isValid() || key.channel().keyFor(newSelector) != null) {
                          continue;
                    }
                    int interestOps = key.interestOps();
                    key.cancel();
                    SelectionKey newKey = key.channel().register(newSelector, interestOps, a);
                    if (a instanceof AbstractNioChannel) {
                         // Update SelectionKey
                         ((AbstractNioChannel) a).selectionKey = newKey;
                    }
                    nChannels ++;
                }
            } catch (ConcurrentModificationException e) {
                // Probably due to concurrent modification of the key set.
                continue;
            }
            break;
        }
        selector = newSelector;
        oldSelector.close();
    }

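rebuildSelector mainly does three things:

Create a new Selector.
Cancel all the keys registered with the old Selector.
Re-register the still-valid channels with the new Selector (same interest ops and attachment) and make it the active selector.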
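The same three steps can be tried with plain JDK NIO. The sketch below migrates a Pipe's source channel from one selector to another; RebuildSketch and its methods are hypothetical names, and the retry on ConcurrentModificationException is replaced by iterating over a copy of the key set.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;

final class RebuildSketch {
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();                      // 1. new selector
        for (SelectionKey key : new ArrayList<>(oldSelector.keys())) {
            if (!key.isValid() || key.channel().keyFor(newSelector) != null) {
                continue;
            }
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();                                            // 2. cancel the old key
            key.channel().register(newSelector, interestOps, attachment); // 3. re-register
        }
        oldSelector.close();
        return newSelector;
    }

    // Registers one channel, rebuilds, and checks that the registration moved intact.
    static boolean demo() {
        try {
            Pipe pipe = Pipe.open();
            Selector old = Selector.open();
            pipe.source().configureBlocking(false);
            pipe.source().register(old, SelectionKey.OP_READ, "attachment");

            Selector fresh = rebuild(old);
            SelectionKey moved = pipe.source().keyFor(fresh);
            boolean ok = !old.isOpen()
                    && moved != null
                    && moved.isValid()
                    && moved.interestOps() == SelectionKey.OP_READ
                    && "attachment".equals(moved.attachment());

            fresh.close();
            pipe.source().close();
            pipe.sink().close();
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Carrying the attachment over is what lets Netty find the owning channel again from the new key.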

References:
netty源码分析之揭开reactor线程的面纱 (Netty source analysis: unveiling the reactor thread)

Netty 源码分析-EventLoop (Netty source analysis: EventLoop)
