小白帶你認識netty(三)之NioEventLoop的線程(或者reactor線程)啓動(一) 原

在第一章中,我們看關於NioEventLoopGroup的初始化,我們知道了NioEventLoopGroup對象中有一組EventLoop數組,並且數組中的每個EventLoop對象都對應一個線程FastThreadLocalThread,那麼這個線程是啥時候啓動的呢?今天來繼續研究下源碼。。

還記得這個方法麼?就是initAndRegister方法中的register方法,這裏有個if(eventLoop.inEventLoop())的邏輯判斷,上一節我們分析了,這裏走else的邏輯,因此會執行eventLoop.execute方法,那麼這個方法就是NioEventLoop啓動的入口。我們跟進這個execute方法,因爲SingleThreadEventExecutor是NioEventLoop的子類,所以,會執行SingleThreadEventExecutor的execute方法:

同理,依然執行的是else中的方法:首先是startThread()方法:

然後調用doStartThread方法:

看一下executor.execute方法,這個executor就是第一章說的ThreadPerTaskExecutor對象。因此executor就是調用的ThreadPerTaskExecutor這個類裏面的:

之前分析過,這個newThread就是創建一個FastThreadLocalThread線程對象,因此這裏就是開啓一個線程。在這個線程中,將該線程對象賦值給SingleThreadEventExecutor對象的thread成員變量, thread = Thread.currentThread();至此,inEventLoop()方法將返回true了。。。然後接着執行SingleThreadEventExecutor.this.run();方法。進入該方法:

protected void run() {
        for (;;) {
            try {
                switch (selectStrategy.calculateStrategy(selectNowSupplier, hasTasks())) {
                    case SelectStrategy.CONTINUE:
                        continue;
                    case SelectStrategy.SELECT:
                        select(wakenUp.getAndSet(false));

                        // 'wakenUp.compareAndSet(false, true)' is always evaluated
                        // before calling 'selector.wakeup()' to reduce the wake-up
                        // overhead. (Selector.wakeup() is an expensive operation.)
                        //
                        // However, there is a race condition in this approach.
                        // The race condition is triggered when 'wakenUp' is set to
                        // true too early.
                        //
                        // 'wakenUp' is set to true too early if:
                        // 1) Selector is waken up between 'wakenUp.set(false)' and
                        //    'selector.select(...)'. (BAD)
                        // 2) Selector is waken up between 'selector.select(...)' and
                        //    'if (wakenUp.get()) { ... }'. (OK)
                        //
                        // In the first case, 'wakenUp' is set to true and the
                        // following 'selector.select(...)' will wake up immediately.
                        // Until 'wakenUp' is set to false again in the next round,
                        // 'wakenUp.compareAndSet(false, true)' will fail, and therefore
                        // any attempt to wake up the Selector will fail, too, causing
                        // the following 'selector.select(...)' call to block
                        // unnecessarily.
                        //
                        // To fix this problem, we wake up the selector again if wakenUp
                        // is true immediately after selector.select(...).
                        // It is inefficient in that it wakes up the selector for both
                        // the first case (BAD - wake-up required) and the second case
                        // (OK - no wake-up required).

                        if (wakenUp.get()) {
                            selector.wakeup();
                        }
                    default:
                        // fallthrough
                }

                cancelledKeys = 0;
                needsToSelectAgain = false;
                final int ioRatio = this.ioRatio;
                if (ioRatio == 100) {
                    try {
                        processSelectedKeys();
                    } finally {
                        // Ensure we always run tasks.
                        runAllTasks();
                    }
                } else {
                    final long ioStartTime = System.nanoTime();
                    try {
                        processSelectedKeys();
                    } finally {
                        // Ensure we always run tasks.
                        final long ioTime = System.nanoTime() - ioStartTime;
                        runAllTasks(ioTime * (100 - ioRatio) / ioRatio);
                    }
                }
            } catch (Throwable t) {
                handleLoopException(t);
            }
            // Always handle shutdown even if the loop processing threw an exception.
            try {
                if (isShuttingDown()) {
                    closeAll();
                    if (confirmShutdown()) {
                        return;
                    }
                }
            } catch (Throwable t) {
                handleLoopException(t);
            }
        }
    }

居然是個死循環,表面該線程就一直處於循環之中。

該run方法主要做三件事1、首先輪詢註冊到reactor線程對用的selector上的所有的channel的IO事件。2、處理IO事件。3、處理異步任務隊列。

1、檢查是否有IO事件:

那個switch中的代碼就是判斷task隊列中是否有任務的。

如果沒有任務,就返回SelectStrategy.SELECT,接着執行select方法:

這個select的中的參數的意思就是將wakenUp表示是否應該喚醒正在阻塞的select操作,可以看到netty在進行一次新的loop之前,都會將wakenUp被設置成false。然後進入select方法:

private void select(boolean oldWakenUp) throws IOException {
        Selector selector = this.selector;
        try {
            int selectCnt = 0;
            long currentTimeNanos = System.nanoTime();
            long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);
            for (;;) {
                long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
                if (timeoutMillis <= 0) {
                    if (selectCnt == 0) {
                        selector.selectNow();
                        selectCnt = 1;
                    }
                    break;
                }

                // If a task was submitted when wakenUp value was true, the task didn't get a chance to call
                // Selector#wakeup. So we need to check task queue again before executing select operation.
                // If we don't, the task might be pended until select operation was timed out.
                // It might be pended until idle timeout if IdleStateHandler existed in pipeline.
                if (hasTasks() && wakenUp.compareAndSet(false, true)) {
                    selector.selectNow();
                    selectCnt = 1;
                    break;
                }

                int selectedKeys = selector.select(timeoutMillis);
                selectCnt ++;

                if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks() || hasScheduledTasks()) {
                    // - Selected something,
                    // - waken up by user, or
                    // - the task queue has a pending task.
                    // - a scheduled task is ready for processing
                    break;
                }
                if (Thread.interrupted()) {
                    // Thread was interrupted so reset selected keys and break so we not run into a busy loop.
                    // As this is most likely a bug in the handler of the user or it's client library we will
                    // also log it.
                    //
                    // See https://github.com/netty/netty/issues/2426
                    if (logger.isDebugEnabled()) {
                        logger.debug("Selector.select() returned prematurely because " +
                                "Thread.currentThread().interrupt() was called. Use " +
                                "NioEventLoop.shutdownGracefully() to shutdown the NioEventLoop.");
                    }
                    selectCnt = 1;
                    break;
                }

                long time = System.nanoTime();
                if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
                    // timeoutMillis elapsed without anything selected.
                    selectCnt = 1;
                } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
                        selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
                    // The selector returned prematurely many times in a row.
                    // Rebuild the selector to work around the problem.
                    logger.warn(
                            "Selector.select() returned prematurely {} times in a row; rebuilding Selector {}.",
                            selectCnt, selector);

                    rebuildSelector();
                    selector = this.selector;

                    // Select again to populate selectedKeys.
                    selector.selectNow();
                    selectCnt = 1;
                    break;
                }

                currentTimeNanos = time;
            }

            if (selectCnt > MIN_PREMATURE_SELECTOR_RETURNS) {
                if (logger.isDebugEnabled()) {
                    logger.debug("Selector.select() returned prematurely {} times in a row for Selector {}.",
                            selectCnt - 1, selector);
                }
            }
        } catch (CancelledKeyException e) {
            if (logger.isDebugEnabled()) {
                logger.debug(CancelledKeyException.class.getSimpleName() + " raised by a Selector {} - JDK bug?",
                        selector, e);
            }
            // Harmless exception - log anyway
        }
    }

首先,看下long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);這一行代碼:嗯?delayNanos是什麼鬼?跟進去看一下:

等等,peekScheduledTask又是什麼鬼?再進去瞅瞅。。。。

哎呀,這個scheduledTaskQueue是什麼隊列?

哦,原來是一個優先級隊列,其實是一個按照定時任務將要執行的時間排序的一個隊列。因此peekScheduledTask隊列返回的是最近要執行的一個任務。所以,這個delayNanos返回的是到以一個定時任務的時間,如果定時任務隊列沒有值,那麼默認就是1秒,即1000000000納秒。因此selectDeadLineNanos就表示當前時間+到第一個要執行的定時任務的時間。

下面在select方法中又是一個循環,在循環中第一句:long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;這句話表示是否當前的定時任務隊列中有任務的截止事件快到了(<=0.5ms):

如果當前的定時任務中的事件快到了(還有不到0.5ms的時間,定時任務就要執行了),然後就進入if裏面,selectCnt表示的是執行select的次數。如果一次都沒有select過,就立馬進行selector.selectNow,該方法是非阻塞的,會立馬返回,並將selectCnt設置爲1,然後跳出循環。如果當前的定時任務中的事件的執行離當前時間還差0.5ms以上,則繼續向下執行:

在這個if中,netty會判斷任務隊列中是否又任務並且wekenUp標記爲是否被設置爲了true,如果if滿足了,表明任務隊列已經有了任務,要結束本次的select的操作了,同樣,立馬進行selector.selectNow,並並將selectCnt設置爲1,跳出循環。否則的話,將繼續執行。

selector.select(timeoutMillis)是一個阻塞的select,阻塞時間就是當前時間到定時任務執行前的0.5ms的這一段時間。然後將selectCnt++。這裏有個問題,如果離第一個定時任務執行還有20分鐘,那這個方法豈不是要阻塞接近20分鐘麼?是的,沒錯,那如果這個時候,任務隊列裏又了任務了怎麼辦:

所以當有外部線程向任務隊列中放入任務的時候,selector會喚醒阻塞的select操作。

等阻塞的select執行完成後,netty會判斷是否已經有IO時間或者oldWakeUp爲true,或者用戶主動喚醒了select,或者task隊列中已經有任務了或者第一個定時任務將要被執行了,滿足其中一個條件,則表明要跳出本次的select方法了。

netty會在每次進行阻塞select之前記錄一下開始時時間currentTimeNanos,在select之後記錄一下結束時間,判斷select操作是否至少持續了timeoutMillis秒(這裏將time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos改成time - currentTimeNanos >= TimeUnit.MILLISECONDS.toNanos(timeoutMillis)或許更好理解一些),
如果持續的時間大於等於timeoutMillis,說明就是一次有效的輪詢,重置selectCnt標誌,表明選擇超時,並沒有IO時間。

這裏有一個NIO的空輪詢bug,該bug會導致Selector一直空輪詢,最終導致CPU飆升100%,nio Server不可用,那麼這個else部分的邏輯就是netty規避空輪詢的bug。如果阻塞select返回了,並不是超時返回的,那麼就說明已經出現了空輪詢現象,那麼就進入了該else邏輯。該邏輯會判斷空輪詢的次數是否大於SELECTOR_AUTO_REBUILD_THRESHOLD這個數,這個數是多少呢?

默認是512次。即空輪詢不能超過512次。如果超過了,那麼就執行rebuildSelector方法,該方法的名字是要重新構建一個selector。的確是這樣:

public void rebuildSelector() {
        if (!inEventLoop()) {
            execute(new Runnable() {
                @Override
                public void run() {
                    rebuildSelector();
                }
            });
            return;
        }

        final Selector oldSelector = selector;
        //定義一個新的Selector對象
        final Selector newSelector;

        if (oldSelector == null) {
            return;
        }

        try {
            //重新實例化該Selector對象
            newSelector = openSelector();
        } catch (Exception e) {
            logger.warn("Failed to create a new Selector.", e);
            return;
        }

        // Register all channels to the new Selector.
        int nChannels = 0;
        for (;;) {
            try {
                //遍歷原有的selector上的key
                for (SelectionKey key: oldSelector.keys()) {
                    //獲取註冊到selector上的NioServerSocketChannel
                    Object a = key.attachment();
                    try {
                        if (!key.isValid() || key.channel().keyFor(newSelector) != null) {
                            continue;
                        }

                        int interestOps = key.interestOps();
                        //取消該key在舊的selector上的事件註冊
                        key.cancel();
                        //將該key對應的channel註冊到新的selector上
                        SelectionKey newKey = key.channel().register(newSelector, interestOps, a);
                        if (a instanceof AbstractNioChannel) {
                            // Update SelectionKey
                            //重新綁定新key和channel的關係
                            ((AbstractNioChannel) a).selectionKey = newKey;
                        }
                        nChannels ++;
                    } catch (Exception e) {
                        logger.warn("Failed to re-register a Channel to the new Selector.", e);
                        if (a instanceof AbstractNioChannel) {
                            AbstractNioChannel ch = (AbstractNioChannel) a;
                            ch.unsafe().close(ch.unsafe().voidPromise());
                        } else {
                            @SuppressWarnings("unchecked")
                            NioTask<SelectableChannel> task = (NioTask<SelectableChannel>) a;
                            invokeChannelUnregistered(task, key, e);
                        }
                    }
                }
            } catch (ConcurrentModificationException e) {
                // Probably due to concurrent modification of the key set.
                continue;
            }

            break;
        }

        selector = newSelector;

        try {
            // time to close the old selector as everything else is registered to the new one
            oldSelector.close();
        } catch (Throwable t) {
            if (logger.isWarnEnabled()) {
                logger.warn("Failed to close the old Selector.", t);
            }
        }

        logger.info("Migrated " + nChannels + " channel(s) to the new Selector.");
    }

然後用新的selector直接調用selectNow:

這就是Netty規避Nio空輪詢的bug問題。至此NioEventLoop的線程啓動(或者說netty的reactor線程)的檢查是否有IO事件分析完了,下一章繼續分析2和3兩個知識點。

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章