Analyzing a slow Spring Boot shutdown with BTrace

Background

This post shows how to analyze a JVM that fails to shut down normally, covering both the why and the how.

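I had barely used Spring Boot, but gritted my teeth and went with it anyway. In my earlier post on the pitfalls of building database tests with SpringBootTest and H2, I noticed that our application did not exit on its own once the test cases had finished. It always sat there for about 2 minutes and only then exited. At first I assumed only the test cases were affected, but the application itself had the same problem on a normal shutdown.
See the figure below (the tests have finished and Spring Boot has begun shutting down, but the process itself has not exited):
[Image]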

Googling first

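Everything I found was about how to shut down gracefully, and nothing about shutting down immediately. At first I suspected Spring Boot itself, but a simple demo I wrote shut down quickly and normally.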

Initial diagnosis

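A simple approach is to have the application start an extra background thread that keeps printing the thread stacks, so you can see which non-daemon threads exist: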

        // Diagnostic thread: every 5 seconds, dump all live threads and print
        // the stack of each non-daemon one (those are what keep the JVM alive).
        Thread th = new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    try {
                        Thread.sleep(1000 * 5);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }

                    Thread.getAllStackTraces().forEach((t, frames) -> {
                        System.out.println("-----------------");
                        if (!t.isDaemon()) {
                            System.out.println("non daemon:" + t);
                            for (StackTraceElement frame : frames) {
                                System.out.println("\t\t" + frame);
                            }
                        } else {
                            System.out.println("Daemon thread:" + t);
                        }
                        System.out.println("-----------------");
                    });
                }
            }
        });
        th.setName("PrintThread");
        // Daemon, so this diagnostic thread never blocks JVM exit itself.
        th.setDaemon(true);
        th.start();

This is what I found:

Daemon thread:Thread[pool-8-thread-1,5,main]
-----------------
-----------------
non daemon:Thread[nioEventLoopGroup-2-4,10,main]
		sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
		sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
		sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
		sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
		sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
		io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
		io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
		io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:408)
		io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
		io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		java.lang.Thread.run(Thread.java:748)
-----------------
-----------------
Daemon thread:Thread[Attach Listener,9,system]
-----------------
-----------------
Daemon thread:Thread[BTrace Command Queue Processor,5,main]
-----------------
-----------------
Daemon thread:Thread[RMI TCP Accept-0,5,system]
-----------------
-----------------
Daemon thread:Thread[Abandoned connection cleanup thread,5,main]
-----------------
-----------------
non daemon:Thread[pool-3-thread-1,5,main]
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
		java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
		java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
		java.lang.Thread.run(Thread.java:748)
-----------------
-----------------
Daemon thread:Thread[RMI TCP Connection(3)-127.0.0.1,5,RMI Runtime]
-----------------
-----------------
Daemon thread:Thread[PrintThread,5,main]
-----------------
-----------------
non daemon:Thread[pool-6-thread-1,5,main]
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
		java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
		java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
		java.lang.Thread.run(Thread.java:748)
-----------------
-----------------
Daemon thread:Thread[Monitor Ctrl-Break,5,main]
-----------------
-----------------
non daemon:Thread[nioEventLoopGroup-2-3,10,main]
		sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
		sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
		sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
		sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
		sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
		io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
		io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
		io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:408)
		io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
		io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		java.lang.Thread.run(Thread.java:748)
-----------------
-----------------
non daemon:Thread[nioEventLoopGroup-2-5,10,main]
		sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
		sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
		sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
		sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
		sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
		io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
		io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
		io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:408)
		io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
		io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		java.lang.Thread.run(Thread.java:748)
-----------------
-----------------
Daemon thread:Thread[COThread-kb,5,main]

Many Netty threads had not been shut down. So the question is: in a complex project, how do you find out who created these threads?

The big gun: BTrace

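See another post of mine: "Recording a TCP connection anomaly, using btrace".
For the complete code, see the markdown file on GitHub: btrace_usage.md, section "0.1 Add an example of how to run".
I had used BTrace before and found that it has since been open-sourced out of com.sun. Kudos to Oracle. That is why the document was updated.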

Back to the main topic

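[Image]
It shows that the Netty threads are initialized by an external component we depend on. Finding a way to hook its shutdown into the Spring Boot shutdown hook takes care of that. PS: this also turned up several other places in the project that create non-daemon threads. Fixing them all in one pass solved the problem, for example by wrapping the thread factory with Guava's ThreadFactoryBuilder: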

Executors.newSingleThreadScheduledExecutor(new ThreadFactoryBuilder().setDaemon(true).setNameFormat("cleanup-expirecode").build()).scheduleAtFixedRate(() 
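
For projects without Guava on the classpath, a plain ThreadFactory can produce the same kind of named daemon threads. The following is a minimal sketch, not the project's code: the class name DaemonThreadFactory, the "-N" name suffix, and the one-second period are illustrative assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: creates named daemon threads using only the JDK.
public class DaemonThreadFactory implements ThreadFactory {
    private final String namePrefix;
    private final AtomicInteger counter = new AtomicInteger();

    public DaemonThreadFactory(String namePrefix) {
        this.namePrefix = namePrefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task, namePrefix + "-" + counter.getAndIncrement());
        // Daemon threads do not keep the JVM alive once all other threads end.
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(
                new DaemonThreadFactory("cleanup-expirecode"));
        // The period below is an assumed value for illustration only.
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("cleanup tick"), 0, 1, TimeUnit.SECONDS);
        // main returns here; because the worker is a daemon, the JVM can exit
        // without waiting for the scheduler to be shut down.
    }
}
```

Daemon threads are stopped abruptly at exit, so this suits periodic housekeeping tasks better than work that must finish cleanly.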

Questions to think about

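  1. As mentioned earlier, I started an extra thread inside my own application to print the stacks. The same thing can actually be done with BTrace. That is left for you to think about.
  2. The DelayedShutdownHook, which at first looked like Spring Boot's. After fixing our own non-daemon threads, this one was still left: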
non daemon:Thread[DelayedShutdownHook-for-java.util.concurrent.ThreadPoolExecutor@2c47a053[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0],5,main]
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
		java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
		com.google.common.util.concurrent.MoreExecutors$Application$1.run(MoreExecutors.java:203)
		java.lang.Thread.run(Thread.java:748)

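How do you use BTrace to find out who created this thread pool? (PS: it is similar to monitoring thread creation earlier.)
It turned out to be Guava's thread pool wrapper: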

Our code:
    // private final ExecutorService _executor = Executors.newSingleThreadExecutor();
    private final ExecutorService _executor = MoreExecutors.getExitingExecutorService((ThreadPoolExecutor)
            Executors.newFixedThreadPool(1));
Guava's code:
com.google.common.util.concurrent.MoreExecutors.Application#getExitingExecutorService(java.util.concurrent.ThreadPoolExecutor)
    final ExecutorService getExitingExecutorService(ThreadPoolExecutor executor) {
      return getExitingExecutorService(executor, 120, TimeUnit.SECONDS);
    }

Yes, exactly: the 120-second default timeout is the 2 minutes!!! With that, the problem was solved.
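
The Guava snippet above delegates to an overload that takes an explicit timeout, so passing a smaller value there is one way to shorten the wait. The general pattern of an exit hook with a bounded wait can be sketched with the JDK alone. This is an illustration under assumptions, not Guava's actual implementation: the class name BoundedExitHook, the method exitHook, and the 5-second timeout are all hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of an exit hook that waits a bounded time for a pool.
public class BoundedExitHook {

    // Builds a hook thread that stops the pool and waits at most `timeout`.
    static Thread exitHook(ExecutorService pool, long timeout, TimeUnit unit) {
        return new Thread(() -> {
            pool.shutdown(); // stop accepting new tasks, let queued ones finish
            try {
                pool.awaitTermination(timeout, unit); // upper bound on the delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }, "exit-hook");
    }

    public static void main(String[] args) {
        // Daemon worker, so the pool itself never blocks JVM exit.
        ExecutorService pool = Executors.newFixedThreadPool(1, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        // Assumed timeout of 5 seconds instead of a 120-second default.
        Runtime.getRuntime().addShutdownHook(exitHook(pool, 5, TimeUnit.SECONDS));
        pool.submit(() -> System.out.println("work done"));
    }
}
```

The timeout is a trade-off: it is the longest the JVM will wait for in-flight tasks at exit, so it should be no larger than the slowest task you are willing to wait for.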
