利用jstack命令分析JVM線程

筆者是WEB JAVA後臺開發,最近在線上遇到過幾次服務不可用問題,基本現象是接口請求無響應或響應非常慢達到分鐘級別。一般問題發生時我們都會去查看日誌,經常遇到沒有日誌的情況(此時服務無法響應client請求),甚至要去找幾個小時前的日誌現象發生時,有些接口甚至沒有日誌打印,查找起來很困難,利用jvm的線程棧工具jstack對於查找問題有很大幫助。

文章以SpringBoot爲框架開發一個web demo應用,以接口代碼示例幾種可能會導致服務無法響應的案例,並講述如何以jstack等工具排查問題。
環境:單核CPU虛擬機CentOS6 + JAVA8 + SpringBoot

JAVA jstack日誌文件中有以下幾種狀態需要關注的:
1.死鎖,Deadlock,線程死鎖;
2.執行中,Runnable,線程執行過程中可能會遇到第三方IO等阻塞或循環,仍需要關注;
3.等待資源, Waiting on condition,線程等待條件,可能是在等待網絡資源響應請求,具體需結合棧信息stacktrace進行分析;
4.等待獲取監視器,Waiting on monitor entry,一般是互斥鎖實現線程同步;
5.條件等待/定時等待,Object.wait() 或 TIMED_WAITING,Object.wait()是讓當前線程阻塞,並出讓當前線程的擁有的Object鎖,直到被持有Object鎖的其它線程調用Object.notify()喚醒才繼續執行
6.停止/停止中:Parked/Parking。

死循環

死循環或長時間循環計算,佔用CPU計算資源,導致CPU佔滿。本例虛擬機CPU爲1核,所以CPU佔用用率達到100%,如果是多核,則佔用率爲1/n,如四核則爲25%

代碼示例

@RequestMapping("loop")
public void threadLoopDemo() throws Exception{
    int num = 0;
    long start = System.currentTimeMillis() / 1000;
    while (true) {
        log.info("====>  測試 Loop");
        num++;
        if (num == Integer.MAX_VALUE) {
            log.info("====> rest num");
            num = 0;
        }

        if (System.currentTimeMillis() / 1000 - start > 1000) {
            return;
        }
    }
}

現象說明

top -c CPU佔用情況,發現此時CPU佔用100%,說明以上死循環獨佔CPU資源。
ps -mp 3168 -o THREAD,tid,timetop -H -p 3168(3168爲進程號<pid>) 可打印出進程對應的線程id及運行時間time,可以看到nid=3187的線程佔用CPU82.8%,且運行時間爲2min
死循環CPU升高
線程執行情況

線程堆棧

jstack 3168打印輸出找到對應問題的堆棧,從下往上看,發現tomcat NIO Channel被locked,說明該請求的線程未釋放,仍在執行;同時找到出問題的代碼位置。nid=0xc73換成十進制爲nid=3187,即上述佔用CPU高且時間較少的線程。

"http-nio-10015-exec-1" #19 daemon prio=5 os_prio=0 tid=0x00007f75e4050800 nid=0xc73 runnable [0x00007f75e84e4000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        - locked <0x00000000eb1a0b48> (a java.io.BufferedOutputStream)  // log日誌,輸出到file日誌文件
        at java.io.PrintStream.write(PrintStream.java:482)
        - locked <0x00000000eb18a468> (a java.io.PrintStream)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
        at ch.qos.logback.core.joran.spi.ConsoleTarget$1.write(ConsoleTarget.java:37)
        at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:131)
        .... 問題代碼出處
        at com.ljyhust.demo.web.ThreadTestDemoController.threadLoopDemo(ThreadTestDemoController.java:20)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        ..... 此處省略
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
        - locked <0x00000000ec476d30> (a org.apache.tomcat.util.net.NioChannel)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

網絡/IO阻塞

大量網絡IO阻塞,導致佔用服務線程,導致服務無法響應。
這裏以大家熟悉tomcat服務爲例,tomcat設有最大線程數maxThreads和最大排隊數acceptCount,這兩個參數可以在server.xml文件中配置。tomcat處理請求可分爲以下3種情況:
1.接收一個請求,當啓動的線程數或正在運行的線程數< maxThreads時,則tomcat會啓動一個線程來處理該請求;
2.接收一個請求,當啓動的線程數或正在運行的線程數> maxThreads時,則tomcat會把請求放入等待隊列,等待空閒線程執行請求;
3.接收一個請求,當啓動的線程數或正在運行的線程數> maxThreads && 請求隊列已滿時,則tomcat會直接拒絕請求,此時客戶端現象是connection refused(連接被拒絕)
如果大量的線程在執行請求的過程中由於IO阻塞,則導致線程池佔滿,服務則無法響應新的請求。

代碼示例

blockIo接口是問題代碼,請求第三方google應用,如果請求緩慢或阻塞則會導致請求線程阻塞,當大量請求線程阻塞佔滿tomcat線程池時,則服務無法響應新進來的請求甚至拒絕請求,導致服務“假死”。通過jmeter模擬 3000 併發請求,查看其它接口如getTime是否能正常響應。

@RequestMapping("blockIo")
public Object blockIoDemo() throws Exception {
    JSONObject resJson = new JSONObject();
    try {
        JSONObject resStr = RestClientUtil.getRestTemplate().getForObject("http://10.247.63.25:10015/demo/threadTest/resBlock", JSONObject.class);
        log.info("=====> 獲取text/html  {}", resStr);
    } catch (Exception e) {
        e.printStackTrace();
    }
    resJson.put("code", "100");
    return resJson;
}

@RequestMapping("getTime")
public Object getServerTime() throws Exception {
    log.info("=====> 請求開始");
    JSONObject resJson = new JSONObject();
    String format = DateFormatUtils.format(new Date(), "yyyy-MM-dd HH:mm:ss");
    resJson.put("code", "100");
    resJson.put("reqTime", format);
    log.info("=====> 請求結束");
    return resJson;
}

現象說明

  • 接口響應
    併發前後,請求getTime接口,發現併發前getTime接口響應時間爲32ms,而併發後響應時間則變爲12.19s,說明服務響應已受到影響。
    併發前響應時間
    併發後響應時間

  • CPU狀態
    CPU佔用並不高,此次併發對CPU並未造成明顯影響。
    CPU狀態

  • jstack線程堆棧
    線程堆棧信息如下,其中有大量的TIMED_WAITING線程,跟蹤stacktrace找到at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)位置,這個是apache httpclient連接池代碼段,這段代碼說明http連接池不夠用,造成大量請求等待。
    我們來看看RUNNABLE狀態的線程nid=0xab7,該線程正在執行請求,跟蹤代碼出處從下往上找,java.net.SocketInputStream.socketRead(SocketInputStream.java:116)說明正在讀取網絡資源。

// 請求進入等待隊列中
"http-nio-10015-exec-188" #208 daemon prio=5 os_prio=0 tid=0x0000000001bdb800 nid=0xab9 waiting on condition [0x00007f920f2af000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e29149c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:256)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2120)
        at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)
        at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:307)
        at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:65)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:193)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:186)
        at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:108)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:282)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:269)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
        ....
        at com.ljyhust.demo.web.ThreadTestDemoController.blockIoDemo(ThreadTestDemoController.java:46)
        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
        at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:832)
        at ....
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
        - locked <0x00000000ee823bb0> (a org.apache.tomcat.util.net.NioChannel)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

"http-nio-10015-exec-187" #207 daemon prio=5 os_prio=0 tid=0x0000000001bd9800 nid=0xab8 waiting on condition [0x00007f920f3b0000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e142d7b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:256)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2120)
        at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)
        at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:307)
        at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:65)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:193)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:186)
        at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:108)
        ...
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:528)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1099)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:670)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
        - locked <0x00000000ee821b30> (a org.apache.tomcat.util.net.NioChannel)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
// 請求第三方資源read
"http-nio-10015-exec-186" #206 daemon prio=5 os_prio=0 tid=0x0000000001bd7800 nid=0xab7 runnable [0x00007f920f4b1000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
        at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
        at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
        ...
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:91)
        at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
        at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:596)
        at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:557)
        at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:264)
        at com.ljyhust.demo.web.ThreadTestDemoController.blockIoDemo(ThreadTestDemoController.java:46)
        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        ...
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
        - locked <0x00000000ec853808> (a org.apache.tomcat.util.net.NioChannel)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

死鎖

死鎖是由於多線程爭奪互斥資源導致的。例如 1、2線程分別佔用A、B鎖,但同時在臨界區代碼中又需要B、A鎖,由於各自獲取了對方所需要的鎖,最終導致死鎖。
滿足死鎖的條件有以下四個,缺一不可:
1.互斥條件,即不能同時被兩個或兩個以上的線程佔有;
2.不可搶佔條件,即已佔用的鎖不能被其它線程搶奪;
3.佔有且申請條件,即進程已經佔有了一個鎖,但又需要申請/等待另外一個鎖;
4.循環等待條件,即等待其它線程的鎖,而其它線程又等待更多線程的鎖,且形成一個等待循環。

代碼示例

@RequestMapping("deadLock")
public Object deadLockDemo() throws Exception {
    log.info("=====> 請求開始");
    JSONObject resJson = new JSONObject();
    Thread t1 = new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                deadLockThreadDemo.getLockAB();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });

    Thread t2 = new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                deadLockThreadDemo.getLockBA();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
    t1.start();
    t2.start();
    log.info("=====> 請求結束");
    resJson.put("code", "100");
    return resJson;
}

public void getLockAB() throws Exception {
    // 鎖A
    synchronized (objectA) {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // 鎖B
        log.info("線程1嘗試獲取B鎖");
        synchronized (objectB) {
            log.info("線程1獲取到B鎖");
        }
    }
}

public void getLockBA() throws Exception {
    // 鎖B
    synchronized (objectB) {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // 鎖A
        log.info("線程2嘗試獲取A鎖");
        synchronized (objectA) {
            log.info("線程2獲取到A鎖");
        }
    }
}

現象說明

  • jstack線程堆棧
    線程堆棧如下所示,老套路從下往上看棧信息,線程Thread-6先鎖0x00000000ebe97e18對象然後等待0x00000000ebe97e08;線程Thread-5先鎖0x00000000ebe97e08對象然後待0x00000000ebe97e18,兩個線程彼此等待,導致死鎖BLOCKED (on object monitor)
"Thread-6" #24 daemon prio=5 os_prio=0 tid=0x00007f6ce8ca6800 nid=0xbd7 waiting for monitor entry [0x00007f6d081f8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.ljyhust.demo.service.DeadLockThreadDemo.getLockBA(DeadLockThreadDemo.java:42)
        - waiting to lock <0x00000000ebe97e08> (a java.lang.Object)
        - locked <0x00000000ebe97e18> (a java.lang.Object)
        at com.ljyhust.demo.web.ThreadTestDemoController$2.run(ThreadTestDemoController.java:89)
        at java.lang.Thread.run(Thread.java:748)

"Thread-5" #23 daemon prio=5 os_prio=0 tid=0x00007f6ce8576800 nid=0xbd6 waiting for monitor entry [0x00007f6d1cee3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.ljyhust.demo.service.DeadLockThreadDemo.getLockAB(DeadLockThreadDemo.java:26)
        - waiting to lock <0x00000000ebe97e18> (a java.lang.Object)
        - locked <0x00000000ebe97e08> (a java.lang.Object)
        at com.ljyhust.demo.web.ThreadTestDemoController$1.run(ThreadTestDemoController.java:78)
        at java.lang.Thread.run(Thread.java:748)

結論

無論是CPU飆高還是服務響應緩慢,當從日誌中找不出問題甚至沒有日誌打印的時候,可以利用jstack命令打印線程堆棧信息、結合top -H -p <pid>可能會找到問題原因。尤其是當服務調用其它資源較多時,而又找不到具體哪個服務問題時,不妨試下這個命令找找看。

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章