Analyzing JVM Threads with the jstack Command

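I am a Java web back-end developer. Recently our production services became unavailable several times: API requests either got no response or took minutes to return. When this happens we normally check the logs first, but often there are none, because the service cannot respond to client requests at all. Sometimes the relevant entries are hours old, and some endpoints print no log at all, which makes the search difficult. In these situations jstack, the JVM's thread stack tool, is a great help in locating the problem.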

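This article builds a web demo application on Spring Boot, uses endpoint code to illustrate several cases that can make a service unresponsive, and shows how to investigate them with jstack and related tools.
Environment: single-core CPU virtual machine, CentOS 6 + Java 8 + Spring Boot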

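A jstack dump contains the following states that deserve attention:
1. Deadlock: threads are deadlocked;
2. Runnable: the thread is executing, but it may still be blocked on third-party IO or stuck in a loop, so it still needs attention;
3. Waiting on condition: the thread is waiting for a condition, possibly for a network resource to respond; analyze it together with the stack trace;
4. Waiting for monitor entry: the thread is waiting to acquire a monitor, usually a mutex (synchronized) used for thread synchronization;
5. Object.wait() or TIMED_WAITING: conditional or timed waiting. Object.wait() blocks the current thread and releases the object monitor it holds, until another thread holding that monitor calls Object.notify() to wake it up;
6. Parked/Parking: the thread is suspended (for example through LockSupport.park()) until it is unparked.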
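
The states listed above correspond to java.lang.Thread.State values that can also be observed from inside a JVM. The following is a minimal sketch, not part of the original demo: the class name ThreadStateDemo and the thread names are made up for illustration. It puts four helper threads into Object.wait(), Thread.sleep(), LockSupport.park() and a contended synchronized block, and records the state each one reports.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper class, used only to illustrate the thread states.
final class ThreadStateDemo {

    /** Polls for up to about 5 s until the thread reports the expected state; returns the last state seen. */
    static Thread.State waitFor(Thread t, Thread.State expected) {
        Thread.State seen = t.getState();
        for (int i = 0; i < 500 && seen != expected; i++) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            seen = t.getState();
        }
        return seen;
    }

    /** Starts four helper threads, records the state each one settles in, then stops them. */
    static Map<String, Thread.State> observe() {
        final Object waitMonitor = new Object();
        final Object heldLock = new Object();

        Thread waiter = new Thread(() -> {
            synchronized (waitMonitor) {
                try {
                    while (true) {
                        waitMonitor.wait();          // Object.wait(): WAITING
                    }
                } catch (InterruptedException stop) {
                    // interrupted by observe(): leave the loop and finish
                }
            }
        }, "demo-waiter");

        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);                // timed sleep: TIMED_WAITING
            } catch (InterruptedException stop) {
                // interrupted by observe(): finish
            }
        }, "demo-sleeper");

        Thread parker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.park();                  // parked: WAITING
            }
        }, "demo-parker");

        Thread blocked = new Thread(() -> {
            synchronized (heldLock) {                // monitor is held by observe(): BLOCKED
                // entered only after observe() releases heldLock
            }
        }, "demo-blocked");

        Map<String, Thread.State> states = new LinkedHashMap<>();
        Thread[] helpers = {waiter, sleeper, parker, blocked};
        for (Thread t : helpers) {
            t.setDaemon(true);
        }
        synchronized (heldLock) {                    // hold the lock so "demo-blocked" cannot enter
            for (Thread t : helpers) {
                t.start();
            }
            states.put(waiter.getName(), waitFor(waiter, Thread.State.WAITING));
            states.put(sleeper.getName(), waitFor(sleeper, Thread.State.TIMED_WAITING));
            states.put(parker.getName(), waitFor(parker, Thread.State.WAITING));
            states.put(blocked.getName(), waitFor(blocked, Thread.State.BLOCKED));
        }
        for (Thread t : helpers) {
            t.interrupt();                           // let the helpers finish
        }
        return states;
    }

    public static void main(String[] args) {
        observe().forEach((name, state) -> System.out.println(name + " -> " + state));
    }
}
```

On a HotSpot JVM, a jstack dump taken while the helpers are alive would be expected to show them as WAITING (on object monitor), TIMED_WAITING (sleeping), WAITING (parking) and BLOCKED (on object monitor) respectively.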

Infinite loop

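An infinite loop, or a long-running computation loop, consumes CPU time and can saturate the CPU. The virtual machine in this example has one core, so CPU usage reaches 100%. On a machine with n cores, one such thread takes about 1/n of the total capacity, for example 25% on four cores.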

Code example

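@RequestMapping("loop")
public void threadLoopDemo() throws Exception{
    int num = 0;
    // start time in seconds; the loop gives up after 1000 seconds
    long start = System.currentTimeMillis() / 1000;
    while (true) {
        // busy loop: logs on every iteration and never sleeps, so it keeps one core busy
        log.info("====>  test loop");
        num++;
        if (num == Integer.MAX_VALUE) {
            log.info("====> reset num");
            num = 0;
        }

        // exit once more than 1000 seconds have elapsed
        if (System.currentTimeMillis() / 1000 - start > 1000) {
            return;
        }
    }
}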

Observed behavior

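Running top -c shows CPU usage at 100%, which means the loop above is monopolizing the CPU.
ps -mp 3168 -o THREAD,tid,time or top -H -p 3168 (3168 is the process ID <pid>) prints the threads of the process together with their thread IDs and running time. The thread with ID 3187 uses 82.8% of the CPU and has been running for 2 minutes.
[Figure: CPU usage rising during the infinite loop]
[Figure: per-thread execution status]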
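
When shell access to top or ps is not available, a rough in-process counterpart can be built on the standard java.lang.management API. The sketch below is an assumption-laden illustration rather than the author's method: the class ThreadCpuTop and its Row type are hypothetical names, and the ids it prints are Java thread ids (the #19 in a jstack header), not the OS-level nid.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists live threads ordered by consumed CPU time.
final class ThreadCpuTop {

    /** One report row: Java thread id, name, state, and CPU time in ms (-1 if unavailable). */
    static final class Row {
        final long id;
        final String name;
        final Thread.State state;
        final long cpuMillis;

        Row(long id, String name, Thread.State state, long cpuMillis) {
            this.id = id;
            this.name = name;
            this.state = state;
            this.cpuMillis = cpuMillis;
        }

        @Override
        public String toString() {
            return "#" + id + " \"" + name + "\" " + state + " cpu=" + cpuMillis + "ms";
        }
    }

    /** Returns at most `limit` live threads, highest CPU time first. */
    static List<Row> topByCpu(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // CPU time measurement is optional in the JVM spec, so check before using it.
        boolean cpuAvailable = mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();
        List<Row> rows = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);      // null if the thread has already died
            if (info == null) {
                continue;
            }
            long cpuNanos = cpuAvailable ? mx.getThreadCpuTime(id) : -1L;
            long cpuMillis = cpuNanos < 0 ? -1L : cpuNanos / 1_000_000L;
            rows.add(new Row(id, info.getThreadName(), info.getThreadState(), cpuMillis));
        }
        rows.sort((a, b) -> Long.compare(b.cpuMillis, a.cpuMillis));
        return rows.size() > limit ? new ArrayList<>(rows.subList(0, limit)) : rows;
    }

    public static void main(String[] args) {
        topByCpu(10).forEach(System.out::println);
    }
}
```

Such a method could be exposed through a diagnostic endpoint, although an endpoint served by the same exhausted thread pool may itself be unreachable, which is one reason external tools such as top and jstack remain useful.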

Thread stack

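In the output of jstack 3168, find the stack of the problem thread and read it from bottom to top. The Tomcat NioChannel is locked, which shows that the request thread has not been released and is still executing; the stack also gives the location of the problematic code. nid=0xc73 is 3187 in decimal, i.e. the thread found above with high CPU usage and a long running time.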

"http-nio-10015-exec-1" #19 daemon prio=5 os_prio=0 tid=0x00007f75e4050800 nid=0xc73 runnable [0x00007f75e84e4000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        - locked <0x00000000eb1a0b48> (a java.io.BufferedOutputStream)  // log output being written (FileOutputStream)
        at java.io.PrintStream.write(PrintStream.java:482)
        - locked <0x00000000eb18a468> (a java.io.PrintStream)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
        at ch.qos.logback.core.joran.spi.ConsoleTarget$1.write(ConsoleTarget.java:37)
        at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:131)
        .... location of the problematic code
        at com.ljyhust.demo.web.ThreadTestDemoController.threadLoopDemo(ThreadTestDemoController.java:20)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        ..... omitted
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
        - locked <0x00000000ec476d30> (a org.apache.tomcat.util.net.NioChannel)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
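
Matching the decimal thread ID shown by top or ps against the hexadecimal nid in the jstack output is a plain base conversion. A minimal sketch follows; the class name NidConverter is made up for illustration.

```java
// Hypothetical helper: converts between OS thread ids (decimal) and jstack nid values (hex).
final class NidConverter {

    /** Formats a decimal OS thread id the way jstack prints it, e.g. 3187 becomes "0xc73". */
    static String toNid(int osThreadId) {
        return "0x" + Integer.toHexString(osThreadId);
    }

    /** Parses a jstack nid such as "0xc73" (the "0x" prefix is optional) into a decimal id. */
    static int fromNid(String nid) {
        String hex = nid.startsWith("0x") ? nid.substring(2) : nid;
        return Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(toNid(3187));      // prints 0xc73
        System.out.println(fromNid("0xc73")); // prints 3187
    }
}
```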

Network/IO blocking

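When many requests block on network IO, they tie up the service's worker threads and the service stops responding.
Take the familiar Tomcat as an example. Tomcat has a maximum thread count, maxThreads, and a maximum queue length, acceptCount; both can be configured in server.xml. Tomcat handles an incoming request in one of three ways:
1. If the number of started or running threads is below maxThreads, Tomcat starts a thread to handle the request;
2. If the number of started or running threads has reached maxThreads, Tomcat puts the request into the waiting queue until a thread becomes free;
3. If the number of started or running threads has reached maxThreads and the request queue is full, Tomcat rejects the request outright, and the client sees connection refused.
If a large number of threads block on IO while handling requests, the thread pool fills up and the service cannot respond to new requests.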
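
The three cases can be reproduced in miniature with a plain java.util.concurrent.ThreadPoolExecutor. This is only an analogy, not Tomcat's implementation: the class SaturatedPoolDemo is hypothetical, and the parameters maxThreads and queueCapacity merely stand in for Tomcat's maxThreads and acceptCount. Every submitted task blocks, as a request stuck on IO would.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical analogy for a saturated request thread pool.
final class SaturatedPoolDemo {

    /** Submits `requests` tasks that all block; returns {accepted, rejected}. */
    static int[] submitBlockingTasks(int maxThreads, int queueCapacity, int requests) {
        // Fixed-size pool with a bounded queue; the default handler rejects when both are full.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);   // stands in for the slow remote call
        int accepted = 0;
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();              // the "request" blocks here
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    accepted++;                           // case 1 (new thread) or case 2 (queued)
                } catch (RejectedExecutionException e) {
                    rejected++;                           // case 3: threads busy and queue full
                }
            }
        } finally {
            release.countDown();                          // unblock the tasks so the pool can drain
            pool.shutdown();
        }
        return new int[] {accepted, rejected};
    }

    public static void main(String[] args) {
        int[] result = submitBlockingTasks(2, 1, 5);
        System.out.println("accepted=" + result[0] + ", rejected=" + result[1]);
    }
}
```

With 2 threads and a queue of 1, five blocking tasks leave 3 accepted and 2 rejected: two occupy the threads, one waits in the queue, and the rest are refused.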

Code example

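The blockIo endpoint is the problematic code. It calls a third-party application (Google in the author's description); if that call is slow or blocks, the request thread blocks too. Once enough blocked request threads fill the Tomcat thread pool, the service cannot respond to new requests and may even reject them, so it appears "dead". Use JMeter to simulate 3000 concurrent requests, then check whether other endpoints such as getTime still respond normally.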

@RequestMapping("blockIo")
public Object blockIoDemo() throws Exception {
    JSONObject resJson = new JSONObject();
    try {
        // synchronous call to a slow remote endpoint; the request thread blocks until it returns
        JSONObject resStr = RestClientUtil.getRestTemplate().getForObject("http://10.247.63.25:10015/demo/threadTest/resBlock", JSONObject.class);
        log.info("=====> got text/html  {}", resStr);
    } catch (Exception e) {
        e.printStackTrace();
    }
    resJson.put("code", "100");
    return resJson;
}

// lightweight endpoint used to check whether the service still responds
@RequestMapping("getTime")
public Object getServerTime() throws Exception {
    log.info("=====> request start");
    JSONObject resJson = new JSONObject();
    String format = DateFormatUtils.format(new Date(), "yyyy-MM-dd HH:mm:ss");
    resJson.put("code", "100");
    resJson.put("reqTime", format);
    log.info("=====> request end");
    return resJson;
}

Observed behavior

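  • Endpoint response
    Calling getTime before and after the load test shows a response time of 32 ms before and 12.19 s after, so the service's responsiveness has clearly been affected.
    [Figure: response time before the load test]
    [Figure: response time after the load test]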

  • CPU status
    CPU usage is not high; the load test had no obvious effect on the CPU.
    [Figure: CPU status]

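  • jstack thread stack
    The thread stacks are shown below. There are many TIMED_WAITING threads, and following their stack traces leads to at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137). This is Apache HttpClient connection pool code, and it shows that the HTTP connection pool is too small, leaving many requests waiting for a connection.
    Now look at the RUNNABLE thread nid=0xab7. It is executing a request; reading the stack from bottom to top, java.net.SocketInputStream.socketRead(SocketInputStream.java:116) shows that it is reading from the network. Note that a thread blocked in a native socket read is still reported as RUNNABLE.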

// Request is waiting in the queue
"http-nio-10015-exec-188" #208 daemon prio=5 os_prio=0 tid=0x0000000001bdb800 nid=0xab9 waiting on condition [0x00007f920f2af000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e29149c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:256)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2120)
        at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)
        at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:307)
        at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:65)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:193)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:186)
        at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:108)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:282)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:269)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
        ....
        at com.ljyhust.demo.web.ThreadTestDemoController.blockIoDemo(ThreadTestDemoController.java:46)
        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
        at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:832)
        at ....
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
        - locked <0x00000000ee823bb0> (a org.apache.tomcat.util.net.NioChannel)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

"http-nio-10015-exec-187" #207 daemon prio=5 os_prio=0 tid=0x0000000001bd9800 nid=0xab8 waiting on condition [0x00007f920f3b0000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e142d7b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:256)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2120)
        at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)
        at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:307)
        at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:65)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:193)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:186)
        at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:108)
        ...
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:528)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1099)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:670)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
        - locked <0x00000000ee821b30> (a org.apache.tomcat.util.net.NioChannel)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
// Reading from the third-party resource
"http-nio-10015-exec-186" #206 daemon prio=5 os_prio=0 tid=0x0000000001bd7800 nid=0xab7 runnable [0x00007f920f4b1000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
        at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
        at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
        ...
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:91)
        at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
        at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:596)
        at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:557)
        at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:264)
        at com.ljyhust.demo.web.ThreadTestDemoController.blockIoDemo(ThreadTestDemoController.java:46)
        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        ...
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
        - locked <0x00000000ec853808> (a org.apache.tomcat.util.net.NioChannel)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
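
With hundreds of worker threads, scanning a dump by eye quickly becomes tedious. The sketch below counts, per thread state, the threads whose stack contains a given frame. It is a hypothetical helper (JstackDumpScan is not an existing tool) and assumes the HotSpot text layout shown above: a quoted thread name on the header line, followed by a java.lang.Thread.State: line and at ... frames.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for summarizing a jstack text dump.
final class JstackDumpScan {

    private static final String STATE_PREFIX = "java.lang.Thread.State:";

    /** For threads whose stack mentions `frame`, counts how many are in each thread state. */
    static Map<String, Integer> statesOfThreadsContaining(String dump, String frame) {
        Map<String, Integer> counts = new TreeMap<>();
        String state = null;     // state of the thread block currently being read
        boolean hit = false;     // whether the current block contains the frame
        for (String raw : dump.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Header of the next thread: record the previous block, then reset.
                if (hit && state != null) {
                    counts.merge(state, 1, Integer::sum);
                }
                state = null;
                hit = false;
            } else if (line.startsWith(STATE_PREFIX)) {
                // "java.lang.Thread.State: TIMED_WAITING (parking)" yields "TIMED_WAITING"
                state = line.substring(STATE_PREFIX.length()).trim().split("\\s+")[0];
            } else if (line.startsWith("at ") && line.contains(frame)) {
                hit = true;
            }
        }
        if (hit && state != null) {
            counts.merge(state, 1, Integer::sum);   // record the last block
        }
        return counts;
    }
}
```

Applied to the three stacks shown above with the frame PoolEntryFuture.await, it would report two TIMED_WAITING threads; on the full dump the same call gives a quick measure of how many request threads are waiting for a pooled connection.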

Deadlock

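A deadlock occurs when multiple threads compete for mutually exclusive resources. For example, threads 1 and 2 hold locks A and B respectively, and inside their critical sections each also needs the other's lock (B and A). Since each already holds the lock the other needs, neither can proceed.
A deadlock requires all four of the following conditions:
1. Mutual exclusion: a lock cannot be held by two or more threads at the same time;
2. No preemption: a lock that is held cannot be taken away by another thread;
3. Hold and wait: a thread already holds one lock and requests or waits for another;
4. Circular wait: a thread waits for a lock held by another thread, which in turn waits for a lock held by yet another, and the waits form a cycle.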

Code example

@RequestMapping("deadLock")
public Object deadLockDemo() throws Exception {
    log.info("=====> request start");
    JSONObject resJson = new JSONObject();
    // thread 1 takes lock A first, then lock B
    Thread t1 = new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                deadLockThreadDemo.getLockAB();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });

    // thread 2 takes the same locks in the opposite order: B first, then A
    Thread t2 = new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                deadLockThreadDemo.getLockBA();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
    t1.start();
    t2.start();
    // the request itself returns immediately; only the two background threads deadlock
    log.info("=====> request end");
    resJson.put("code", "100");
    return resJson;
}

public void getLockAB() throws Exception {
    // lock A
    synchronized (objectA) {
        try {
            // hold A long enough for the other thread to take B
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // lock B
        log.info("Thread 1 is trying to acquire lock B");
        synchronized (objectB) {
            log.info("Thread 1 acquired lock B");
        }
    }
}

public void getLockBA() throws Exception {
    // lock B
    synchronized (objectB) {
        try {
            // hold B long enough for the other thread to take A
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // lock A
        log.info("Thread 2 is trying to acquire lock A");
        synchronized (objectA) {
            log.info("Thread 2 acquired lock A");
        }
    }
}

Observed behavior

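  • jstack thread stack
    The thread stacks are shown below. As before, read them from bottom to top: Thread-6 first locks 0x00000000ebe97e18 and then waits for 0x00000000ebe97e08, while Thread-5 first locks 0x00000000ebe97e08 and then waits for 0x00000000ebe97e18. The two threads wait for each other, so both are deadlocked in state BLOCKED (on object monitor).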
"Thread-6" #24 daemon prio=5 os_prio=0 tid=0x00007f6ce8ca6800 nid=0xbd7 waiting for monitor entry [0x00007f6d081f8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.ljyhust.demo.service.DeadLockThreadDemo.getLockBA(DeadLockThreadDemo.java:42)
        - waiting to lock <0x00000000ebe97e08> (a java.lang.Object)
        - locked <0x00000000ebe97e18> (a java.lang.Object)
        at com.ljyhust.demo.web.ThreadTestDemoController$2.run(ThreadTestDemoController.java:89)
        at java.lang.Thread.run(Thread.java:748)

"Thread-5" #23 daemon prio=5 os_prio=0 tid=0x00007f6ce8576800 nid=0xbd6 waiting for monitor entry [0x00007f6d1cee3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.ljyhust.demo.service.DeadLockThreadDemo.getLockAB(DeadLockThreadDemo.java:26)
        - waiting to lock <0x00000000ebe97e18> (a java.lang.Object)
        - locked <0x00000000ebe97e08> (a java.lang.Object)
        at com.ljyhust.demo.web.ThreadTestDemoController$1.run(ThreadTestDemoController.java:78)
        at java.lang.Thread.run(Thread.java:748)
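
The same kind of cycle can also be detected from inside the JVM with the standard ThreadMXBean.findMonitorDeadlockedThreads() call. The following is a sketch under stated assumptions, not the demo's code: the class DeadlockProbe and the thread names demo-AB and demo-BA are hypothetical, and a latch replaces the sleep of getLockAB/getLockBA so that the deadlock forms reliably.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper: provokes a monitor deadlock and reports it through JMX.
final class DeadlockProbe {

    /** Names of threads currently deadlocked on object monitors; empty if there are none. */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads();   // null when no deadlock is found
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlock() {
        final Object a = new Object();
        final Object b = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(a, b, bothHoldFirst), "demo-AB");
        Thread t2 = new Thread(() -> lockInOrder(b, a, bothHoldFirst), "demo-BA");
        t1.setDaemon(true);   // daemon threads do not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch bothHoldFirst) {
        synchronized (first) {
            bothHoldFirst.countDown();
            try {
                bothHoldFirst.await();        // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds `second` and is waiting for `first`
            }
        }
    }

    /** Polls until a deadlock is reported or the timeout expires. */
    static List<String> awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> names = deadlockedThreadNames();
        while (names.isEmpty() && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            names = deadlockedThreadNames();
        }
        return names;
    }

    public static void main(String[] args) {
        startDeadlock();
        System.out.println("Deadlocked threads: " + awaitDeadlock(5000));
    }
}
```

A scheduled task calling deadlockedThreadNames() could log a warning as soon as a cycle appears, which complements a manual jstack dump.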

Conclusion

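Whether the CPU spikes or the service responds slowly, when the logs reveal nothing or no logs are printed at all, printing the thread stacks with jstack and combining them with top -H -p <pid> may reveal the cause. This is especially worth trying when the service calls many other resources and it is unclear which one is at fault.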
