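I am a Java web back-end developer. Recently our production services became unavailable several times: API requests either got no response at all or took minutes to return. When this happens the first step is usually to check the logs, but often there are none for the affected period (the service cannot respond to client requests at that point), and sometimes the relevant entries were written hours before the symptom appeared. Some endpoints print no logs at all, which makes the cause hard to track down. In these cases jstack, the JVM thread-stack tool, is a great help.
This article builds a small web demo on SpringBoot, uses a few endpoints to reproduce typical cases in which a service stops responding, and shows how to investigate each one with jstack and related tools.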
Environment: single-core CPU virtual machine, CentOS 6 + Java 8 + SpringBoot
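The following thread states in a jstack dump deserve attention:
1. Deadlock: threads are deadlocked;
2. Runnable: the thread is executing; it may still be blocked on third-party IO or spinning in a loop, so it is worth checking;
3. Waiting on condition: the thread is waiting for a condition, possibly a response from a network resource; analyze it together with the stack trace;
4. Waiting for monitor entry: the thread is waiting to acquire a monitor, typically a mutex used for thread synchronization;
5. Object.wait() or TIMED_WAITING (conditional or timed wait): Object.wait() blocks the current thread and releases the object lock it holds; the thread resumes only after another thread holding that lock calls Object.notify() to wake it;
6. Parked/Parking: the thread has been suspended (for example through LockSupport.park, as in the TIMED_WAITING (parking) traces later in this article); it has not terminated.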
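The states listed above can also be inspected from inside the JVM. The following is a minimal sketch, not part of the demo application: it counts live threads by `java.lang.Thread.State`. The class name `ThreadStateSummary` is made up for illustration. As a rough mapping, jstack's "waiting for monitor entry" corresponds to BLOCKED, while "in Object.wait()" and "waiting on condition" show up as WAITING or TIMED_WAITING.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper: counts the live threads of the current JVM by state. */
class ThreadStateSummary {

    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns a snapshot of all live threads.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

A sudden jump in BLOCKED or TIMED_WAITING counts is a hint to take a full jstack dump and read the individual stacks.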
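Infinite loop
An infinite loop, or a long-running compute loop, consumes CPU time and can saturate the CPU. The virtual machine in this example has one core, so CPU usage reaches 100%. On a machine with n cores a single looping thread takes about 1/n of the total capacity, for example 25% on four cores.
Code example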
@RequestMapping("loop")
public void threadLoopDemo() throws Exception {
    int num = 0;
    long start = System.currentTimeMillis() / 1000; // start time in seconds
    // Busy loop that keeps one CPU core occupied; it exits only after about 1000 seconds.
    while (true) {
        log.info("====> test Loop");
        num++;
        if (num == Integer.MAX_VALUE) {
            log.info("====> reset num");
            num = 0;
        }
        if (System.currentTimeMillis() / 1000 - start > 1000) {
            return;
        }
    }
}
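Symptoms
Run top -c to check CPU usage: it is at 100%, which shows that the loop above is monopolizing the CPU.
ps -mp 3168 -o THREAD,tid,time or top -H -p 3168 (3168 is the process id <pid>) prints the threads of the process with their thread ids and running time. Thread 3187 is using 82.8% of the CPU and has been running for 2 min.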
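When shell access is limited, a similar per-thread view can be obtained from inside the JVM. The sketch below is an assumption-laden illustration rather than part of the demo: it uses the standard `ThreadMXBean` to find the thread with the most accumulated CPU time. Note that it reports Java thread ids, not the OS-level ids shown by top or ps; the class name `ThreadCpuReport` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: finds the Java thread that has consumed the most CPU time. */
class ThreadCpuReport {

    /** Returns the Java thread id with the highest CPU time, or -1 if unavailable. */
    static long busiestThreadId() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long bestId = -1;
        long bestNanos = -1;
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
            if (nanos > bestNanos) {
                bestNanos = nanos;
                bestId = id;
            }
        }
        return bestId;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long id = busiestThreadId();
        ThreadInfo info = id < 0 ? null : mx.getThreadInfo(id);
        if (info != null) {
            System.out.printf("%s cpu=%d ms%n",
                    info.getThreadName(), mx.getThreadCpuTime(id) / 1_000_000);
        }
    }
}
```

The thread name printed here (such as http-nio-10015-exec-1) can then be searched for in the jstack output.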
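Thread stack
Run jstack 3168 and locate the relevant stack in the output. Reading it from the bottom up, the Tomcat NioChannel is locked, which means the request thread has not been released and is still executing; the trace also points to the offending line of code. nid=0xc73 is 3187 in decimal, that is, the high-CPU thread identified above.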
"http-nio-10015-exec-1" #19 daemon prio=5 os_prio=0 tid=0x00007f75e4050800 nid=0xc73 runnable [0x00007f75e84e4000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000eb1a0b48> (a java.io.BufferedOutputStream) // log output being written (via logback's ConsoleTarget, possibly redirected to a log file)
at java.io.PrintStream.write(PrintStream.java:482)
- locked <0x00000000eb18a468> (a java.io.PrintStream)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at ch.qos.logback.core.joran.spi.ConsoleTarget$1.write(ConsoleTarget.java:37)
at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:131)
.... location of the offending code
at com.ljyhust.demo.web.ThreadTestDemoController.threadLoopDemo(ThreadTestDemoController.java:20)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
..... omitted here
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
- locked <0x00000000ec476d30> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
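Matching the decimal thread id from top or ps against the hexadecimal nid in the jstack output is a plain base conversion. A small sketch follows; the class name `NidConverter` is made up for illustration.

```java
/** Hypothetical helper: converts between decimal thread ids and jstack's hex nid. */
class NidConverter {

    /** Formats a decimal OS thread id the way jstack prints it after "nid=". */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    /** Parses a jstack nid such as "0xc73" back to decimal. */
    static long fromNid(String nid) {
        return Long.decode(nid); // Long.decode understands the 0x prefix
    }

    public static void main(String[] args) {
        System.out.println(toNid(3187));       // prints 0xc73
        System.out.println(fromNid("0xc73"));  // prints 3187
    }
}
```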
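Network/IO blocking
Large amounts of blocked network IO tie up the service's threads, so the service cannot respond.
Take the familiar Tomcat as an example. Tomcat has a maximum thread count, maxThreads, and a maximum queue length, acceptCount; both can be configured in server.xml. Tomcat handles an incoming request in one of three ways:
1. If the number of threads that are started or busy is < maxThreads, Tomcat starts a thread to handle the request;
2. If the number of busy threads has reached maxThreads, Tomcat puts the request into the waiting queue until a thread becomes free;
3. If the number of busy threads has reached maxThreads and the queue is full, Tomcat rejects the request outright; the client sees connection refused.
If many threads are blocked on IO while handling requests, the thread pool fills up and the service cannot respond to new requests.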
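The three cases above can be modelled with a plain JDK thread pool. This is only a simplified sketch under stated assumptions: Tomcat's real connector works differently in detail (acceptCount, for instance, applies to incoming connections rather than to queued tasks), and the names `PoolSaturationDemo` and `countRejected` are invented for this example. The latch stands in for a downstream call that never returns in time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Simplified model: a fixed number of workers plus a bounded waiting queue. */
class PoolSaturationDemo {

    /** Submits `requests` blocking tasks and returns how many were rejected. */
    static int countRejected(int maxThreads, int acceptCount, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(acceptCount)); // default handler rejects when full
        CountDownLatch release = new CountDownLatch(1); // simulates blocked IO
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // every worker is stuck here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // case 3: all threads busy and the queue is full
            }
        }
        release.countDown(); // unblock the workers so the pool can shut down
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 tasks occupy the workers, 3 wait in the queue, the remaining 5 are rejected.
        System.out.println(countRejected(2, 3, 10)); // prints 5
    }
}
```

The point of the model is that no task does any CPU work, yet the pool is completely unavailable; that matches the low-CPU symptom of blocked IO.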
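Code example
The blockIo endpoint is the problem code. It calls a third-party application (Google in the original scenario); if that call is slow or blocks, the request thread blocks as well. Once enough blocked request threads fill the Tomcat thread pool, the service can no longer respond to new requests and may even reject them, leaving it apparently dead. We use jmeter to simulate 3000 concurrent requests and then check whether another endpoint such as getTime still responds normally.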
@RequestMapping("blockIo")
public Object blockIoDemo() throws Exception {
    JSONObject resJson = new JSONObject();
    try {
        // Blocking call to a slow downstream endpoint; the request thread waits here.
        JSONObject resStr = RestClientUtil.getRestTemplate().getForObject("http://10.247.63.25:10015/demo/threadTest/resBlock", JSONObject.class);
        log.info("=====> received text/html {}", resStr);
    } catch (Exception e) {
        e.printStackTrace();
    }
    resJson.put("code", "100");
    return resJson;
}

// Lightweight endpoint used to check whether the service still responds.
@RequestMapping("getTime")
public Object getServerTime() throws Exception {
    log.info("=====> request start");
    JSONObject resJson = new JSONObject();
    String format = DateFormatUtils.format(new Date(), "yyyy-MM-dd HH:mm:ss");
    resJson.put("code", "100");
    resJson.put("reqTime", format);
    log.info("=====> request end");
    return resJson;
}
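Symptoms
- Endpoint response
Calling getTime before and after starting the concurrent load shows a response time of 32 ms before and 12.19 s after, so the service's responsiveness is clearly affected.
- CPU state
CPU usage is not high; the concurrent load has no obvious effect on the CPU.
- jstack thread stacks
The thread stacks are shown below. There are many TIMED_WAITING threads. Following the stack trace leads to at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137), which is Apache HttpClient connection-pool code; it shows that the HTTP connection pool is exhausted and many requests are waiting for a connection.
Now look at the RUNNABLE thread nid=0xab7, which is executing a request. Tracing from the bottom up to java.net.SocketInputStream.socketRead(SocketInputStream.java:116) shows that it is reading from the network.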
// Requests that have entered the waiting queue
"http-nio-10015-exec-188" #208 daemon prio=5 os_prio=0 tid=0x0000000001bdb800 nid=0xab9 waiting on condition [0x00007f920f2af000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e29149c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:256)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2120)
at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:307)
at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:65)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:193)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:186)
at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:108)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:282)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:269)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
....
at com.ljyhust.demo.web.ThreadTestDemoController.blockIoDemo(ThreadTestDemoController.java:46)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:832)
at ....
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
- locked <0x00000000ee823bb0> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
"http-nio-10015-exec-187" #207 daemon prio=5 os_prio=0 tid=0x0000000001bd9800 nid=0xab8 waiting on condition [0x00007f920f3b0000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e142d7b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:256)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2120)
at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:307)
at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:65)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:193)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:186)
at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:108)
...
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:528)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1099)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:670)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
- locked <0x00000000ee821b30> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
// Reading the response from the third-party resource
"http-nio-10015-exec-186" #206 daemon prio=5 os_prio=0 tid=0x0000000001bd7800 nid=0xab7 runnable [0x00007f920f4b1000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
...
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:91)
at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:596)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:557)
at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:264)
at com.ljyhust.demo.web.ThreadTestDemoController.blockIoDemo(ThreadTestDemoController.java:46)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
...
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
- locked <0x00000000ec853808> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
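A common way to keep threads from hanging in socketRead as in the last trace is to bound every blocking read with a timeout. The sketch below shows the idea with plain JDK sockets so that it is self-contained; it is an illustration, not the demo's actual client configuration (RestClientUtil is not shown in the article). With Apache HttpClient 4.x the comparable knobs are, to my knowledge, the connect, socket and connection-request timeouts on RequestConfig and the pool limits on PoolingHttpClientConnectionManager. The class name `ReadTimeoutDemo` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Shows that a read timeout frees the calling thread when the peer never answers. */
class ReadTimeoutDemo {

    /** Returns true if the read gave up after readTimeoutMs instead of blocking forever. */
    static boolean readTimesOut(int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A local server socket that lets clients connect but never sends a byte.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), 1000);
            client.setSoTimeout(readTimeoutMs); // upper bound for each blocking read
            try {
                client.getInputStream().read();
                return false; // data or end-of-stream arrived, no timeout
            } catch (SocketTimeoutException e) {
                return true; // the thread is released instead of waiting indefinitely
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

Without the setSoTimeout call, the read in this sketch would block until the peer closes the connection, which is the situation the RUNNABLE trace above captures.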
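Deadlock
A deadlock arises when multiple threads compete for mutually exclusive resources. For example, threads 1 and 2 hold locks A and B respectively, and inside their critical sections each also needs the other lock (B and A). Because each thread holds the lock the other needs, neither can proceed.
All four of the following conditions must hold for a deadlock to occur:
1. Mutual exclusion: a lock cannot be held by two or more threads at the same time;
2. No preemption: a lock that is held cannot be taken away by another thread;
3. Hold and wait: a thread already holds one lock and requests or waits for another;
4. Circular wait: a thread waits for a lock held by another thread, which in turn waits for a lock held by yet another thread, and the waits form a cycle.
Code example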
@RequestMapping("deadLock")
public Object deadLockDemo() throws Exception {
    log.info("=====> request start");
    JSONObject resJson = new JSONObject();
    // Thread 1 takes lock A, then lock B.
    Thread t1 = new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                deadLockThreadDemo.getLockAB();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
    // Thread 2 takes lock B, then lock A.
    Thread t2 = new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                deadLockThreadDemo.getLockBA();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
    t1.start();
    t2.start();
    log.info("=====> request end");
    resJson.put("code", "100");
    return resJson;
}

// The two methods below belong to the DeadLockThreadDemo service class.
public void getLockAB() throws Exception {
    // Lock A
    synchronized (objectA) {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // Lock B
        log.info("Thread 1 is trying to acquire lock B");
        synchronized (objectB) {
            log.info("Thread 1 acquired lock B");
        }
    }
}

public void getLockBA() throws Exception {
    // Lock B
    synchronized (objectB) {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // Lock A
        log.info("Thread 2 is trying to acquire lock A");
        synchronized (objectA) {
            log.info("Thread 2 acquired lock A");
        }
    }
}
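One common way to avoid this kind of deadlock is to break the circular-wait condition by always acquiring the locks in the same order. The following is a minimal standalone sketch of that idea, assuming both code paths can be reordered; it is not the demo's code, and the class name `OrderedLockDemo` and the shortened sleep are choices made for this example.

```java
/** Sketch: both code paths take objectA before objectB, so no wait cycle can form. */
class OrderedLockDemo {
    private static final Object objectA = new Object();
    private static final Object objectB = new Object();

    /** Unchanged order: A first, then B. */
    static void getLockAB() {
        synchronized (objectA) {
            pause(200);
            synchronized (objectB) {
                System.out.println(Thread.currentThread().getName() + " holds A and B");
            }
        }
    }

    /** Formerly B then A; now follows the same global order, A before B. */
    static void getLockBA() {
        synchronized (objectA) {
            pause(200);
            synchronized (objectB) {
                System.out.println(Thread.currentThread().getName() + " holds A and B");
            }
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs both paths concurrently; returns true if both threads finished. */
    static boolean runBoth() {
        Thread t1 = new Thread(OrderedLockDemo::getLockAB, "t1");
        Thread t2 = new Thread(OrderedLockDemo::getLockBA, "t2");
        t1.start();
        t2.start();
        try {
            t1.join(5000);
            t2.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t1.isAlive() && !t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both finished: " + runBoth());
    }
}
```

The second thread simply waits for lock A until the first thread has released both locks, so the two runs are serialized instead of blocking each other forever.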
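Symptoms
- jstack thread stacks
The thread stacks are shown below. As usual, read the stack from the bottom up: Thread-6 first locks object 0x00000000ebe97e18 and then waits for 0x00000000ebe97e08; Thread-5 first locks object 0x00000000ebe97e08 and then waits for 0x00000000ebe97e18. The two threads wait for each other, so both stay BLOCKED (on object monitor), which is a deadlock.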
"Thread-6" #24 daemon prio=5 os_prio=0 tid=0x00007f6ce8ca6800 nid=0xbd7 waiting for monitor entry [0x00007f6d081f8000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ljyhust.demo.service.DeadLockThreadDemo.getLockBA(DeadLockThreadDemo.java:42)
- waiting to lock <0x00000000ebe97e08> (a java.lang.Object)
- locked <0x00000000ebe97e18> (a java.lang.Object)
at com.ljyhust.demo.web.ThreadTestDemoController$2.run(ThreadTestDemoController.java:89)
at java.lang.Thread.run(Thread.java:748)
"Thread-5" #23 daemon prio=5 os_prio=0 tid=0x00007f6ce8576800 nid=0xbd6 waiting for monitor entry [0x00007f6d1cee3000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ljyhust.demo.service.DeadLockThreadDemo.getLockAB(DeadLockThreadDemo.java:26)
- waiting to lock <0x00000000ebe97e18> (a java.lang.Object)
- locked <0x00000000ebe97e08> (a java.lang.Object)
at com.ljyhust.demo.web.ThreadTestDemoController$1.run(ThreadTestDemoController.java:78)
at java.lang.Thread.run(Thread.java:748)
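Besides reading the dump by hand (jstack normally also appends a "Found one Java-level deadlock" section in such cases), a monitor deadlock like this one can be detected programmatically through the standard `ThreadMXBean`. The sketch below is illustrative only: it provokes a deadlock between two daemon threads and then asks the JVM to report it. The class and method names are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;

/** Sketch: provoke a two-thread monitor deadlock and detect it via ThreadMXBean. */
class DeadlockDetector {

    /** Number of threads currently deadlocked on object monitors. */
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length; // null means no deadlock was found
    }

    /** Starts two daemon threads with opposite lock order and returns the detected count. */
    static int provokeAndCount() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(a, b, bothHoldFirst), "demo-AB");
        Thread t2 = new Thread(() -> lockInOrder(b, a, bothHoldFirst), "demo-BA");
        t1.setDaemon(true); // daemon threads do not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
        long deadline = System.currentTimeMillis() + 5000;
        try {
            while (deadlockedThreadCount() < 2 && System.currentTimeMillis() < deadline) {
                Thread.sleep(50); // give both threads time to block
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return deadlockedThreadCount();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds this lock
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + provokeAndCount());
    }
}
```

A check like `deadlockedThreadCount()` could be called periodically from a health endpoint, although whether that is worthwhile depends on the application.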
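Conclusion
Whether the CPU spikes or the service responds slowly, when the logs reveal nothing or there are no logs at all, dumping the thread stacks with jstack and combining them with top -H -p <pid> may reveal the cause. This is especially worth trying when the service calls many other resources and it is unclear which of them is at fault.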