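1 Overview
An exhausted Dubbo thread pool is a serious production problem. This article works through a concrete example of how to troubleshoot it, starting with code that reproduces the exception.
1.1 Provider configuration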
<beans>
<dubbo:registry address="zookeeper://127.0.0.1:2181" />
<dubbo:protocol name="dubbo" port="8888" />
<dubbo:service interface="com.itxpz.dubbo.demo.provider.HelloService" ref="helloService" />
</beans>
1.2 Provider service
package com.itxpz.dubbo.demo.provider;
public interface HelloService {
public String sayHello(String name) throws Exception;
}
public class HelloServiceImpl implements HelloService {
public String sayHello(String name) throws Exception {
String result = "hello[" + name + "]";
Thread.sleep(10000L); // simulate a slow operation
System.out.println("provider result: " + result);
return result;
}
}
1.3 Consumer configuration
<beans>
<dubbo:registry address="zookeeper://127.0.0.1:2181" />
<dubbo:reference id="helloService" interface="com.itxpz.dubbo.demo.provider.HelloService" />
</beans>
1.4 Consumer code
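import org.springframework.context.support.ClassPathXmlApplicationContext;
import com.itxpz.dubbo.demo.provider.HelloService;
public class Consumer {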
public static void main(String[] args) throws Exception {
testThread();
System.in.read();
}
public static void testThread() {
ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext(new String[] { "classpath*:META-INF/spring/dubbo-consumer.xml" });
context.start();
// simulate a high-concurrency scenario
for (int i = 0; i < 500; i++) {
new Thread(new Runnable() {
@Override
public void run() {
HelloService helloService = (HelloService) context.getBean("helloService");
String result;
try {
result = helloService.sayHello("IT徐胖子");
System.out.println("consumer received result: " + result);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}).start();
}
}
}
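2 Problem analysis
Running the program makes both the provider and the consumer log exceptions. We analyze the problem from three angles.
2.1 Provider or consumer?
The first thing to establish is whether the exception originated on the provider or on the consumer. Three steps help:
(1) The provider and the consumer log different exception messages.
(2) The thread name DubboServerHandler-x.x.x.x:port gives the address and port of the server whose thread pool is exhausted.
(3) Match that address and port against your deployment to tell whether the server is the provider or the consumer.
In the provider log, the DubboServerHandler address and port are the provider's own, so this is a provider-side exception: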
WARN support.AbortPolicyWithReport: Thread pool is EXHAUSTED
Thread Name: DubboServerHandler-1.1.1.1:8888
Pool Size: 200 (active: 200, core: 200, max: 200, largest: 200) Task: 201 (completed: 1)
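In the consumer log, the DubboServerHandler address and port again point to the provider, and the "Server side" message confirms that the exception occurred on the provider: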
Failed to invoke the method sayHello in the service com.itxpz.dubbo.demo.provider.HelloService
Tried 3 times of the providers [1.1.1.1:8888] (1/1) from the registry 127.0.0.1:2181
Server side(1.1.1.1,8888) threadpool is exhausted ,detail msg:Thread pool is EXHAUSTED
Thread Name: DubboServerHandler-1.1.1.1:8888, Pool Size: 200 (active: 200, core: 200, max: 200, largest: 200), Task: 201 (completed: 1)
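The three steps can be partly automated when many log lines have to be checked. The following is a minimal sketch, assuming the message format shown in the two log excerpts above; the class `ExhaustedLogParser` and its method are hypothetical helpers, not part of Dubbo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Dubbo): extracts the server address and the
// pool statistics from a "Thread pool is EXHAUSTED" log message.
class ExhaustedLogParser {
    // Matches both layouts shown in the article: fields separated by ", " or by a line break.
    private static final Pattern PATTERN = Pattern.compile(
            "Thread Name: DubboServerHandler-([\\d.]+):(\\d+),?\\s+"
            + "Pool Size: (\\d+) \\(active: (\\d+), core: (\\d+), max: (\\d+)");

    /** Returns {host, port, poolSize, active, core, max}, or null if the text does not match. */
    static String[] parse(String log) {
        Matcher m = PATTERN.matcher(log);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[6];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String log = "Thread Name: DubboServerHandler-1.1.1.1:8888, Pool Size: 200 "
                + "(active: 200, core: 200, max: 200, largest: 200), Task: 201 (completed: 1)";
        String[] f = parse(log);
        // host:port identifies the server; active == max means every thread is busy
        System.out.println("server=" + f[0] + ":" + f[1] + " active=" + f[3] + " max=" + f[5]);
    }
}
```

Comparing the extracted host and port with the provider address listed in the registry is then enough to decide which side owns the exhausted pool.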
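2.2 Consumer-side analysis
From the consumer log we know that the provider's thread pool is exhausted, and we can see which method failed (sayHello in the log above). The consumer should have a degradation strategy in place, for example Dubbo's mock mechanism or a circuit breaker, to protect itself. We can also look up the provider address in the management console to check how the service on that machine is running. If another team maintains the service, contact that team so that they can act quickly.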
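As an illustration of the mock approach, here is a minimal sketch of a fallback class. It assumes Dubbo's local mock convention, in which a reference configured with mock="true" falls back to a class named after the interface plus the suffix Mock when the remote call fails with an RpcException. The fallback text and the `MockDemo` class are illustrative assumptions.

```java
// Copy of the article's service interface so that the sketch compiles on its own.
interface HelloService {
    String sayHello(String name) throws Exception;
}

// Fallback implementation. In a real project this would be a public class in its own
// file, in the same package as HelloService, with a no-argument constructor, and the
// consumer reference would enable it (assumed configuration):
//   <dubbo:reference id="helloService" interface="...HelloService" mock="true" />
class HelloServiceMock implements HelloService {
    @Override
    public String sayHello(String name) {
        // Return a degraded but well-formed result instead of propagating the failure.
        return "hello[" + name + "] (degraded)";
    }
}

// Hypothetical driver that calls the fallback directly, without Dubbo.
class MockDemo {
    public static void main(String[] args) throws Exception {
        HelloService fallback = new HelloServiceMock();
        System.out.println(fallback.sayHello("IT徐胖子"));
    }
}
```

A fallback like this keeps consumer threads from failing while the provider is saturated, but it hides the outage from callers, so it is usually paired with alerting.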
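2.3 Provider-side analysis
From the provider log we know that the provider's thread pool is exhausted, but not which method is responsible, so we need a thread dump. When the Dubbo thread pool is exhausted, its rejection policy runs and writes a thread dump file to preserve the scene. Analyzing that file lets us locate the method: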
public class AbortPolicyWithReport extends ThreadPoolExecutor.AbortPolicy {
protected static final Logger logger = LoggerFactory.getLogger(AbortPolicyWithReport.class);
private final String threadName;
private final URL url;
private static volatile long lastPrintTime = 0;
private static Semaphore guard = new Semaphore(1);
public AbortPolicyWithReport(String threadName, URL url) {
this.threadName = threadName;
this.url = url;
}
@Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
String msg = String.format("Thread pool is EXHAUSTED!" +
" Thread Name: %s, Pool Size: %d (active: %d, core: %d, max: %d, largest: %d), Task: %d (completed: %d)," +
" Executor status:(isShutdown:%s, isTerminated:%s, isTerminating:%s), in %s://%s:%d!",
threadName, e.getPoolSize(), e.getActiveCount(), e.getCorePoolSize(), e.getMaximumPoolSize(), e.getLargestPoolSize(),
e.getTaskCount(), e.getCompletedTaskCount(), e.isShutdown(), e.isTerminated(), e.isTerminating(),
url.getProtocol(), url.getIp(), url.getPort());
logger.warn(msg);
// write the thread dump
dumpJStack();
throw new RejectedExecutionException(msg);
}
private void dumpJStack() {
long now = System.currentTimeMillis();
// write a thread dump at most once every 10 minutes
if (now - lastPrintTime < 10 * 60 * 1000) {
return;
}
if (!guard.tryAcquire()) {
return;
}
ExecutorService pool = Executors.newSingleThreadExecutor();
pool.execute(() -> {
String dumpPath = url.getParameter(Constants.DUMP_DIRECTORY, System.getProperty("user.home"));
System.out.println("AbortPolicyWithReport dumpJStack directory=" + dumpPath);
SimpleDateFormat sdf;
String os = System.getProperty("os.name").toLowerCase();
// Linux file path: /home/xxx/Dubbo_JStack.log.2020-06-09_20:50:15
// Windows file path: /user/xxx/Dubbo_JStack.log.2020-06-09_20-50-15
if (os.contains("win")) {
sdf = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss");
} else {
sdf = new SimpleDateFormat("yyyy-MM-dd_HH:mm:ss");
}
String dateStr = sdf.format(new Date());
// try-with-resources
try (FileOutputStream jStackStream = new FileOutputStream(new File(dumpPath, "Dubbo_JStack.log" + "." + dateStr))) {
JVMUtil.jstack(jStackStream);
} catch (Throwable t) {
logger.error("dump jStack error", t);
} finally {
guard.release();
}
lastPrintTime = System.currentTimeMillis();
});
// the executor must be shut down, otherwise it can cause an OOM
pool.shutdown();
}
}
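The condition that triggers this policy can be reproduced with the JDK alone. The sketch below assumes a fixed-size pool with a SynchronousQueue, which is how Dubbo's fixed thread pool behaves when no queue capacity is configured. It uses the plain AbortPolicy rather than AbortPolicyWithReport, and `RejectionDemo` is a hypothetical class name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RejectionDemo {
    /** Submits `tasks` blocking tasks to a pool of `threads` threads and returns how many were rejected. */
    static int countRejected(int threads, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(), new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // stands in for the slow sayHello call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // every thread is busy and there is no queue to wait in
            }
        }
        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected=" + countRejected(2, 3)); // prints "rejected=1"
    }
}
```

Because no task can finish while the latch is closed, every submission beyond the pool size is rejected, which mirrors 500 concurrent calls hitting a pool of 200 threads that are each blocked in a slow method.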
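Pay particular attention to threads in the BLOCKED and TIMED_WAITING states. If the thread dump shows a large number of threads blocked or waiting in the same place, their stack traces identify the method. Once the method is located, optimize it; this is the key step in resolving thread pool exhaustion.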
"DubboServerHandler-1.1.1.1:8888-thread-200" Id=230 TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at com.itxpz.dubbo.demo.provider.HelloServiceImpl.sayHello(HelloServiceImpl.java:13)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.dubbo.rpc.proxy.jdk.JdkProxyFactory$1.doInvoke(JdkProxyFactory.java:47)
at org.apache.dubbo.rpc.proxy.AbstractProxyInvoker.invoke(AbstractProxyInvoker.java:88)
at org.apache.dubbo.config.invoker.DelegateProviderMetaDataInvoker.invoke(DelegateProviderMetaDataInvoker.java:56)
at org.apache.dubbo.rpc.protocol.InvokerWrapper.invoke(InvokerWrapper.java:56)
at org.apache.dubbo.rpc.filter.ExceptionFilter.invoke(ExceptionFilter.java:63)
at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:80)
at org.apache.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:88)
at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:80)
at org.apache.dubbo.rpc.filter.TimeoutFilter.invoke(TimeoutFilter.java:42)
at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:80)
at org.apache.dubbo.rpc.protocol.dubbo.filter.TraceFilter.invoke(TraceFilter.java:80)
at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:80)
at org.apache.dubbo.rpc.filter.ContextFilter.invoke(ContextFilter.java:78)
at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:80)
at org.apache.dubbo.rpc.filter.GenericFilter.invoke(GenericFilter.java:143)
at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:80)
at org.apache.dubbo.rpc.filter.ClassLoaderFilter.invoke(ClassLoaderFilter.java:38)
at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:80)
at org.apache.dubbo.rpc.filter.EchoFilter.invoke(EchoFilter.java:39)
at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:80)
at org.apache.dubbo.rpc.protocol.dubbo.DubboProtocol$1.reply(DubboProtocol.java:115)
at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.handleRequest(HeaderExchangeHandler.java:104)
at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.received(HeaderExchangeHandler.java:208)
at org.apache.dubbo.remoting.transport.DecodeHandler.received(DecodeHandler.java:51)
at org.apache.dubbo.remoting.transport.dispatcher.ChannelEventRunnable.run(ChannelEventRunnable.java:57)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
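With 200 such threads in one dump file, reading the traces by hand is tedious. The sketch below groups DubboServerHandler threads by state and by their first frame inside the application package. It assumes the header layout shown above (quoted thread name, Id=, state), and `DumpSummary` is a hypothetical helper, not a Dubbo class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DumpSummary {
    // Header line of a Dubbo worker thread, e.g.
    // "DubboServerHandler-1.1.1.1:8888-thread-200" Id=230 TIMED_WAITING
    private static final Pattern HEADER =
            Pattern.compile("^\"DubboServerHandler-[^\"]*\" Id=\\d+ (\\w+)");

    /** Counts Dubbo worker threads per "STATE @ first frame under appPackage". */
    static Map<String, Integer> summarize(String dump, String appPackage) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String state = null; // state of the Dubbo thread being read, or null if not in one
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                Matcher m = HEADER.matcher(line);
                state = m.find() ? m.group(1) : null; // ignore non-Dubbo threads
            } else if (state != null && line.startsWith("at " + appPackage)) {
                counts.merge(state + " @ " + line.substring(3), 1, Integer::sum);
                state = null; // count only the first application frame per thread
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String frame = "at com.itxpz.dubbo.demo.provider.HelloServiceImpl.sayHello(HelloServiceImpl.java:13)";
        String dump = "\"DubboServerHandler-1.1.1.1:8888-thread-199\" Id=229 TIMED_WAITING\n"
                + "at java.lang.Thread.sleep(Native Method)\n" + frame + "\n\n"
                + "\"DubboServerHandler-1.1.1.1:8888-thread-200\" Id=230 TIMED_WAITING\n"
                + "at java.lang.Thread.sleep(Native Method)\n" + frame + "\n";
        summarize(dump, "com.itxpz.").forEach((k, v) -> System.out.println(v + " x " + k));
    }
}
```

The entry with the highest count is the first candidate for optimization; in this example every worker thread is parked in HelloServiceImpl.sayHello.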
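3 Summary
This article described how to troubleshoot an exhausted Dubbo thread pool. First, use the logs to determine whether the problem is on the provider or the consumer; the two sides log different exception messages. Second, use the thread dump to locate the slow service method. Third, optimize the slow service, which is the core of the fix.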