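I recently built a new application that collects application data, both as a reference and for troubleshooting and locating problems. After it had been in production for a while, a very, very strange problem appeared.
Problem description
The collector is asynchronous: a dedicated thread consumes the data that the various services send to it. Right after launch, data arrived on time, but then something odd happened: every so often, the thread simply stopped consuming.
Here is roughly what the code looks like: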
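while (true) {
    try {
        // consume data -> insert into the database
    } catch (Throwable e) {
        logger.error("Async queue consumption failed", e);
        // back off after a failure (the finally block also sleeps)
        sleep(5);
    } finally {
        // pause between iterations so the loop doesn't peg the CPU
        sleep(5);
    }
}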
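The key point: there was no exception at all! The thread just stopped moving and no data got in!
At first I suspected an exception that wasn't being caught, but even catching Throwable didn't help!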
Troubleshooting
Analyzing the GC logs
GC activity was indeed high, but I couldn't connect it to this problem.
Could the thread have died?
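Earlier, the arthas thread command had shown that the thread was still alive! But its state was WAITING. That was a gap in my knowledge, so I read a few articles describing this state; the gist is that the thread is parked until something wakes it up. Yet as far as I could tell, my code had no blocking or wake-up operations at all!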
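WAITING means the thread is parked with no timeout until another thread wakes it up. The minimal sketch below (a hypothetical WaitingDemo class, not code from this application) reproduces that situation: a thread calls Condition.await() on a condition that nobody ever signals. It reports WAITING, stays alive, and throws nothing, so no catch block around it would ever run.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class WaitingDemo {
    // Starts a thread that parks on a Condition nobody signals, then returns it.
    static Thread startParkedThread(ReentrantLock lock, Condition neverSignalled) {
        Thread t = new Thread(() -> {
            lock.lock();
            try {
                // Untimed wait: releases the lock and parks; no exception is thrown.
                neverSignalled.await();
            } catch (InterruptedException e) {
                // Only an interrupt (or a signal) gets the thread out of await().
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "demo-consumer-thread");
        t.setDaemon(true); // don't keep the JVM alive because of the demo thread
        t.start();
        return t;
    }

    // Polls until the thread reaches the given state or the timeout expires.
    static boolean awaitState(Thread t, Thread.State state, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (t.getState() == state) {
                return true;
            }
            Thread.sleep(10);
        }
        return t.getState() == state;
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        Thread t = startParkedThread(lock, lock.newCondition());
        System.out.println("state WAITING: " + awaitState(t, Thread.State.WAITING, 2000));
        System.out.println("alive: " + t.isAlive());
        t.interrupt(); // release the demo thread
    }
}
```

This is the same shape as the stack trace below: a thread parked in ConditionObject.await() that looks healthy to arthas but never makes progress.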
So I wanted to see whether the WAITING thread's stack held any clues:
jps          # find the application's pid
jstack [pid]
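jstack inspects a JVM from the outside. If you want similar information from inside the application itself (for example behind a debug endpoint), ThreadMXBean.dumpAllThreads returns the state and stack of every live thread. The helper below is a sketch; the class name ThreadDumpByName is hypothetical, and it can only see threads of the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class ThreadDumpByName {
    // Returns a jstack-like summary for every live thread with the given name.
    static List<String> dump(String threadName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        // false/false: skip locked monitors and synchronizers to keep the call cheap
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (!info.getThreadName().equals(threadName)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                // e.g. the AbstractQueuedSynchronizer$ConditionObject being waited on
                sb.append(" on ").append(info.getLockName());
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\n\tat ").append(frame);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "MysqlMQStoreProcess-consumer-thread";
        dump(name).forEach(System.out::println);
    }
}
```

For a one-off investigation on a running server, jps plus jstack (or arthas) is still the simpler route, since it needs no code change.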
In the jstack output, search for the thread name MysqlMQStoreProcess-consumer-thread:
"MysqlMQStoreProcess-consumer-thread" #46 prio=5 os_prio=0 tid=0x00007ff7b1357800 nid=0x2fd3 waiting on condition [0x00007ff7241c3000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000732ef9c48> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1899)
at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1460)
at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1255)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1235)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1225)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:90)
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:157)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:115)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:78)
at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:80)
at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:67)
at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:336)
at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.prepareStatement(MybatisSimpleExecutor.java:93)
at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.doUpdate(MybatisSimpleExecutor.java:53)
at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117)
at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:76)
at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:197)
at org.apache.ibatis.session.defaults.DefaultSqlSession.insert(DefaultSqlSession.java:184)
at sun.reflect.GeneratedMethodAccessor95.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:426)
at com.sun.proxy.$Proxy66.insert(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.insert(SqlSessionTemplate.java:271)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:58)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:61)
at com.sun.proxy.$Proxy67.batchPartitionInsert(Unknown Source)
at com.jay.monitor.data.server.store.mysql.MysqlMQStoreProcess.speedUpConsumerList(MysqlMQStoreProcess.java:80)
at com.jay.monitor.data.server.store.AbstractStoreProcess.speedUpConsumer(AbstractStoreProcess.java:131)
at com.jay.monitor.data.server.store.AbstractStoreProcess.run(AbstractStoreProcess.java:157)
at java.lang.Thread.run(Thread.java:748)
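Tracing through the stack, the picture became clear: the thread was trying to get a connection from the Druid pool, couldn't get one, and stayed blocked there forever!
The next question was why it blocked. In DruidDataSource.getConnectionInternal (visible in the stack above), the connection is taken like this: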
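if (maxWait > 0) {
    // timed wait: gives up after maxWait
    holder = pollLast(nanos);
} else {
    // blocking call: waits with no timeout until a connection is returned
    holder = takeLast();
}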
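To see why the missing timeout matters, here is a simplified model of those two branches. ToyPool is hypothetical and far simpler than Druid's real pool: with maxWaitMillis <= 0 it waits with Condition.await() (analogous to takeLast()), while a positive value waits with Condition.awaitNanos() (analogous to pollLast(nanos)) and then gives up with a TimeoutException.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy pool: NOT Druid's implementation, only the same wait pattern.
class ToyPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Deque<String> idle = new ArrayDeque<>();
    private final long maxWaitMillis; // <= 0 means "wait forever", like an unset maxWait

    ToyPool(int size, long maxWaitMillis) {
        for (int i = 0; i < size; i++) {
            idle.push("conn-" + i);
        }
        this.maxWaitMillis = maxWaitMillis;
    }

    String borrow() throws InterruptedException, TimeoutException {
        lock.lock();
        try {
            if (maxWaitMillis > 0) {
                long nanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
                while (idle.isEmpty()) {
                    if (nanos <= 0) {
                        throw new TimeoutException("no connection within " + maxWaitMillis + " ms");
                    }
                    nanos = notEmpty.awaitNanos(nanos); // timed wait, like pollLast(nanos)
                }
            } else {
                while (idle.isEmpty()) {
                    notEmpty.await(); // untimed wait, like takeLast(): parks until a connection comes back
                }
            }
            return idle.pop();
        } finally {
            lock.unlock();
        }
    }

    void giveBack(String conn) {
        lock.lock();
        try {
            idle.push(conn);
            notEmpty.signal(); // wake one waiting borrower
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        ToyPool pool = new ToyPool(1, 200);
        String leaked = pool.borrow(); // borrowed and never returned
        System.out.println("borrowed " + leaked);
        try {
            pool.borrow();
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

With the only connection leaked, main prints "borrowed conn-0" and then, after roughly 200 ms, "timed out: no connection within 200 ms". Construct the pool with maxWaitMillis = 0 instead and the second borrow() never returns, which is exactly what the consumer thread was doing. A timed wait turns a silent hang into an exception that the loop's catch (Throwable e) can log.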
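At this point I had basically figured it out: the cause was that I never configured the maxWait parameter! Druid's default maxWait is -1, so getConnectionInternal takes the takeLast() branch, which waits on the pool's condition with no timeout until another thread returns a connection. If none ever comes back, the consumer thread parks forever, nothing is thrown, and the catch (Throwable e) block never runs.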
Before:
spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://xxxxxx/db?characterEncoding=UTF-8&connectTimeout=60000&socketTimeout=60000
    username: xxxx
    password: xxxx
After:
spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver
    max-wait: 60000 # the most important parameter
    time-between-eviction-runs-millis: 60000
    initial-size: 20
    min-idle: 10
    max-active: 20
    min-evictable-idle-time-millis: 600000
    max-evictable-idle-time-millis: 900000
    test-on-borrow: false
    test-on-return: false
    test-while-idle: true
    keep-alive: true
    url: jdbc:mysql://xxxxxx/db?characterEncoding=UTF-8&connectTimeout=60000&socketTimeout=60000
    username: xxxx
    password: xxxx
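Also check that your configuration class contains something like the following; it binds the spring.datasource.* properties onto the DruidDataSource's setters, and without it these parameters may not take effect:
// binds spring.datasource.* (max-wait, max-active, ...) onto DruidDataSource
@ConfigurationProperties(prefix = "spring.datasource")
@Bean
public DruidDataSource druidDataSource() {
    return new DruidDataSource();
}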
Well, young man, I got careless and didn't dodge~ It took ages to track down and had me sweating.