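Hands-on: why did my infinite-loop thread stop running?

I recently built a new application that collects data from other applications, both for reference and for troubleshooting. After it had been in production for a while, a very strange problem showed up.

Problem description

The collector is asynchronous: a dedicated thread consumes the data that the various services send in. Right after launch the data arrived on schedule, but then something odd happened: every so often, that thread stopped consuming.

This is roughly what the code looks like: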

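while (true) {
    try {
        // Data consumption logic -> insert into the database
    } catch (Throwable e) {
        logger.error("Async queue consumption failed", e);
        sleep(5);
    } finally {
        // Pause on every iteration so the loop does not spin and drive CPU usage up
        sleep(5);
    }
}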

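The key point: there was no exception at all! The thread simply stopped moving and no data was getting in.

At first I assumed an exception was slipping past the catch block, but catching Throwable made no difference.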

Troubleshooting

Analyzing the GC logs

GC activity was indeed high, but I could not connect it to this problem.

Had the thread died?

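Earlier I had inspected the process with Arthas's thread command and found that the thread was still there, but its state was WAITING. That hit a gap in my knowledge, so I read a few articles describing this state; the gist is that the thread is waiting for another thread to wake it up. As far as I could tell, though, my code had no blocking or wake-up operations at all!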
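To make the WAITING state concrete, here is a minimal sketch using only the JDK. It is not the collector's code: the class name, thread name, and condition name are made up for illustration. A thread calls the untimed await() on a Condition that nobody signals, and the caller then reads its state.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class WaitingStateDemo {

    /** Starts a thread that awaits a Condition nobody signals and returns the state observed for it. */
    public static Thread.State observeParkedState() {
        ReentrantLock lock = new ReentrantLock();
        Condition notEmpty = lock.newCondition(); // hypothetical name, for illustration only

        Thread consumer = new Thread(() -> {
            lock.lock();
            try {
                notEmpty.await(); // untimed wait: parks until signalled or interrupted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "demo-consumer-thread");
        consumer.setDaemon(true);
        consumer.start();

        try {
            // Poll until the thread has parked, giving up after about two seconds.
            long deadline = System.nanoTime() + 2_000_000_000L;
            while (consumer.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return consumer.getState();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            consumer.interrupt(); // wake the parked thread so it can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(observeParkedState());
    }
}
```

The point of the sketch is that the blocking call does not have to be in your own code: any library you call that waits on a lock condition without a timeout can leave your thread in this state.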

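Next I wanted to see whether the stack of the WAITING thread offered any clues.

jps # find the application's pid
jstack [pid]

Then search the output for the thread name, MysqlMQStoreProcess-consumer-thread: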

"MysqlMQStoreProcess-consumer-thread" #46 prio=5 os_prio=0 tid=0x00007ff7b1357800 nid=0x2fd3 waiting on condition [0x00007ff7241c3000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000732ef9c48> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1899)
    at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1460)
    at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1255)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1235)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1225)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:90)
    at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:157)
    at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:115)
    at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:78)
    at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:80)
    at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:67)
    at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:336)
    at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.prepareStatement(MybatisSimpleExecutor.java:93)
    at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.doUpdate(MybatisSimpleExecutor.java:53)
    at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117)
    at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:76)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:197)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.insert(DefaultSqlSession.java:184)
    at sun.reflect.GeneratedMethodAccessor95.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:426)
    at com.sun.proxy.$Proxy66.insert(Unknown Source)
    at org.mybatis.spring.SqlSessionTemplate.insert(SqlSessionTemplate.java:271)
    at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:58)
    at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:61)
    at com.sun.proxy.$Proxy67.batchPartitionInsert(Unknown Source)
    at com.jay.monitor.data.server.store.mysql.MysqlMQStoreProcess.speedUpConsumerList(MysqlMQStoreProcess.java:80)
    at com.jay.monitor.data.server.store.AbstractStoreProcess.speedUpConsumer(AbstractStoreProcess.java:131)
    at com.jay.monitor.data.server.store.AbstractStoreProcess.run(AbstractStoreProcess.java:157)
    at java.lang.Thread.run(Thread.java:748)
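When attaching jstack to the process is inconvenient, a similar view of one thread can be produced from inside the JVM. The helper below is a hypothetical sketch and not something the application contains; it relies only on Thread.getAllStackTraces() from the JDK and prints less detail than jstack does (no lock addresses, for example).

```java
import java.util.Map;

public class ThreadStackLookup {

    /** Returns the state and stack of the live thread with the given name, or null if none matches. */
    public static String describe(String threadName) {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().equals(threadName)) {
                continue;
            }
            StringBuilder out = new StringBuilder();
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            return out.toString();
        }
        return null;
    }

    public static void main(String[] args) {
        // Looks up the thread name used in this article; prints null when run standalone.
        System.out.println(describe("MysqlMQStoreProcess-consumer-thread"));
    }
}
```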

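Working through the stack trace, the picture became clear: the thread was trying to get a database connection, could not get one, and stayed blocked there indefinitely.

The next step was to find out why it blocks. This is the relevant branch in Druid's getConnectionInternal: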

if (maxWait > 0) {
    // Timed wait: gives up after maxWait
    holder = pollLast(nanos);
} else {
    // Blocking call: waits with no timeout
    holder = takeLast();
}

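At this point the cause was fairly obvious: I had never configured the maxWait parameter. Without it, the pool takes the takeLast() branch seen in the stack trace, which waits with no timeout, so no exception is ever thrown and nothing is logged.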
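The difference between the two branches can be reproduced with a JDK BlockingQueue. This is only a sketch under stated assumptions: BoundedWaitPoolDemo is a made-up stand-in for a connection pool, the strings stand in for connections, and it does not reproduce Druid's implementation. With a positive wait limit, an exhausted pool produces an exception that a loop like the one above would catch and log; with no limit, the caller simply parks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitPoolDemo {

    // Hypothetical stand-in for a connection pool; the strings play the role of connections.
    private final BlockingQueue<String> idle;

    public BoundedWaitPoolDemo(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add("conn-" + i);
        }
    }

    /** Borrows a connection, waiting at most maxWaitMillis; a value <= 0 waits with no limit. */
    public String borrow(long maxWaitMillis) throws InterruptedException, TimeoutException {
        if (maxWaitMillis > 0) {
            String conn = idle.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
            if (conn == null) {
                throw new TimeoutException("no idle connection within " + maxWaitMillis + " ms");
            }
            return conn;
        }
        return idle.take(); // blocks until some other thread gives a connection back
    }

    public void giveBack(String conn) {
        idle.offer(conn);
    }

    /**
     * Exhausts a one-connection pool, then reports how the next borrow ends.
     * Pass a positive value: with maxWaitMillis <= 0 the second borrow never returns.
     */
    public static String borrowFromExhaustedPool(long maxWaitMillis) {
        BoundedWaitPoolDemo pool = new BoundedWaitPoolDemo(1);
        try {
            pool.borrow(maxWaitMillis);        // takes the only connection and never returns it
            return pool.borrow(maxWaitMillis); // the pool is now empty
        } catch (TimeoutException e) {
            return "timeout";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(borrowFromExhaustedPool(100));
    }
}
```

A bounded wait does not make connections appear, but it turns a silent hang into an error that shows up in the logs and lets the loop try again.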

Before the change:

spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver 
    url: jdbc:mysql://xxxxxx/db?characterEncoding=UTF-8&connectTimeout=60000&socketTimeout=60000
    username: xxxx
    password: xxxx

After the change:

spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver
    max-wait: 60000   # the most important parameter
    time-between-eviction-runs-millis: 60000
    initial-size: 20
    min-idle: 10
    max-active: 20
    min-evictable-idle-time-millis: 600000
    max-evictable-idle-time-millis: 900000
    test-on-borrow: false
    test-on-return: false
    test-while-idle: true
    keep-alive: true
    url: jdbc:mysql://xxxxxx/db?characterEncoding=UTF-8&connectTimeout=60000&socketTimeout=60000
    username: xxxx
    password: xxxx

Also check that your configuration class contains the following, so that these parameters actually take effect:

@ConfigurationProperties(prefix = "spring.datasource")
@Bean
public DruidDataSource druidDataSource() {
    return new DruidDataSource();
}

Well, I was young and careless and did not dodge in time~ Tracking this down took a long time and left me in a sweat.
