No fluff: why did my infinite-loop thread stop running?

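I recently built a new application that collects data from other applications, both for reference and for troubleshooting. Some time after it went live, a really, really strange problem showed up.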

Problem description

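The collector is asynchronous. A dedicated thread consumes the data that the various services send over. Right after launch the data arrived on time. Then something odd happened: every so often, that thread simply stopped consuming.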

Here is roughly what the code looks like:

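while (true) {
    try {
        // consume data -> insert into the database
    } catch (Throwable e) {
        logger.error("Async queue consumption failed", e);
        sleep(5);
    } finally {
        // pause on every iteration so the loop does not spin and drive CPU usage up
        sleep(5);
    }
}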

And here's the kicker: there was no exception at all! The thread just stopped moving, and no data was getting in!

At first I thought some exception was slipping past the catch block, but even catching Throwable didn't help!
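The reason catching Throwable cannot help is that nothing is being thrown. The sketch below shows this. It uses a hypothetical class StuckLoopDemo, and a LinkedBlockingQueue stands in for whatever the real consumer waits on.

Once the loop body blocks with no timeout:
- the catch block never runs;
- the loop just stops advancing;
- the thread sits in the WAITING state.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Hypothetical demo: a while(true) consumer whose body blocks instead of throwing.
class StuckLoopDemo {
    static final class Result {
        final int iterations;
        final int caught;
        final Thread.State state;

        Result(int iterations, int caught, Thread.State state) {
            this.iterations = iterations;
            this.caught = caught;
            this.state = state;
        }
    }

    static Result run() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("a");
        queue.add("b");
        AtomicInteger iterations = new AtomicInteger();
        AtomicInteger caught = new AtomicInteger();

        Thread consumer = new Thread(() -> {
            while (true) {
                try {
                    queue.take();                 // blocks with no timeout once the queue is empty
                    iterations.incrementAndGet();
                } catch (InterruptedException e) {
                    return;                       // only so the demo can stop the thread
                } catch (Throwable e) {
                    caught.incrementAndGet();     // never reached: being blocked is not an exception
                }
            }
        }, "demo-consumer");
        consumer.setDaemon(true);
        consumer.start();

        // Poll until both items are consumed and the thread is parked inside take().
        long deadline = System.currentTimeMillis() + 2000;
        while ((iterations.get() < 2 || consumer.getState() != Thread.State.WAITING)
                && System.currentTimeMillis() < deadline) {
            LockSupport.parkNanos(10_000_000L); // about 10 ms
        }
        Result r = new Result(iterations.get(), caught.get(), consumer.getState());
        consumer.interrupt();
        return r;
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println("iterations=" + r.iterations + ", caught=" + r.caught + ", state=" + r.state);
    }
}
```

On a normal run this should print iterations=2, caught=0, state=WAITING. The loop is still "alive", but it will never reach the catch or the sleep again.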

Troubleshooting

Analyzing the GC logs

GC activity was indeed high, but I couldn't connect it to this problem.

Could the thread have died?

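Using the thread command of Arthas, I found that the thread was still there! But its state was WAITING.

That was a bit of a knowledge gap for me, so I read a few articles describing this state. The gist is that the thread is waiting for another thread to wake it up. I went back over my code, and it didn't seem to contain any blocking or wake-up operations at all!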
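What WAITING means can be seen in a small sketch. It uses a hypothetical class WaitingStateDemo with a plain ReentrantLock/Condition, nothing from the application.

A thread that calls Condition.await() with no timeout is parked in WAITING. It stays there until another thread signals it. The await does not have to be in your own code; it can just as well sit inside a library you call. So "my code has no wait/notify" does not rule this state out.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: a thread parked in Condition.await() shows up as WAITING.
class WaitingStateDemo {
    // Returns {state while parked, state after being signalled}.
    static Thread.State[] run() {
        ReentrantLock lock = new ReentrantLock();
        Condition ready = lock.newCondition();
        boolean[] signalled = {false}; // guarded by lock

        Thread waiter = new Thread(() -> {
            lock.lock();
            try {
                while (!signalled[0]) {
                    ready.await(); // no timeout: parks until signalled
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "demo-waiter");
        waiter.setDaemon(true);
        waiter.start();

        long deadline = System.currentTimeMillis() + 2000;
        while (waiter.getState() != Thread.State.WAITING && System.currentTimeMillis() < deadline) {
            LockSupport.parkNanos(10_000_000L);
        }
        Thread.State parked = waiter.getState();

        lock.lock();
        try {
            signalled[0] = true;
            ready.signal(); // the "wake-up" a WAITING thread depends on
        } finally {
            lock.unlock();
        }
        deadline = System.currentTimeMillis() + 2000;
        while (waiter.isAlive() && System.currentTimeMillis() < deadline) {
            LockSupport.parkNanos(10_000_000L);
        }
        return new Thread.State[] {parked, waiter.getState()};
    }

    public static void main(String[] args) {
        Thread.State[] s = run();
        System.out.println("parked: " + s[0] + ", after signal: " + s[1]);
    }
}
```

Without the signal() call, the waiter would stay WAITING forever. No exception would ever be raised.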

Next, I wanted to see whether the WAITING thread's stack held any clues.

jps            # find the application's pid
jstack [pid]   # dump the stacks of all threads

Then I searched the output for the thread name, MysqlMQStoreProcess-consumer-thread:

"MysqlMQStoreProcess-consumer-thread" #46 prio=5 os_prio=0 tid=0x00007ff7b1357800 nid=0x2fd3 waiting on condition [0x00007ff7241c3000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000732ef9c48> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1899)
    at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1460)
    at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1255)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1235)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1225)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:90)
    at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:157)
    at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:115)
    at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:78)
    at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:80)
    at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:67)
    at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:336)
    at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.prepareStatement(MybatisSimpleExecutor.java:93)
    at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.doUpdate(MybatisSimpleExecutor.java:53)
    at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117)
    at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:76)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:197)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.insert(DefaultSqlSession.java:184)
    at sun.reflect.GeneratedMethodAccessor95.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:426)
    at com.sun.proxy.$Proxy66.insert(Unknown Source)
    at org.mybatis.spring.SqlSessionTemplate.insert(SqlSessionTemplate.java:271)
    at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:58)
    at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:61)
    at com.sun.proxy.$Proxy67.batchPartitionInsert(Unknown Source)
    at com.jay.monitor.data.server.store.mysql.MysqlMQStoreProcess.speedUpConsumerList(MysqlMQStoreProcess.java:80)
    at com.jay.monitor.data.server.store.AbstractStoreProcess.speedUpConsumer(AbstractStoreProcess.java:131)
    at com.jay.monitor.data.server.store.AbstractStoreProcess.run(AbstractStoreProcess.java:157)
    at java.lang.Thread.run(Thread.java:748)
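jstack is not the only way to get this picture. The same per-thread state and stack can be read from inside the JVM through the standard java.lang.management API. That can help when attaching tools to a production process is awkward. The sketch below does this; ThreadLookup and its methods are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: an in-process, jstack-like lookup of one thread by name.
class ThreadLookup {
    // Returns the ThreadInfo of the first live thread with this name, or null if none.
    static ThreadInfo findByName(String name) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // false, false: skip locked-monitor and ownable-synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && name.equals(info.getThreadName())) {
                return info;
            }
        }
        return null;
    }

    // Formats the state, the object waited on (if any) and the stack, roughly like jstack.
    static String describe(ThreadInfo info) {
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
        if (info.getLockName() != null) {
            sb.append(" on ").append(info.getLockName());
        }
        for (StackTraceElement frame : info.getStackTrace()) {
            sb.append("\n    at ").append(frame);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ThreadInfo info = findByName("MysqlMQStoreProcess-consumer-thread");
        System.out.println(info == null ? "thread not found" : describe(info));
    }
}
```

For a thread parked the way the trace shows, getLockName() should report the ConditionObject it is parked on. That corresponds to the "parking to wait for" line in jstack's output.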

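Working through the stack, the picture was roughly this:
- the batch insert (batchPartitionInsert) asked Druid for a database connection;
- no connection was available;
- the thread sat blocked in DruidDataSource.takeLast, parked on a Condition, and never moved again!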

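Next I had to figure out why it blocks. The relevant branch in DruidDataSource.getConnectionInternal looks like this: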

if (maxWait > 0) {
    holder = pollLast(nanos);   // timed wait: gives up after maxWait
} else {
    // blocks with no timeout until a connection is returned to the pool
    holder = takeLast();
}

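At this point I had pretty much guessed the cause: I had never configured maxWait. Druid's default is -1, so the else branch ran. Once the pool had no free connection, takeLast() waited indefinitely!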
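To make the two branches concrete, here is a simplified model of a pool's borrow path. TinyPool is hypothetical and is not Druid's implementation; it only reproduces the same await versus awaitNanos distinction.
- With maxWait > 0, the borrower gives up after the timeout and gets null.
- Otherwise, it parks in WAITING until something is released.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified pool; it only models the maxWait branch, not Druid itself.
class TinyPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();

    // Return an item to the pool and wake one waiting borrower.
    void release(T item) {
        lock.lock();
        try {
            idle.addLast(item);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    // maxWaitMillis > 0: timed wait, null on timeout (like pollLast(nanos)).
    // maxWaitMillis <= 0: wait with no timeout (like takeLast()).
    // Returns null if interrupted, restoring the interrupt flag.
    T borrow(long maxWaitMillis) {
        lock.lock();
        try {
            if (maxWaitMillis > 0) {
                long nanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
                while (idle.isEmpty()) {
                    if (nanos <= 0) {
                        return null;                    // timed out
                    }
                    nanos = notEmpty.awaitNanos(nanos); // TIMED_WAITING
                }
            } else {
                while (idle.isEmpty()) {
                    notEmpty.await();                   // WAITING until release()
                }
            }
            return idle.pollLast();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        TinyPool<String> pool = new TinyPool<>();
        System.out.println(pool.borrow(100)); // null after about 100 ms: empty pool, timed wait
        pool.release("conn-1");
        System.out.println(pool.borrow(100)); // conn-1
    }
}
```

With a timeout, the caller gets a result it can act on. In a real pool, a timed-out wait should surface as an error rather than a silent hang, and a consumer loop like the one above could at least log it. With the untimed branch there is nothing to catch.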

Before:

spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver 
    url: jdbc:mysql://xxxxxx/db?characterEncoding=UTF-8&connectTimeout=60000&socketTimeout=60000
    username: xxxx
    password: xxxx

After:

spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver
    max-wait: 60000   # the most important parameter: wait at most 60 s for a connection
    time-between-eviction-runs-millis: 60000
    initial-size: 20
    min-idle: 10
    max-active: 20
    min-evictable-idle-time-millis: 600000
    max-evictable-idle-time-millis: 900000
    test-on-borrow: false
    test-on-return: false
    test-while-idle: true
    keep-alive: true
    url: jdbc:mysql://xxxxxx/db?characterEncoding=UTF-8&connectTimeout=60000&socketTimeout=60000
    username: xxxx
    password: xxxx

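Also check that your configuration class has something like the following. Otherwise these properties are never bound to the data source and have no effect: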

@ConfigurationProperties(prefix = "spring.datasource")
@Bean
public DruidDataSource druidDataSource() {
    return new DruidDataSource();
}

Well, young man, I got careless and didn't dodge~ It took ages to track this down, and it left me in a sweat.
