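I recently built a new application that collects data from other applications, for reference and for tracking down problems. After it had been in production for a while, a very strange problem showed up.
Problem description
The collector is asynchronous: a dedicated thread consumes the data that the various services send in. Right after launch the data arrived on schedule, but then something odd happened: every so often, that thread simply stopped consuming.
The code looks roughly like this: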
while (true) {
    try {
        // Data consumption logic -> insert into the database
    } catch (Throwable e) {
        logger.error("Async queue consumption failed", e);
        sleep(5);
    } finally {
        // Pause so the loop does not spin and drive CPU usage up
        sleep(5);
    }
}
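The key point: there was no exception at all! The thread stopped moving and no data got in.
At first I assumed an exception was slipping past the catch block, but catching Throwable did not help either.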
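Troubleshooting
Analyzing the GC logs
GC activity was indeed high, but I could not connect it to this problem.
Did the thread die?
I checked with the thread command in arthas and found that the thread was still alive, but its state was WAITING. That was a gap in my knowledge, so I read a few articles describing this state. In short, the thread is waiting for another thread to wake it up. Yet as far as I could tell, my code contained no blocking or wake-up operations.
Next I wanted to see whether the stack of the WAITING thread held any clues.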
jps # find the application's pid
jstack [pid]
Then search the output for the thread name MysqlMQStoreProcess-consumer-thread:
"MysqlMQStoreProcess-consumer-thread" #46 prio=5 os_prio=0 tid=0x00007ff7b1357800 nid=0x2fd3 waiting on condition [0x00007ff7241c3000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000732ef9c48> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1899)
at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1460)
at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1255)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1235)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1225)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:90)
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:157)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:115)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:78)
at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:80)
at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:67)
at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:336)
at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.prepareStatement(MybatisSimpleExecutor.java:93)
at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.doUpdate(MybatisSimpleExecutor.java:53)
at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117)
at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:76)
at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:197)
at org.apache.ibatis.session.defaults.DefaultSqlSession.insert(DefaultSqlSession.java:184)
at sun.reflect.GeneratedMethodAccessor95.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:426)
at com.sun.proxy.$Proxy66.insert(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.insert(SqlSessionTemplate.java:271)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:58)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:61)
at com.sun.proxy.$Proxy67.batchPartitionInsert(Unknown Source)
at com.jay.monitor.data.server.store.mysql.MysqlMQStoreProcess.speedUpConsumerList(MysqlMQStoreProcess.java:80)
at com.jay.monitor.data.server.store.AbstractStoreProcess.speedUpConsumer(AbstractStoreProcess.java:131)
at com.jay.monitor.data.server.store.AbstractStoreProcess.run(AbstractStoreProcess.java:157)
at java.lang.Thread.run(Thread.java:748)
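Working through the stack trace, the picture was roughly this: the thread tried to get a connection, could not get one, and stayed blocked there.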
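To illustrate why catch (Throwable) never fired, here is a minimal sketch of a thread that waits on a Condition with no timeout. It does not use Druid; the class WaitingStateDemo, the lock, and the notEmpty condition are hypothetical stand-ins for the pool internals seen in the stack trace. Waiting is not an error, so nothing is thrown and the thread just reports WAITING.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class WaitingStateDemo {
    // Hypothetical stand-ins for a pool's internal lock and "not empty" condition.
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static final Condition NOT_EMPTY = LOCK.newCondition();

    /**
     * Starts a consumer that waits without a timeout and returns the state
     * observed once it has parked (or after a 5 second safety limit).
     */
    static Thread.State observeBlockedConsumer(AtomicBoolean caught) throws InterruptedException {
        Thread consumer = new Thread(() -> {
            try {
                LOCK.lock();
                try {
                    // Nobody ever signals, so the thread stays parked here.
                    while (true) {
                        NOT_EMPTY.await();
                    }
                } finally {
                    LOCK.unlock();
                }
            } catch (Throwable e) {
                // Reached only if await() throws, which it does not in this demo.
                caught.set(true);
            }
        }, "demo-consumer-thread");
        consumer.setDaemon(true); // let the JVM exit even though the thread never finishes
        consumer.start();

        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (consumer.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        return consumer.getState();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean caught = new AtomicBoolean(false);
        Thread.State state = observeBlockedConsumer(caught);
        System.out.println("state=" + state + ", exceptionCaught=" + caught.get());
    }
}
```

Running main should report the WAITING state with no exception caught, which matches the symptom: the thread is alive, silent, and stuck.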
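The next step was to work out why it blocks. The stack points at DruidDataSource.getConnectionInternal, which contains this branch:
if (maxWait > 0) {
    holder = pollLast(nanos);
} else {
    // Blocking call: waits with no timeout
    holder = takeLast();
}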
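At this point the cause was fairly clear: I had never configured the maxWait parameter, so the maxWait > 0 check failed and the pool took the takeLast() branch, which waits for a free connection with no timeout.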
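The difference between the two branches can be sketched with a toy pool. This is not Druid's implementation: TinyPool, borrow, and giveBack are hypothetical names, and returning null on timeout is a simplification for the demo (how the real pool reports a timeout is not modelled here). The point is only that a timed wait eventually gives control back to the caller, while an untimed wait does not.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class TinyPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Deque<String> idle = new ArrayDeque<>();
    private final long maxWaitMillis; // <= 0 means "wait forever", mirroring the branch above

    public TinyPool(long maxWaitMillis) {
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns a connection to the pool and wakes one waiting borrower. */
    public void giveBack(String conn) {
        lock.lock();
        try {
            idle.addLast(conn);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Returns a connection, or null if maxWaitMillis > 0 and the wait timed out. */
    public String borrow() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            if (maxWaitMillis > 0) {
                // Timed wait: comparable in spirit to pollLast(nanos).
                long nanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
                while (idle.isEmpty()) {
                    if (nanos <= 0) {
                        return null; // timed out, the caller gets control back
                    }
                    nanos = notEmpty.awaitNanos(nanos);
                }
            } else {
                // Untimed wait: comparable in spirit to takeLast().
                while (idle.isEmpty()) {
                    notEmpty.await();
                }
            }
            return idle.pollLast();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TinyPool pool = new TinyPool(200);
        System.out.println("empty pool, timed borrow -> " + pool.borrow());
        pool.giveBack("conn-1");
        System.out.println("after giveBack -> " + pool.borrow());
    }
}
```

With a positive wait limit, the consumer loop gets a chance to log the failure and retry. With no limit, a pool that never frees a connection parks the caller indefinitely.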
Before the change:
spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://xxxxxx/db?characterEncoding=UTF-8&connectTimeout=60000&socketTimeout=60000
    username: xxxx
    password: xxxx
After the change:
spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver
    max-wait: 60000 # the most important parameter
    time-between-eviction-runs-millis: 60000
    initial-size: 20
    min-idle: 10
    max-active: 20
    min-evictable-idle-time-millis: 600000
    max-evictable-idle-time-millis: 900000
    test-on-borrow: false
    test-on-return: false
    test-while-idle: true
    keep-alive: true
    url: jdbc:mysql://xxxxxx/db?characterEncoding=UTF-8&connectTimeout=60000&socketTimeout=60000
    username: xxxx
    password: xxxx
Also check that your configuration class has something like this, so that the parameters actually take effect:
@ConfigurationProperties(prefix = "spring.datasource")
@Bean
public DruidDataSource druidDataSource() {
    return new DruidDataSource();
}
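Well, I was young and careless for a moment and did not dodge in time. It took a long, sweaty investigation to find this one.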