WebLogic node alert: diagnosing an inaccessible system

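1. The on-site implementation team reported a node alert; accessing that node directly returned a blank white page.
2. The nohup log showed stuck threads, so we asked the on-site team to take a jstack dump and send it back.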
<Jun 23, 2020 11:40:51 PM CST> <[STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "613" seconds working on the request "weblogic.servlet.internal.ServletRequestImpl@4c96929a[
POST /web/gg/dwr/exec/ggDwrUtils.getggItemVO.dwr HTTP/1.1
Accept: */*

The jstack dump shows that a business thread is holding this lock without releasing it, which blocks WebLogic's background threads.
"[STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00007f7e94001000 nid=0x68e6 runnable [0x00007f7e6f4f2000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(Packet.java:293)
at oracle.net.ns.DataPacket.receive(DataPacket.java:104)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)

- locked <0x0000000702a29980> (a oracle.jdbc.driver.T4CConnection)

"[STUCK] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00007f7e98001800 nid=0x17dd2 waiting for monitor entry [0x00007f7cd7ffd000]
java.lang.Thread.State: BLOCKED (on object monitor)
at oracle.jdbc.driver.OracleStatement.close(OracleStatement.java:1545)
- waiting to lock <0x0000000702a29980> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OracleStatementWrapper.close(OracleStatementWrapper.java:80)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.close(OraclePreparedStatementWrapper.java:78)
at weblogic.jdbc.common.internal.ConnectionEnv.initializeTest(ConnectionEnv.java:940)
at weblogic.jdbc.common.internal.ConnectionEnv.destroyForFlush(ConnectionEnv.java:529)
- locked <0x0000000702a29858> (a weblogic.jdbc.common.internal.ConnectionEnv)
at weblogic.jdbc.common.internal.ConnectionEnv.destroy(ConnectionEnv.java:507)
- locked <0x0000000702a29858> (a weblogic.jdbc.common.internal.ConnectionEnv)
at weblogic.common.resourcepool.ResourcePoolImpl.destroyResource(ResourcePoolImpl.java:1802)
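
Matching the "- locked" and "- waiting to lock" addresses by eye gets tedious in a large dump. The following is a minimal sketch, not a full thread-dump parser, that pairs each blocked thread with the thread holding the monitor it waits for. The class name LockHolderFinder and its output format are made up for illustration, and the code assumes the raw jstack text with straight double quotes around thread names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pairs blocked threads with the holder of the contended monitor.
public class LockHolderFinder {
    // A thread header line starts with the thread name in double quotes.
    private static final Pattern HEADER = Pattern.compile("^\"(.+?)\" .*");
    private static final Pattern LOCKED = Pattern.compile("- locked <(0x[0-9a-fA-F]+)>");
    private static final Pattern WAITING = Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");

    public static List<String> findBlockers(List<String> dumpLines) {
        Map<String, String> holderByLock = new LinkedHashMap<>();
        Map<String, String> lockByWaiter = new LinkedHashMap<>();
        String current = null; // name of the thread whose frames we are reading
        for (String raw : dumpLines) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = header.group(1);
                continue;
            }
            if (current == null) {
                continue; // text before the first thread header
            }
            Matcher waiting = WAITING.matcher(line);
            if (waiting.find()) {
                lockByWaiter.put(current, waiting.group(1));
                continue;
            }
            Matcher locked = LOCKED.matcher(line);
            if (locked.find()) {
                holderByLock.putIfAbsent(locked.group(1), current);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : lockByWaiter.entrySet()) {
            String holder = holderByLock.get(e.getValue());
            if (holder != null) {
                result.add(e.getKey() + " is blocked by " + holder + " on " + e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened thread names; the lock addresses are the ones from the dump above.
        List<String> dump = Arrays.asList(
            "\"ExecuteThread-2\" daemon prio=10 runnable",
            "  at java.net.SocketInputStream.socketRead0(Native Method)",
            "  - locked <0x0000000702a29980> (a oracle.jdbc.driver.T4CConnection)",
            "\"ExecuteThread-10\" daemon prio=10 waiting for monitor entry",
            "  - waiting to lock <0x0000000702a29980> (a oracle.jdbc.driver.T4CConnection)",
            "  - locked <0x0000000702a29858> (a weblogic.jdbc.common.internal.ConnectionEnv)");
        for (String s : findBlockers(dump)) {
            System.out.println(s);
        }
    }
}
```

Applied to the dump above, this would report that thread '10' is blocked by thread '2' on 0x0000000702a29980, the T4CConnection monitor that thread '2' holds while it sits in socketRead0.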

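3. Combining the jstack dump with the nohup log, we traced the problem to /web/gg/dwr/exec/ggDwrUtils.getggItemVO.dwr. It turned out to be a simple function: it does access the database in a loop, but every access is a primary-key lookup.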

4. The analysis stalled here. Going back to the nohup log, we found that the connection pool had been shut down (suspended).
<Jun 23, 2020 11:32:41 PM CST> <Test "SELECT 1 FROM DUAL" set up for pool "ggDataSource" failed with exception: "java.sql.SQLRecoverableException: IO Error: Connection reset".>
<Jun 23, 2020 11:32:41 PM CST> <Test "SELECT 1 FROM DUAL" set up for pool "ggDataSource" failed with exception: "java.sql.SQLRecoverableException: IO Error: Connection reset".>
[ERROR] 2020-06-23 23:33:06 com.gg.executor.DBExecutorProvider (DBExecutorProvider.java:58)
weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool ggDataSource is Suspended, cannot allocate resources to applications…
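
These pool-level messages are easy to miss when the search starts from stuck threads. As a rough sketch, a log can be scanned for the two signatures seen in this excerpt. The class name PoolFailureScanner is hypothetical, the patterns are derived only from the lines quoted here (with straight quotes, as in the raw log), and they are not a complete list of WebLogic pool messages.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the names of data sources that show failure signatures.
public class PoolFailureScanner {
    // Both patterns are taken from the log excerpt above, not from WebLogic documentation.
    private static final Pattern TEST_FAILED =
        Pattern.compile("set up for pool \"([^\"]+)\" failed with exception");
    private static final Pattern SUSPENDED =
        Pattern.compile("Pool (\\S+) is Suspended");

    public static Set<String> affectedPools(List<String> logLines) {
        Set<String> pools = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher failed = TEST_FAILED.matcher(line);
            if (failed.find()) {
                pools.add(failed.group(1));
            }
            Matcher suspended = SUSPENDED.matcher(line);
            if (suspended.find()) {
                pools.add(suspended.group(1));
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "<Test \"SELECT 1 FROM DUAL\" set up for pool \"ggDataSource\" failed with exception: \"...\".>",
            "ResourceDisabledException: Pool ggDataSource is Suspended, cannot allocate resources");
        System.out.println(affectedPools(lines));
    }
}
```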

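At 11:30:38 PM on Jun 23, 2020, the user invoked /web/gg/dwr/exec/ggDwrUtils.getggItemVO.dwr. The stuck-thread message, logged at 11:40:51 PM, says has been busy for "613" seconds working on the request, so the request had started 613 s (10 min 13 s) earlier and was still running.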
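
The start time follows directly from the stuck-thread message: subtract the busy duration from the log timestamp. A small worked example, where the class and method names are illustrative only:

```java
import java.time.LocalDateTime;

// Illustrative only: derives when a stuck request started from its log entry.
public class StuckRequestStart {
    public static LocalDateTime requestStart(LocalDateTime loggedAt, long busySeconds) {
        return loggedAt.minusSeconds(busySeconds);
    }

    public static void main(String[] args) {
        // Log entry: <Jun 23, 2020 11:40:51 PM CST> ... has been busy for "613" seconds
        LocalDateTime loggedAt = LocalDateTime.of(2020, 6, 23, 23, 40, 51);
        System.out.println(requestStart(loggedAt, 613)); // prints 2020-06-23T23:30:38
    }
}
```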

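At 11:32:41 PM, WebLogic detected that the connection pool was down. The pool had probably already failed before that, right when the user was invoking IssueWorkorderDwrUtils.getRequirementItemVO.dwr. That is why the jstack dump shows even a simple primary-key query hanging. It is similar to what happens with SOA interface calls: the database operation has no timeout configured, so it hangs indefinitely.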
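
One common way to keep a thread from waiting forever in socketRead0 is to put explicit timeouts on the JDBC side. The sketch below is an assumption about how that could look, not the fix applied in this incident. It uses the Oracle thin-driver connection properties oracle.net.CONNECT_TIMEOUT and oracle.jdbc.ReadTimeout (both in milliseconds, to the best of my knowledge; check the documentation for the driver version in use) plus the standard Statement.setQueryTimeout. The class name and the timeout values are made up.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Hypothetical helper: collects timeout settings for an Oracle thin-driver connection.
public class JdbcTimeouts {
    // Builds connection properties; in WebLogic these would go into the data source's
    // driver properties rather than being passed to DriverManager directly.
    public static Properties oracleThinTimeouts(int connectTimeoutMs, int readTimeoutMs) {
        Properties props = new Properties();
        // Limit on establishing the TCP connection (assumed to be in milliseconds).
        props.setProperty("oracle.net.CONNECT_TIMEOUT", String.valueOf(connectTimeoutMs));
        // Limit on a blocking socket read (assumed to be in milliseconds).
        props.setProperty("oracle.jdbc.ReadTimeout", String.valueOf(readTimeoutMs));
        return props;
    }

    // Standard JDBC per-statement limit, in seconds.
    public static void applyQueryTimeout(Statement statement, int seconds) throws SQLException {
        statement.setQueryTimeout(seconds);
    }

    public static void main(String[] args) {
        Properties props = oracleThinTimeouts(5000, 30000); // example values only
        System.out.println(props.getProperty("oracle.net.CONNECT_TIMEOUT"));
        System.out.println(props.getProperty("oracle.jdbc.ReadTimeout"));
    }
}
```

The socket-level read timeout matters most in a case like this one: a query timeout relies on the driver being able to cancel the statement, which may not help when the peer has gone away and the thread is stuck in a socket read.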

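5. So this function was an innocent bystander. To find out why the connection pool hit IO Error: Connection reset, the only option was to look at the Oracle alert log.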

6. The alert log showed that the database was restarted at 11:30 PM, which is what caused the connection pool's IO Error: Connection reset.

7. We told the on-site team to coordinate better: whenever the database is restarted, the middleware must be restarted as well.
