WebLogic node alarm, system inaccessible: a diagnosis

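1. The on-site implementation team reported a node alarm; accessing that node directly returned a blank page.
2. The nohup log showed stuck threads, so we asked the on-site team to capture a jstack dump and send it back.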
<Jun 23, 2020 11:40:51 PM CST> <[STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "613" seconds working on the request "weblogic.servlet.internal.ServletRequestImpl@4c96929a[
POST /web/gg/dwr/exec/ggDwrUtils.getggItemVO.dwr HTTP/1.1
Accept: /

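The jstack dump shows that the application thread (ExecuteThread '2') holds the lock on the T4CConnection while it sits in a socket read and never releases it, which blocks other WebLogic threads (ExecuteThread '10' is waiting for the same monitor).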
"[STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00007f7e94001000 nid=0x68e6 runnable [0x00007f7e6f4f2000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(Packet.java:293)
at oracle.net.ns.DataPacket.receive(DataPacket.java:104)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)

- locked <0x0000000702a29980> (a oracle.jdbc.driver.T4CConnection)

"[STUCK] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00007f7e98001800 nid=0x17dd2 waiting for monitor entry [0x00007f7cd7ffd000]
java.lang.Thread.State: BLOCKED (on object monitor)
at oracle.jdbc.driver.OracleStatement.close(OracleStatement.java:1545)
- waiting to lock <0x0000000702a29980> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OracleStatementWrapper.close(OracleStatementWrapper.java:80)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.close(OraclePreparedStatementWrapper.java:78)
at weblogic.jdbc.common.internal.ConnectionEnv.initializeTest(ConnectionEnv.java:940)
at weblogic.jdbc.common.internal.ConnectionEnv.destroyForFlush(ConnectionEnv.java:529)
- locked <0x0000000702a29858> (a weblogic.jdbc.common.internal.ConnectionEnv)
at weblogic.jdbc.common.internal.ConnectionEnv.destroy(ConnectionEnv.java:507)
- locked <0x0000000702a29858> (a weblogic.jdbc.common.internal.ConnectionEnv)
at weblogic.common.resourcepool.ResourcePoolImpl.destroyResource(ResourcePoolImpl.java:1802)
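
Reading the two traces together (one thread reports "locked" on a monitor address, another reports "waiting to lock" on the same address) can be automated for larger dumps. The following is a minimal sketch, not the tool used in this investigation; the class name `LockContention` and its method are hypothetical, and it assumes the usual HotSpot jstack text layout with straight double quotes around thread names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pairs "waiting to lock <addr>" lines with the thread
// that reports "locked <addr>" in the same jstack dump.
final class LockContention {
    // A thread header line starts with the quoted thread name.
    private static final Pattern THREAD = Pattern.compile("^\"(.+?)\"\\s.*");
    private static final Pattern LOCKED = Pattern.compile("- locked <(0x[0-9a-f]+)>");
    private static final Pattern WAITING = Pattern.compile("- waiting to lock <(0x[0-9a-f]+)>");

    /** Maps each contended monitor address to "holder <- [waiters]". */
    static Map<String, String> summarize(String dump) {
        Map<String, String> holders = new LinkedHashMap<>();
        Map<String, List<String>> waiters = new LinkedHashMap<>();
        String current = null;
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            Matcher t = THREAD.matcher(line);
            if (t.matches()) {
                current = t.group(1);
                continue;
            }
            if (current == null) {
                continue; // preamble before the first thread header
            }
            Matcher l = LOCKED.matcher(line);
            if (l.find()) {
                holders.putIfAbsent(l.group(1), current);
            }
            Matcher w = WAITING.matcher(line);
            if (w.find()) {
                waiters.computeIfAbsent(w.group(1), k -> new ArrayList<>()).add(current);
            }
        }
        // Report only monitors that at least one thread is waiting for.
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : waiters.entrySet()) {
            String holder = holders.getOrDefault(e.getKey(), "<unknown>");
            result.put(e.getKey(), holder + " <- " + e.getValue());
        }
        return result;
    }
}
```

Applied to the dump above, a helper like this would be expected to report monitor 0x0000000702a29980 as held by ExecuteThread '2' with ExecuteThread '10' waiting on it.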

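3. Combining the jstack dump and the nohup log pointed to the function /web/gg/dwr/exec/ggDwrUtils.getggItemVO.dwr. It turned out to be a simple function: it does access the database inside a loop, but every access is a primary-key lookup.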

4. At a dead end, we went back to the nohup log and found that the connection pool had been shut down.
<Jun 23, 2020 11:32:41 PM CST> <Test "SELECT 1 FROM DUAL" set up for pool "ggDataSource" failed with exception: "java.sql.SQLRecoverableException: IO Error: Connection reset".>
<Jun 23, 2020 11:32:41 PM CST> <Test "SELECT 1 FROM DUAL" set up for pool "ggDataSource" failed with exception: "java.sql.SQLRecoverableException: IO Error: Connection reset".>
[ERROR] 2020-06-23 23:33:06 com.gg.executor.DBExecutorProvider (DBExecutorProvider.java:58)
weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool ggDataSource is Suspended, cannot allocate resources to applications…
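
Finding these entries by eye is easy in a short excerpt but tedious in a long nohup log. A minimal sketch of pulling out the failed connection-test entries follows; the class `PoolFailureScan` is hypothetical, and the pattern assumes the message layout shown in the excerpt above (it accepts either straight or typographic quotes around the pool name).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "<timestamp> <pool name>" for each failed
// connection-test message found in a list of log lines.
final class PoolFailureScan {
    // Group 1: text inside the first <...> (the timestamp).
    // Group 2: the pool name, quoted with straight or typographic quotes.
    private static final Pattern FAILED_TEST = Pattern.compile(
        "^<([^>]+)> <Test .* for pool [\"\u201C]([^\"\u201D]+)[\"\u201D] failed with exception");

    static List<String> failedTests(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FAILED_TEST.matcher(line);
            if (m.find()) {
                hits.add(m.group(1) + " " + m.group(2));
            }
        }
        return hits;
    }
}
```

The earliest hit gives the time at which WebLogic first noticed the broken connections, which is the timestamp used in the timeline reasoning here.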

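At 11:30:38 PM on Jun 23, 2020 the user triggered /web/gg/dwr/exec/ggDwrUtils.getggItemVO.dwr. This follows from the STUCK message, which was logged at 11:40:51 PM and says the thread "has been busy for "613" seconds": the request had been running for 613 s at that point.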
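
The start time is simply the log timestamp minus the busy duration. A minimal sketch of that arithmetic is below; the class `StuckRequestStart` is hypothetical, the time zone abbreviation is skipped rather than interpreted, and the pattern assumes the STUCK message layout quoted earlier.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives when a stuck request began from the
// timestamp of the STUCK message and its "busy for N seconds" value.
final class StuckRequestStart {
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("MMM d, yyyy h:mm:ss a", Locale.ENGLISH);

    // Group 1: timestamp without the zone abbreviation. Group 2: busy seconds.
    private static final Pattern STUCK = Pattern.compile(
        "^<(.+?) [A-Z]{2,5}> .*has been busy for [\"\u201C](\\d+)[\"\u201D] seconds");

    static LocalDateTime startOf(String stuckLine) {
        Matcher m = STUCK.matcher(stuckLine);
        if (!m.find()) {
            throw new IllegalArgumentException("not a STUCK message: " + stuckLine);
        }
        LocalDateTime logged = LocalDateTime.parse(m.group(1), STAMP);
        return logged.minusSeconds(Long.parseLong(m.group(2)));
    }
}
```

For the message above this works out to 11:40:51 PM minus 613 s (10 min 13 s), that is 11:30:38 PM, about two minutes before WebLogic logged the failed connection test.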

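At 11:32:41 PM WebLogic detected that the connection pool was broken, but it may well have broken earlier, right around the moment the user triggered IssueWorkorderDwrUtils.getRequirementItemVO.dwr. That is why the jstack dump shows even a primary-key lookup hanging. It is similar to what happens with SOA interface calls: no timeout was set on the database operation, so it hangs indefinitely.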
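
As an illustration of what "setting a timeout" could look like at the JDBC level, here is a minimal sketch. `Statement.setQueryTimeout` is standard JDBC. The two property names are the Oracle thin driver connection properties as I recall them and should be verified against the driver version in use. The class `JdbcTimeouts` and the idea of setting them in code are assumptions; in a WebLogic deployment they would more likely be configured as connection properties on the data source.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Hypothetical helper showing two places where a timeout can be applied.
final class JdbcTimeouts {
    /**
     * Connection properties for the Oracle thin driver (names to be checked
     * against the driver documentation); both values are in milliseconds.
     * A socket read timeout bounds how long a thread can sit in socketRead0
     * when the database side has gone away.
     */
    static Properties oracleSocketTimeouts(int connectMillis, int readMillis) {
        Properties p = new Properties();
        p.setProperty("oracle.net.CONNECT_TIMEOUT", Integer.toString(connectMillis));
        p.setProperty("oracle.jdbc.ReadTimeout", Integer.toString(readMillis));
        return p;
    }

    /** Standard JDBC: limit how long a single statement may execute. */
    static void boundQuery(Statement stmt, int seconds) throws SQLException {
        stmt.setQueryTimeout(seconds);
    }
}
```

A query timeout alone may not help when the network connection itself is dead, which is why the sketch also shows a socket-level read timeout.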

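5. So this function was an innocent bystander. To find out why the connection pool got "IO Error: Connection reset", the only option was to look at the Oracle alert log.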

6. The alert log showed that the database was restarted at 11:30 PM. That restart is what caused the "IO Error: Connection reset" in the connection pool.

7. We told the on-site team to coordinate better: after the database is restarted, the middleware must be restarted as well.
