Oracle ORA-32701 hang analysis (part 1)

Original article

2018-09-03 22:11

Today, during a routine health check of a customer's database, I found a large number of "ORA-32701: Possible hangs up to hang ID=57 detected" errors in alert.log. The full log entry is as follows:
Sun Dec 13 01:08:12 2015
Errors in file /oracle/app/oracle/diag/rdbms/huibuy/huibuy1/trace/huibuy1_dia0_80848.trc (incident=200081):
ORA-32701: Possible hangs up to hang ID=57 detected
Incident details in: /oracle/app/oracle/diag/rdbms/huibuy/huibuy1/incident/incdir_200081/huibuy1_dia0_80848_i200081.trc
DIA0 requesting termination of session sid:5217 with serial # 33617 (ospid:41409) on instance 2
due to a GLOBAL, HIGH confidence hang with ID=57.
Hang Resolution Reason: Although the number of affected sessions did not
justify automatic hang resolution initially, this previously ignored
hang was automatically resolved.
DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=57.

Following the trail into the DIA0 trace file
/oracle/app/oracle/diag/rdbms/huibuy/huibuy1/trace/huibuy1_dia0_80848.trc
shows:
Incident details in: /oracle/app/oracle/diag/rdbms/huibuy/huibuy1/incident/incdir_200081/huibuy1_dia0_80848_i200081.trc
inst# SessId Ser# OSPID PrcNm Event

  1   6610  9703     89862  M000 enq: WF - contention
  2   5217 33617     41409  M000 not in wait
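
The chain table above is easier to compare across incidents once it is parsed into records. The following is a minimal sketch, assuming the six-column layout shown in this trace (inst#, SessId, Ser#, OSPID, PrcNm, Event); the names ChainEntry and parse_chain are made up for illustration and are not part of any Oracle tool.

```python
import re
from typing import List, NamedTuple


class ChainEntry(NamedTuple):
    inst: int      # RAC instance number
    sid: int       # session id
    serial: int    # session serial#
    ospid: int     # operating system process id
    process: str   # Oracle process name, e.g. M000
    event: str     # wait event, or "not in wait"


# Four integer columns, one process-name token, then the event text.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+?)\s*$")


def parse_chain(text: str) -> List[ChainEntry]:
    """Return one ChainEntry per data row; header and blank lines are skipped."""
    entries = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            inst, sid, serial, ospid, proc, event = m.groups()
            entries.append(
                ChainEntry(int(inst), int(sid), int(serial), int(ospid), proc, event)
            )
    return entries


SAMPLE = """\
inst# SessId Ser# OSPID PrcNm Event

  1   6610  9703     89862  M000 enq: WF - contention
  2   5217 33617     41409  M000 not in wait
"""

chain = parse_chain(SAMPLE)
for entry in chain:
    print(entry)
```

In this incident the last row (sid 5217 on instance 2) is the session that DIA0 asked to terminate in the alert log, while the first row is the session waiting on enq: WF - contention.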

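The M000 process on instance 1 is waiting. According to the operations staff, each hang lasted only a short time. The business side reported that the database host could sustain only about 1,000 connections. How can an x86 server with 256 GB of memory and 64 cores perform this poorly, and hang this often on top of that? It made no sense to me. What is more, roughly 1,000 connections were enough to use up all of the host's memory. My preliminary diagnosis is therefore that poorly managed user connections consume a large amount of memory, and M000 hangs while waiting for memory to be allocated.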

For reference, here are the complete steps for collecting database hang diagnostics:

Collect systemstate dumps:

$ sqlplus -prelim / as sysdba
SQL>oradebug setmypid
SQL>oradebug unlimit
SQL>oradebug -g all dump systemstate 266
Wait for 30 seconds
SQL>oradebug -g all dump systemstate 266
Wait for 30 seconds
SQL>oradebug -g all dump systemstate 266
SQL>oradebug tracefile_name
(displays the trace file name)

Collect hanganalyze output:

$ sqlplus -prelim / as sysdba
SQL>oradebug setmypid
SQL>oradebug unlimit
SQL>oradebug -g all hanganalyze 3
Wait for 30 seconds
SQL>oradebug -g all hanganalyze 3
Wait for 30 seconds
SQL>oradebug -g all hanganalyze 3
SQL>oradebug tracefile_name
(displays the trace file name)
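
Both procedures follow the same pattern: set up oradebug, take three samples 30 seconds apart, then print the trace file name. As a rough sketch, the sequence can be generated as a checklist so that the options are always typed with a plain ASCII hyphen. The helper oradebug_steps is a hypothetical name, and it only builds text; it does not connect to the database or run anything.

```python
def oradebug_steps(command, rounds=3, wait_seconds=30):
    """Build the ordered checklist for one diagnostic collection.

    command is the part after "oradebug -g all", taken from the
    procedures in this post, e.g. "hanganalyze 3".
    """
    steps = ["oradebug setmypid", "oradebug unlimit"]
    for i in range(rounds):
        if i > 0:
            # Reminder for the operator, not an oradebug command.
            steps.append(f"(wait {wait_seconds} seconds)")
        steps.append(f"oradebug -g all {command}")
    steps.append("oradebug tracefile_name")
    return steps


systemstate = oradebug_steps("dump systemstate 266")
hanganalyze = oradebug_steps("hanganalyze 3")

for step in hanganalyze:
    print(step)
```

Taking several samples instead of one makes it possible to tell a session that is truly stuck from one that is merely slow, because a stuck session shows the same wait in every sample.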

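The hang itself is only a symptom; the real cause is memory allocation. The AWR report shows heavy gc (global cache) traffic between the two nodes, and PageTables on the x86 host takes up nearly 25 GB. The overall tuning plan is therefore:
1. Tune the gc (global cache) waits.
2. Tune cache buffers chains and latch contention.
3. Tune memory by switching to HugePages.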
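
To see how page tables can plausibly reach this size, here is a back-of-the-envelope estimate. It assumes 4 KB pages, 8-byte page table entries (as on x86-64), and that every dedicated server process keeps its own page tables for the shared memory it has touched. The 12 GiB of touched SGA per process is a made-up figure chosen for illustration, not a measurement from this system. Only the lowest-level entries are counted.

```python
PAGE_SIZE = 4 * 1024               # default x86-64 page size, in bytes
HUGE_PAGE_SIZE = 2 * 1024 * 1024   # common x86-64 huge page size, in bytes
PTE_SIZE = 8                       # bytes per page table entry on x86-64
GIB = 1024 ** 3


def page_table_bytes(mapped_bytes, processes, page_size=PAGE_SIZE):
    """Estimate lowest-level page table memory across all processes."""
    entries_per_process = mapped_bytes // page_size
    return entries_per_process * PTE_SIZE * processes


touched_sga = 12 * GIB   # hypothetical: SGA touched by each server process
sessions = 1000          # roughly the connection count reported in this post

small = page_table_bytes(touched_sga, sessions)
huge = page_table_bytes(touched_sga, sessions, HUGE_PAGE_SIZE)

print(f"4 KB pages: {small / GIB:.1f} GiB of page tables")
print(f"2 MB pages: {huge / 1024 ** 2:.1f} MiB of page tables")
```

Under these assumptions the page tables alone cost about 23 GiB with 4 KB pages, the same order of magnitude as the figure seen on the host, and drop by a factor of 512 with 2 MB huge pages. That is the motivation for item 3 in the plan.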

The specific tuning steps are covered in part 2.
