Notes on an Oracle outage


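This week brought yet another production incident, this time in a different cloud environment that has been running for nearly four years. A fairly large number of customers were affected, but fortunately the impact was short and production was restored quickly. I will skip the blow-by-blow of the investigation; this post just records which logs were checked and where to find them.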

Server messages log

This log file shows why Oracle went down. In this incident it showed that free swap had dropped to 0 KB, which caused the operating system to kill the Oracle process.

[root@57373ded4b19 log]# more /var/log/messages 
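
Paging through the whole file with `more` is slow on a busy server. Below is a minimal sketch for pulling out just the relevant kernel lines. It assumes the kill was carried out by the Linux OOM killer (the post only says the system killed the process) and that the kernel used its usual message wording. The function name `find_oom_events` is made up for illustration.

```shell
#!/bin/sh
# find_oom_events: hypothetical helper, not part of any standard tool.
# Prints lines that look like Linux OOM-killer messages from a syslog file.
# Assumption: the kernel logged its usual wording; adjust the patterns
# if your distribution or kernel version phrases these messages differently.
find_oom_events() {
    logfile="${1:-/var/log/messages}"
    # -i: case-insensitive, -E: extended regex with alternation
    grep -iE 'out of memory|oom-killer|killed process' "$logfile"
}
```

It can be called as `find_oom_events /var/log/messages`. The timestamps on the matching lines give the window to look at in the other logs.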

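Application logs

App log

Check the application's state during the window when the database was down.

Web (Apache) log

Count the application's transaction request volume.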
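
One way to get that count is to bucket the access log by minute. This is a rough sketch that assumes Apache's common or combined log format, where the fourth whitespace-separated field looks like `[10/Oct/2023:13:55:36`. The post does not show the actual log format or path, so both are assumptions here, and `requests_per_minute` is a made-up name.

```shell
#!/bin/sh
# requests_per_minute: hypothetical helper that counts requests per minute.
# Assumption: common/combined log format, timestamp in field 4 as
# "[dd/Mon/yyyy:HH:MM:SS". substr($4, 2, 17) skips the "[" and keeps
# "dd/Mon/yyyy:HH:MM", which is used as the bucket key.
requests_per_minute() {
    awk '{ count[substr($4, 2, 17)]++ }
         END { for (m in count) print m, count[m] }' "$1" |
        LC_ALL=C sort   # plain text sort: chronological only within one day
}
```

Run it against the access log, passing the file path as the only argument. A sudden jump or drop around the outage time is what to look for.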

Oracle logs

alert*.log is Oracle's alert log. It shows what Oracle was doing when the problem occurred and what triggered it.

Trace logs

[oracle@57373ded4b19 trace]$ sqlplus / as sysdba

SQL>  show parameter dump

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace
core_dump_dest                       string      /home/oracle/app/oracle/diag/rdbms/helowin/helowin/cdump
max_dump_file_size                   string      unlimited
shadow_core_dump                     string      partial
user_dump_dest                       string      /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace
SQL> select * from v$diag_info;

   INST_ID NAME                                                             VALUE
---------- ---------------------------------------------------------------- -------------------------------------------------------------------------------------
         1 Diag Enabled                                                     TRUE
         1 ADR Base                                                         /home/oracle/app/oracle
         1 ADR Home                                                         /home/oracle/app/oracle/diag/rdbms/helowin/helowin
         1 Diag Trace                                                       /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace
         1 Diag Alert                                                       /home/oracle/app/oracle/diag/rdbms/helowin/helowin/alert
         1 Diag Incident                                                    /home/oracle/app/oracle/diag/rdbms/helowin/helowin/incident
         1 Diag Cdump                                                       /home/oracle/app/oracle/diag/rdbms/helowin/helowin/cdump
         1 Health Monitor                                                   /home/oracle/app/oracle/diag/rdbms/helowin/helowin/hm
         1 Default Trace File                                               /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace/helowin_ora_25878.trc
         1 Active Problem Count                                             1
         1 Active Incident Count                                            26

11 rows selected.

SQL>
[oracle@57373ded4b19 alert]$ cd /home/oracle/app/oracle/diag/rdbms/helowin/helowin/alert
[oracle@57373ded4b19 alert]$ ls
log.xml
[oracle@57373ded4b19 alert]$ cd /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace
[oracle@57373ded4b19 trace]$ ls alert_helowin.log 
alert_helowin.log
[oracle@57373ded4b19 trace]$
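
With the alert log located, a quick first pass is to tally which ORA- error codes appear in it. This is a small sketch, with `summarize_ora_errors` as a made-up name. It only counts codes and does not replace reading the surrounding lines.

```shell
#!/bin/sh
# summarize_ora_errors: hypothetical helper that tallies ORA- codes in a log.
# grep -o prints each match on its own line; uniq -c counts duplicates;
# the final sort puts the most frequent code first.
summarize_ora_errors() {
    grep -oE 'ORA-[0-9]+' "$1" | sort | uniq -c | sort -rn
}
```

For the instance above this would be `summarize_ora_errors /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace/alert_helowin.log`.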

ASM logs

[oracle@57373ded4b19 trace]$ sqlplus / as sysasm


SQL> show parameter dump

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /home/oracle/app/grid/diag/asm/+asm/+ASM1/trace
core_dump_dest                       string      /home/oracle/app/grid/diag/asm/+asm/+ASM1/cdump
max_dump_file_size                   string      unlimited
shadow_core_dump                     string      partial
user_dump_dest                       string      /home/oracle/app/grid/diag/asm/+asm/+ASM1/trace
SQL> select * from v$diag_info;

   INST_ID NAME                                                             VALUE
---------- ---------------------------------------------------------------- -------------------------------------------------------------------------------------
         1 Diag Enabled                                                     TRUE
         1 ADR Base                                                         /home/oracle/app/grid
         1 ADR Home                                                         /home/oracle/app/grid/diag/asm/+asm/+ASM1
         1 Diag Trace                                                       /home/oracle/app/grid/diag/asm/+asm/+ASM1/trace
         1 Diag Alert                                                       /home/oracle/app/grid/diag/asm/+asm/+ASM1/alert
         1 Diag Incident                                                    /home/oracle/app/grid/diag/asm/+asm/+ASM1/incident
         1 Diag Cdump                                                       /home/oracle/app/grid/diag/asm/+asm/+ASM1/cdump
         1 Health Monitor                                                   /home/oracle/app/grid/diag/asm/+asm/+ASM1/hm
         1 Default Trace File                                               /home/oracle/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11192.trc
         1 Active Problem Count                                             0
         1 Active Incident Count                                            0

11 rows selected.

[oracle@57373ded4b19 trace]$ cd /home/oracle/app/grid/diag/asm/+asm/+ASM1/trace
[oracle@57373ded4b19 trace]$ more alert_+ASM1.log

Exporting an Oracle AWR report

sqlplus / as sysdba

SQL> @?/rdbms/admin/awrrpt.sql

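Then answer the prompts:

  • Report type

'html' HTML format (default)
'text' Text format
'active-html' Includes Performance Hub active report

  • Number of days of snapshots to list
  • The Snap Id of the begin and end snapshots, as prompted
  • The name of the report file
    The AWR report is then written out.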

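Every production investigation comes down to sifting a few useful lines out of who knows how many characters, across who knows how many lines, in a pile of log files, then cross-checking them to pin down the cause.
*Sigh, a headache every single day.*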
