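Recently a friend's database failed and he asked me for help. The alert log showed that the staff on his side understood neither raw devices on AIX nor the Oracle database, yet used OEM directly to create a new tablespace. The result was a database crash, after which the database could not be started normally.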
Thread 1 advanced to log sequence 4395
Current log# 1 seq# 4395 mem# 0: /dev/rorcl_redo01
Thu Jun 12 19:28:38 2014
/* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/orcl_redo04' SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO
ORA-1119 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/orcl_redo04' SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO ...
Thu Jun 12 19:36:23 2014
/* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/orcl_redo03' SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO
Thu Jun 12 19:43:56 2014
ORA-604 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/orcl_redo03' SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO ...
Thu Jun 12 19:48:11 2014
/* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/rorcl_redo03' SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO
Thu Jun 12 19:48:11 2014
ORA-1537 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/rorcl_redo03' SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO ...
Thu Jun 12 19:48:20 2014
/* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/rorcl_redo04' SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO
ORA-1537 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/rorcl_redo04' SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO ...
Fri Jun 13 00:50:37 2014
Trace dumping is performing id=[cdmp_20140613005032]
Fri Jun 13 00:50:40 2014
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
0
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
…………
Fri Jun 13 00:50:40 2014
Beginning instance recovery of 1 threads
Reconfiguration complete
Fri Jun 13 00:50:41 2014
parallel recovery started with 7 processes
Fri Jun 13 00:50:43 2014
Started redo scan
Fri Jun 13 00:50:43 2014
Errors in file /oracle/admin/orcl/bdump/orcl1_smon_213438.trc:
ORA-00316: log 3 of thread 2, type 0 in header is not log file
ORA-00312: online log 3 thread 2: '/dev/rorcl_redo03'
Fri Jun 13 00:50:43 2014
Errors in file /oracle/admin/orcl/bdump/orcl1_smon_213438.trc:
ORA-00316: log 3 of thread 2, type 0 in header is not log file
ORA-00312: online log 3 thread 2: '/dev/rorcl_redo03'
SMON: terminating instance due to error 316
Fri Jun 13 00:50:43 2014
Errors in file /oracle/admin/orcl/bdump/orcl1_lgwr_335980.trc:
ORA-00316: log of thread , type in header is not log file
Instance terminated by SMON, pid = 213438
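The log shows that two mistakes were made while creating the tablespace through OEM:
1. The AIX naming of block devices and character devices was not understood (/dev/orcl_redo03 is the block device and /dev/rorcl_redo03 is the character device of the same logical volume).
2. The device holding node 2's current redo log, which was in use and therefore unavailable, was treated as an unused device and chosen for the new tablespace.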
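Both mistakes could have been caught by a simple check before the DDL was issued. The following is a minimal Python sketch, not a tool used in the original incident. It assumes the AIX convention that a logical volume `lv` appears as the block device `/dev/lv` and as the character (raw) device `/dev/rlv`, and it assumes the list of files already used by the database has been collected beforehand (for example from `v$logfile`, `v$datafile`, `v$tempfile` and `v$controlfile`). The function names `same_aix_lv` and `find_conflicts` and the path `/dev/orcl_data09` are hypothetical.

```python
import posixpath


def same_aix_lv(path_a, path_b):
    """Return True if two device paths may point at the same AIX logical volume.

    Assumption: /dev/<lv> is the block device and /dev/r<lv> is the
    character (raw) device of the same logical volume.
    """
    dir_a, name_a = posixpath.split(path_a)
    dir_b, name_b = posixpath.split(path_b)
    if dir_a != dir_b:
        return False
    # Same name, or one name is the other with the raw-device "r" prefix.
    return name_a == name_b or name_a == "r" + name_b or name_b == "r" + name_a


def find_conflicts(candidate, in_use):
    """List every in-use file that may share a logical volume with candidate."""
    return [path for path in in_use if same_aix_lv(candidate, path)]


# Redo members taken from the alert log above; a real check would also
# include data files, temp files and control files.
in_use = ["/dev/rorcl_redo01", "/dev/rorcl_redo03", "/dev/rorcl_redo04"]

print(find_conflicts("/dev/orcl_redo03", in_use))   # block name of a redo device
print(find_conflicts("/dev/orcl_data09", in_use))   # hypothetical unused device
```

The check is deliberately conservative: any candidate that matches an in-use file with or without the `r` prefix is reported, so the block device name `/dev/orcl_redo03` is flagged even though the database itself only records `/dev/rorcl_redo03`.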
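Because the tablespace was created with the wrong file on the wrong device, node 2's current redo log (/dev/rorcl_redo03) was corrupted. The redo header is read first, so the first error the database reports is ORA-00316: log of thread , type in header is not log file. Node 2 crashed first. Node 1 then started instance recovery, but since node 2's current redo log was already corrupted, the recovery could not complete and both nodes went down. With the current redo log of one RAC node corrupted, the database cannot be opened normally.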
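When reading a long alert log, it can help to first extract the sequence of ORA errors and then look at the text around each one. The snippet below is a small sketch along those lines; the helper name `ora_codes` is hypothetical, and the sample lines are copied from the alert log above.

```python
import re

# Matches both the short form (ORA-1119) and the zero-padded form (ORA-00316).
ORA_CODE = re.compile(r"ORA-0*(\d+)")


def ora_codes(alert_text):
    """Return the distinct ORA error numbers in order of first appearance."""
    seen = []
    for match in ORA_CODE.finditer(alert_text):
        code = int(match.group(1))
        if code not in seen:
            seen.append(code)
    return seen


sample = """\
ORA-1119 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE
ORA-604 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE
ORA-1537 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE
ORA-00316: log 3 of thread 2, type 0 in header is not log file
ORA-00312: online log 3 thread 2: '/dev/rorcl_redo03'
SMON: terminating instance due to error 316
"""

print(ora_codes(sample))
```

In this case the order matches the story: the failed DDL attempts (1119, 604, 1537) come first, and the redo header errors (316, 312) appear only hours later when instance recovery tries to read the damaged log.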
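If a backup exists, the database can be restored and recovered from it. Without a backup, the only option is to force the database open in order to salvage the data. I hope this does not end in a major data loss.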
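I am sharing this case as a warning: be careful with any operation on the raw devices used by a database, and do not act when you are unsure of what you are doing, because the consequences can be severe.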