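First thing this morning at work I found that two ASM disks had stayed offline longer than disk_repair_time and had been force-dropped from their disk group.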
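Before going further it helps to know what the repair window actually is. The query below is a minimal sketch for checking it from the ASM instance; it assumes the disk groups are mounted and have compatible.asm of 11.1 or higher, since attributes are not listed otherwise. If the attribute was never changed, the value is normally the default of 3.6h.

```sql
-- Show the disk_repair_time attribute of every mounted disk group.
SELECT dg.name AS diskgroup,
       a.name  AS attribute,
       a.value
FROM   v$asm_attribute a
JOIN   v$asm_diskgroup dg
       ON dg.group_number = a.group_number
WHERE  a.name = 'disk_repair_time';
```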
SQL> select group_number,disk_number,STATE,PATH,NAME,failgroup from v$asm_disk;
GROUP_NUMBER DISK_NUMBER STATE    PATH                                     NAME                           FAILGROUP
------------ ----------- -------- ---------------------------------------- ------------------------------ ------------------------------
           0           0 NORMAL   /dev/mapper/mpathg
           0           1 NORMAL   /dev/mapper/mpathf
           2           1 NORMAL                                            OCR_0001                       OCR_0001
           1           1 FORCING                                           _DROPPED_0001_DATA             DATA_0001
           1           0 FORCING                                           _DROPPED_0000_DATA             DATA_0000
           0           2 NORMAL   /dev/mapper/mpathcp2
           0           3 NORMAL   /dev/mapper/mpathdp2
           0           8 NORMAL   /dev/mapper/mpathe
           0           9 NORMAL   /dev/mapper/mpathc
           0          10 NORMAL   /dev/mapper/mpathd
           0          11 NORMAL   /dev/mapper/mpathb
GROUP_NUMBER DISK_NUMBER STATE    PATH                                     NAME                           FAILGROUP
------------ ----------- -------- ---------------------------------------- ------------------------------ ------------------------------
           0          12 NORMAL   /dev/mapper/vg_rac01-lv_swap
           2           2 NORMAL   /dev/mapper/mpathdp1                     OCR_0002                       OCR_0002
           1           2 NORMAL   /dev/mapper/mpathbp2                     DATA_0002                      DATA_0002
           2           0 NORMAL   /dev/mapper/mpathbp1                     OCR_0000                       OCR_0000
           2           3 NORMAL   /dev/mapper/mpathcp1                     OCR_0003                       OCR_0003
Fortunately my disk group uses high redundancy.
I tried bringing the disks back online, but that did not work:
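The redundancy level can be confirmed from the ASM instance rather than from memory. A minimal sketch, using only standard v$asm_diskgroup columns; TYPE should read HIGH for a high-redundancy group, and OFFLINE_DISKS shows how many disks each group currently considers offline.

```sql
-- Redundancy type and offline-disk count per disk group.
SELECT group_number,
       name,
       type,           -- EXTERN, NORMAL or HIGH
       state,
       offline_disks
FROM   v$asm_diskgroup;
```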
SQL> ALTER DISKGROUP DATA ONLINE DISKS IN FAILGROUP DATA_0001 NOWAIT;
ALTER DISKGROUP DATA ONLINE DISKS IN FAILGROUP DATA_0001 NOWAIT
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15281: not all specified disks were brought ONLINE
ORA-15284: ASM terminated ALTER DISKGROUP ONLINE
I looked up the description of this state in the official v$asm_disk documentation:
FORCING - Disk is being removed from the disk group without attempting to offload its data. The data will be recovered from redundant copies, where possible.
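Does that mean my case is one where recovery is not possible?
I searched online and found nothing useful, so I fell back on the simplest and crudest tool there is: dd!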
[root@rac01 ~]# dd if='/dev/zero' of='/dev/mapper/mpathdp2' bs=20000 count=10000;
[root@rac01 ~]# dd if='/dev/zero' of='/dev/mapper/mpathcp2' bs=20000 count=10000;
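Before adding the disks back, it is worth checking how ASM now sees the two wiped devices. This is a sketch rather than something from the original session: after the header has been zeroed I would expect HEADER_STATUS to show CANDIDATE (or PROVISIONED on ASMLib setups), but that expectation is an assumption and was not captured in the output here.

```sql
-- How does ASM currently classify the two wiped devices?
SELECT path,
       header_status,
       mode_status,
       mount_status
FROM   v$asm_disk
WHERE  path IN ('/dev/mapper/mpathcp2', '/dev/mapper/mpathdp2');
```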
Then I added the disks back:
SQL> alter diskgroup data add FAILGROUP DATA_0000 disk '/dev/mapper/mpathdp2' name DATA_0000 FAILGROUP DATA_0001 DISK '/dev/mapper/mpathcp2' name DATA_0001;
Diskgroup altered.
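The add has completed at this point, but the disks whose names start with _DROPPED are still listed, which feels like a case of hemorrhoids..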
SQL> select GROUP_NUMBER,STATE,name,path,REPAIR_TIMER from v$asm_disk;
GROUP_NUMBER STATE    NAME                           PATH                                     REPAIR_TIMER
------------ -------- ------------------------------ ---------------------------------------- ------------
           0 NORMAL                                  /dev/mapper/mpathg                                  0
           0 NORMAL                                  /dev/mapper/mpathf                                  0
           2 NORMAL   OCR_0001                                                                       27630
           1 FORCING  _DROPPED_0001_DATA                                                                 0
           1 FORCING  _DROPPED_0000_DATA                                                                 0
           0 NORMAL                                  /dev/mapper/vg_rac01-lv_swap                        0
           0 NORMAL                                  /dev/mapper/mpathb                                  0
           0 NORMAL                                  /dev/mapper/mpathe                                  0
           0 NORMAL                                  /dev/mapper/mpathc                                  0
           0 NORMAL                                  /dev/mapper/mpathd                                  0
           2 NORMAL   OCR_0002                       /dev/mapper/mpathdp1                                0
GROUP_NUMBER STATE    NAME                           PATH                                     REPAIR_TIMER
------------ -------- ------------------------------ ---------------------------------------- ------------
           1 NORMAL   DATA_0002                      /dev/mapper/mpathbp2                                0
           2 NORMAL   OCR_0000                       /dev/mapper/mpathbp1                                0
           1 NORMAL   DATA_0001                      /dev/mapper/mpathcp2                                0
           1 NORMAL   DATA_0000                      /dev/mapper/mpathdp2                                0
           2 NORMAL   OCR_0003                       /dev/mapper/mpathcp1                                0
Check v$asm_operation:
SQL> select * from v$asm_operation;
GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE
------------ ----- ---- ---------- ---------- ---------- ---------- ----------
EST_MINUTES ERROR_CODE
----------- --------------------------------------------
1 REBAL RUN 1 1 4101 246529 15573
15
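The output above shows a rebalance running at power 1 with about 15 minutes left. A narrower query is easier to re-run while waiting, and the power can be raised if the rebalance is too slow. This is only a sketch: the power value of 4 is an arbitrary illustration, not something used in the original session, and a higher power puts more I/O load on the storage.

```sql
-- Progress of the running rebalance, without the wrapped columns.
SELECT group_number, operation, state, power,
       sofar, est_work, est_minutes
FROM   v$asm_operation;

-- Optionally speed up the rebalance (illustrative value).
ALTER DISKGROUP data REBALANCE POWER 4;
```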
After the operation finished, I queried v$asm_disk again:
SQL> select GROUP_NUMBER,STATE,name,path,failgroup,REPAIR_TIMER from v$asm_disk where group_number=1;
GROUP_NUMBER STATE    NAME                           PATH                                     FAILGROUP                      REPAIR_TIMER
------------ -------- ------------------------------ ---------------------------------------- ------------------------------ ------------
           1 NORMAL   DATA_0002                      /dev/mapper/mpathbp2                     DATA_0002                                 0
           1 NORMAL   DATA_0001                      /dev/mapper/mpathcp2                     DATA_0001                                 0
           1 NORMAL   DATA_0000                      /dev/mapper/mpathdp2                     DATA_0000                                 0
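The disks whose names start with _DROPPED have been marked unusable by Oracle and removed from v$asm_disk.
So here are my questions:
1: Is there any way to do this other than dd?
2: What do you all usually set disk_repair_time to? If a disk really dies, I feel that even 24 hours may not be enough to get it replaced.
3: FORCING - Disk is being removed from the disk group without attempting to offload its data. The data will be recovered from redundant copies, where possible. How exactly should this sentence be understood? What does "where possible" mean?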
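For questions 1 and 2, here is a hedged sketch of what might be tried next time; none of it was run in the session above. The first statement lengthens the repair window, with 24h as a purely illustrative value. The second uses the FORCE option of ADD DISK, which is documented for disks that still carry an old disk group header; whether ASM accepts it for disks that were force-dropped from this same mounted group is an assumption I have not verified, so test it on a non-production system first.

```sql
-- Question 2: give yourself more time before offline disks are dropped
-- (illustrative value; needs compatible.asm and compatible.rdbms >= 11.1).
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '24h';

-- Question 1: possible alternative to wiping the headers with dd.
-- Untested assumption: FORCE lets ASM reuse a disk whose header
-- still marks it as a member of a disk group.
ALTER DISKGROUP data ADD
  FAILGROUP data_0000 DISK '/dev/mapper/mpathdp2' NAME data_0000 FORCE
  FAILGROUP data_0001 DISK '/dev/mapper/mpathcp2' NAME data_0001 FORCE;
```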