Well, another database is down.
Check the cluster resources:
[grid@jcsjsjk02 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHDG.dg
ONLINE ONLINE lzl01
OFFLINE OFFLINE lzl02
ora.DATA2.dg
ONLINE ONLINE lzl01
ONLINE ONLINE lzl02
ora.DATADG.dg
ONLINE ONLINE lzl01
ONLINE ONLINE lzl02
ora.LISTENER.lsnr
ONLINE ONLINE lzl01
ONLINE ONLINE lzl02
ora.OCRDG.dg
ONLINE ONLINE lzl01
ONLINE ONLINE lzl02
ora.asm
ONLINE ONLINE lzl01 Started
ONLINE ONLINE lzl02 Started
ora.gsd
OFFLINE OFFLINE lzl01
OFFLINE OFFLINE lzl02
ora.net1.network
ONLINE ONLINE lzl01
ONLINE ONLINE lzl02
ora.ons
ONLINE ONLINE lzl01
ONLINE ONLINE lzl02
ora.registry.acfs
ONLINE ONLINE lzl01
ONLINE ONLINE lzl02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE lzl01
ora.cvu
1 ONLINE ONLINE lzl01
ora.epmjc.db
1 OFFLINE OFFLINE Corrupted Controlfile
2 OFFLINE OFFLINE Corrupted Controlfile
ora.lzl01.vip
1 ONLINE ONLINE lzl01
ora.lzl02.vip
1 ONLINE ONLINE lzl02
ora.oc4j
1 ONLINE ONLINE lzl01
ora.scan1.vip
1 ONLINE ONLINE lzl01
The cluster resources show trouble with both the database and the storage: ora.epmjc.db is offline on both nodes with "Corrupted Controlfile", and ora.ARCHDG.dg is offline on lzl02.
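With a resource table this wide it's easy to miss an offline entry. A small filter can pull out just the resources that report OFFLINE anywhere (a sketch against `crsctl stat res -t` output: resource-name lines start with "ora." at column 1, and the state lines carry the OFFLINE keyword):

```shell
# Print each resource name that has at least one OFFLINE state line.
crsctl stat res -t | awk '
  /^ora\./                { name = $1 }
  /OFFLINE/ && name != "" { print name; name = "" }'
```

On the output above this would flag ora.ARCHDG.dg, ora.gsd, and ora.epmjc.db (ora.gsd being offline is normal on 11g, so only the first and last matter here).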
Check the multipath -ll output (abridged; the first device is cut off):
| `- 7:0:2:5 sdg 8:96 active ready running
|-+- policy='round-robin 0' prio=1 status=enabled
| `- 8:0:2:5 sdpg 130:352 active ready running
|-+- policy='round-robin 0' prio=1 status=enabled
| `- 0:0:2:5 sdaay 133:608 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
`- 1:0:2:5 sdahy 128:960 active ready running
mpathf (36005076305ffd7840000000000002107) dm-9 IBM,2107900
size=700G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 7:0:3:8 sdp 8:240 active ready running
|- 7:0:4:8 sdcz 70:112 active ready running
|- 8:0:0:8 sdip 135:144 active ready running
|- 8:0:1:8 sdlz 69:272 active ready running
|- 0:0:1:8 sdsb 134:496 active ready running
|- 0:0:0:8 sdxr 128:528 active ready running
|- 1:0:0:8 sdabh 133:752 active ready running
`- 1:0:1:8 sdaer 67:880 active ready running
mpathda (36005076305ffd7840000000000002054) dm-30 IBM,2107900
size=700G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 7:0:3:17 sdy 65:128 active ready running
|- 7:0:4:17 sddi 71:0 active ready running
|- 8:0:0:17 sdiy 8:288 active ready running
`- 8:0:1:17 sdmi 69:416 active ready running
mpathfk (3600b342506bf3ebdb842d43fcd0000d9) dm-134 MacroSAN,LU
size=1.1T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 7:0:5:25 sdha 133:0 active ready running
Multipathing is clearly broken: the path counts differ between LUNs, with some devices showing 8 paths, some 4, and some only 1.
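The asymmetry is easier to see with a per-device path count (a sketch; it keys off the "mpath" prefix and the "active ready running" path lines in the listing above):

```shell
# Count active paths under each multipath device; uneven counts across LUNs
# from the same array usually mean missing FC links.
multipath -ll | awk '
  /^mpath/               { if (dev != "") print n, dev; dev = $1; n = 0 }
  /active ready running/ { n++ }
  END                    { if (dev != "") print n, dev }' | sort -n
```

In this case the IBM LUNs should presumably all report 8; anything lower points at a lost link.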
Check the disk group status:
SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,PATH from v$asm_disk order by 1,2;
select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,PATH from v$asm_disk order by 1,2
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfdskResolveDisk1], [], [], [], [],
[], [], [], [], [], [], []
Querying the disks fails outright with an ORA-00600 [kfdskResolveDisk1].
Check the OS messages log:
ACT EXTSCYD starting.
Jul 8 15:53:05 lzl02 Oracle GoldenGate Capture for Oracle[185015]: 2019-07-08 15:53:05 ERROR OGG-00664 Oracle GoldenGate Capture for Oracle, extscya.prm: OCI Error beginning session (status = 1034-ORA-01034: ORACLE not available#012ORA-27101: shared memory realm does not exist#012Linux-x86_64 Error: 2: No such file or directory).
Check the database alert log:
Mon Jul 08 15:42:41 2019
Errors in file /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/trace/lzldb2_ora_97643.trc:
ORA-00317: file type 47077 in header is not log file
ORA-00334: archived log: '+DATADG/lzldb/onlinelog/group_3_1'
Errors in file /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/trace/lzldb2_ora_28387.trc (incident=797121):
ORA-00600: internal error code, arguments: [2618], [8], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/incident/incdir_797121/lzldb2_ora_28387_i797121.trc
Errors in file /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/trace/lzldb2_ora_97643.trc (incident=796969):
ORA-00600: internal error code, arguments: [2618], [8], [], [], [], [], [], [], [], [], [], []
Mon Jul 08 15:42:41 2019
Hex dump of (file 32, block 3750338) in trace file /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/trace/lzldb2_ora_112234.trc
Incident details in: /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/incident/incdir_796969/lzldb2_ora_97643_i796969.trc
Corrupt block relative dba: 0x083939c2 (file 32, block 3750338)
Bad header found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x05682d42
last change scn: 0x0cf2.e932bbbc seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xbbbc0601
check value in block header: 0x1691
computed block checksum: 0x0
Reading datafile '+DATADG/lzldb/datafile/cachedata03.dbf' for corruption at rdba: 0x083939c2 (file 32, block 3750338)
Reread (file 32, block 3750338) found same corrupt data (no logical check)
Errors in file /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/trace/lzldb2_ora_112234.trc (incident=798393):
ORA-01578: ORACLE data block corrupted (file # 32, block # 3750338)
ORA-01110: data file 32: '+DATADG/lzldb/datafile/cachedata03.dbf'
Incident details in: /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/incident/incdir_798393/lzldb2_ora_112234_i798393.trc
Mon Jul 08 15:42:41 2019
Hex dump of (file 1, block 118639) in trace file /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/trace/lzldb2_smon_80667.trc
Corrupt block relative dba: 0x0041cf6f (file 1, block 118639)
Bad header found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x006e106f
last change scn: 0x0e6a.1dbc963d seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x963d0601
check value in block header: 0x19d0
computed block checksum: 0x0
Reading datafile '+DATADG/lzldb/datafile/system.259.864741115' for corruption at rdba: 0x0041cf6f (file 1, block 118639)
Reread (file 1, block 118639) found same corrupt data (no logical check)
Hex dump of (file 0, block 1) in trace file /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/trace/lzldb2_smon_80667.trc
Corrupt block relative dba: 0x00000001 (file 0, block 1)
Bad header found during control file header read
Data in bad block:
type: 1 format: 2 rdba: 0x0014085f
last change scn: 0x809c.001d382a seq: 0xfb flg: 0xd4
spare1: 0x0 spare2: 0x0 spare3: 0x60
consistency value in tail: 0x001234c1
check value in block header: 0x2
computed block checksum: 0x0
Mon Jul 08 15:42:42 2019
Incomplete read from log member '+DATADG/lzldb/onlinelog/group_3_1'. Trying next member.
Mon Jul 08 15:42:42 2019
Incomplete read from log member '+DATADG/lzldb/onlinelog/group_3_1'. Trying next member.
Mon Jul 08 15:42:42 2019
Incomplete read from log member '+DATADG/lzldb/onlinelog/group_3_1'. Trying next member.
The alert log is full of disk and I/O errors: corrupt data blocks (including in SYSTEM), a bad control file header, and incomplete reads from the online redo log.
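To get a quick overview of what is failing, the ORA- codes in the alert log can be tallied (the alert log path here is an assumption inferred from the trace file paths above):

```shell
# Tally ORA- error codes in the alert log, most frequent first.
grep -oE 'ORA-[0-9]+' \
    /oracle/app/oracle/diag/rdbms/lzldb/lzldb2/trace/alert_lzldb2.log |
    sort | uniq -c | sort -rn | head
```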
ASM alert log:
Mon Jul 08 15:44:33 2019
NOTE: ASM client lzldb2:lzldb disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_80677.trc
Mon Jul 08 15:45:34 2019
WARNING: cache read a corrupt block: group=1(ARCHDG) fn=1 blk=7 disk=0 (ASMDISK1) incarn=3916141130 au=2 blk=7 count=1
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_pz99_149004.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [1] [7] [0 != 1]
NOTE: a corrupted block from group ARCHDG was dumped to /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_pz99_149004.trc
WARNING: cache read (retry) a corrupt block: group=1(ARCHDG) fn=1 blk=7 disk=0 (ASMDISK1) incarn=3916141130 au=2 blk=7 count=1
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_pz99_149004.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [1] [7] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [1] [7] [0 != 1]
ERROR: cache failed to read group=1(ARCHDG) fn=1 blk=7 from disk(s): 0(ASMDISK1)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [1] [7] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [1] [7] [0 != 1]
NOTE: cache initiating offline of disk 0 group ARCHDG
NOTE: process _pz99_+asm2 (149004) initiating offline of disk 0.3916141130 (ASMDISK1) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe96b924a, mask = 0x6a, op = clear
Mon Jul 08 15:45:34 2019
GMON updating disk modes for group 1 at 13 for pid 61, osid 149004
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Mon Jul 08 15:45:34 2019
NOTE: cache dismounting (not clean) group 1/0xE52B6129 (ARCHDG)
NOTE: messaging CKPT to quiesce pins Unix process pid: 174610, image: oracle@lzl02 (B000)
Mon Jul 08 15:45:34 2019
NOTE: halting all I/Os to diskgroup 1 (ARCHDG)
Mon Jul 08 15:45:34 2019
NOTE: LGWR doing non-clean dismount of group 1 (ARCHDG)
NOTE: LGWR sync ABA=21221.7871 last written ABA 21221.7871
WARNING: Offline for disk ASMDISK1 in mode 0x7f failed.
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_pz99_149004.trc (incident=2215850):
ORA-15335: ASM metadata corruption detected in disk group 'ARCHDG'
ORA-15130: diskgroup "ARCHDG" is being dismounted
ORA-15066: offlining disk "ASMDISK1" in group "ARCHDG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [1] [7] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [1] [7] [0 != 1]
Incident details in: /oracle/app/grid/diag/asm/+asm/+ASM2/incident/incdir_2215850/+ASM2_pz99_149004_i2215850.trc
Mon Jul 08 15:45:34 2019
kjbdomdet send to inst 1
detach from dom 1, sending detach message to inst 1
Mon Jul 08 15:45:34 2019
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 20)
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
3514 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Mon Jul 08 15:45:35 2019
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xE52B6129 (ARCHDG)
SQL> alter diskgroup ARCHDG dismount force /* ASM SERVER:3844825385 */
Mon Jul 08 15:45:35 2019
Dumping diagnostic data in directory=[cdmp_20190708154535], requested by (instance=2, osid=149004 (PZ99)), summary=[incident=2215850].
Mon Jul 08 15:45:35 2019
Sweep [inc][2215850]: completed
System State dumped to trace file /oracle/app/grid/diag/asm/+asm/+ASM2/incident/incdir_2215850/+ASM2_pz99_149004_i2215850.trc
Mon Jul 08 15:45:35 2019
NOTE: cache deleting context for group ARCHDG 1/0xe52b6129
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_pz99_149004.trc (incident=2215851):
ORA-15335: ASM metadata corruption detected in disk group 'ARCHDG'
ORA-15130: diskgroup "ARCHDG" is being dismounted
ORA-15066: offlining disk "ASMDISK1" in group "ARCHDG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [1] [7] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [1] [7] [0 != 1]
Incident details in: /oracle/app/grid/diag/asm/+asm/+ASM2/incident/incdir_2215851/+ASM2_pz99_149004_i2215851.trc
GMON dismounting group 1 at 14 for pid 27, osid 174610
NOTE: Disk ASMDISK1 in mode 0x7f marked for de-assignment
NOTE: Disk ASMDISK2 in mode 0x7f marked for de-assignment
NOTE: Disk ASMDISK18 in mode 0x7f marked for de-assignment
NOTE: Disk ASMDISK19 in mode 0x7f marked for de-assignment
NOTE: Disk ASMDISK51 in mode 0x7f marked for de-assignment
NOTE: Disk ASMDISK50 in mode 0x7f marked for de-assignment
NOTE: Disk ASMDISK100 in mode 0x7f marked for de-assignment
NOTE: Disk ASMDISK101 in mode 0x7f marked for de-assignment
NOTE: Disk ASMDISK102 in mode 0x7f marked for de-assignment
NOTE: Disk ASMMACDISK26 in mode 0x7f marked for de-assignment
NOTE: Disk ASMMACDISK27 in mode 0x7f marked for de-assignment
SUCCESS: diskgroup ARCHDG was dismounted
SUCCESS: alter diskgroup ARCHDG dismount force /* ASM SERVER:3844825385 */
Mon Jul 08 15:45:36 2019
NOTE: diskgroup resource ora.ARCHDG.dg is offline
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group ARCHDG
Mon Jul 08 15:45:36 2019
Sweep [inc][2215851]: completed
Mon Jul 08 15:45:36 2019
Sweep [inc2][2215850]: completed
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_pz99_149004.trc (incident=2215852):
ORA-15335: ASM metadata corruption detected in disk group ''
Incident details in: /oracle/app/grid/diag/asm/+asm/+ASM2/incident/incdir_2215852/+ASM2_pz99_149004_i2215852.trc
Dumping diagnostic data in directory=[cdmp_20190708154536], requested by (instance=2, osid=149004 (PZ99)), summary=[incident=2215851].
Mon Jul 08 15:45:37 2019
Sweep [inc][2215852]: completed
Dumping diagnostic data in directory=[cdmp_20190708154537], requested by (instance=2, osid=149004 (PZ99)), summary=[incident=2215852].
Dumping diagnostic data in directory=[cdmp_20190708154538], requested by (instance=1, osid=66351), summary=[incident=2215833].
Mon Jul 08 15:46:36 2019
Sweep [inc2][2215852]: completed
Sweep [inc2][2215851]: completed
Mon Jul 08 15:48:52 2019
NOTE: AMDU dump of disk group ARCHDG created at /oracle/app/grid/diag/asm/+asm/+ASM2/incident/incdir_2215850
Mon Jul 08 16:14:23 2019
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_211166.trc (incident=2215578):
ORA-00600: internal error code, arguments: [kfdskResolveDisk1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/grid/diag/asm/+asm/+ASM2/incident/incdir_2215578/+ASM2_ora_211166_i2215578.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jul 08 16:14:24 2019
Dumping diagnostic data in directory=[cdmp_20190708161424], requested by (instance=2, osid=211166), summary=[incident=2215578].
Mon Jul 08 16:14:25 2019
Sweep [inc][2215578]: completed
Sweep [inc2][2215578]: completed
ASM is throwing errors too, and it forcibly dismounted ARCHDG: ASMDISK1 could not be offlined because the disk group uses external redundancy, so a single corrupt disk took the whole group down.
All of this points to a problem with the SAN links, so the host engineers were asked to repair them.
Afterwards, compare the multipath device counts on the two nodes:
[root@lzl01 fc_host]# multipath -ll|grep mpath|wc -l
129
[root@lzl02 fc_host]# multipath -ll|grep mpath|wc -l
129
The counts match, but starting the database still fails and the disk groups on node 2 stay offline.
Comparing the multipath bindings between the two nodes reveals that a few device aliases don't match:
Node 1:
mpathaw (36005076305ffd7840000000000002030) IBM,2107900
mpathb (36005076305ffd7840000000000002004) IBM,2107900
Node 2:
mpathaw (36005076305ffd7840000000000002030) IBM,2107900
mpathb (36005076305ffd7840000000000002006) IBM,2107900
The mpathb alias is bound to a different WWID on each node, yet /etc/multipath.conf is identical on both, which is strange. Why isn't the multipath configuration taking effect on node 2?
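One way to pin down exactly which aliases diverge is to dump the alias-to-WWID mapping from both nodes and diff them (hostnames and temp-file paths here are just illustrative):

```shell
# Collect "alias (wwid)" pairs from each node; any line diff reports is a
# LUN bound to a different alias on one of the nodes.
ssh root@lzl01 multipath -ll | awk '/^mpath/ { print $1, $2 }' | sort > /tmp/mp.lzl01
ssh root@lzl02 multipath -ll | awk '/^mpath/ { print $1, $2 }' | sort > /tmp/mp.lzl02
diff /tmp/mp.lzl01 /tmp/mp.lzl02
```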
So: delete the stale multipath bindings, reset the multipath software, and restart it. After that, the two nodes finally matched.
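For the record, the reset amounted to something like the following (a sketch, not the exact commands used; /etc/multipath/bindings is device-mapper-multipath's default alias cache when user_friendly_names is on, and the service command matches RHEL 5/6-era systems):

```shell
# Flush the stale maps, drop the cached alias->WWID bindings, and restart
# multipathd so it rebuilds everything from /etc/multipath.conf.
multipath -F                       # flush all unused multipath maps
rm -f /etc/multipath/bindings      # drop the stale alias->WWID cache
service multipathd restart         # systemctl restart multipathd on newer OSes
multipath -ll | grep -c '^mpath'   # recount the maps afterwards
```

A stale bindings file would also explain the original mystery: multipathd prefers the cached alias-to-WWID mapping over freshly generated names, so an old cache can silently override a correct multipath.conf.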
Rescan the disks with ASMLib:
[root@lzl02 fc_host]# oracleasm scandisks
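A quick sanity check that ASMLib sees the labels again (the ASM disk-name prefix is taken from the ASM alert log above):

```shell
# List ASMLib disk labels and count the ones with the ASM prefix.
oracleasm listdisks | grep -c '^ASM'
```

The count should match node 1.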
重新掛載2節點磁盤組
[grid@lzl02 fc_host]$ srvctl start diskgroup -g archdg
Start the database:
[oracle@lzl02 fc_host]$ srvctl start database -d lzl
ok!