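Over the weekend a database went down again, so I went to the customer site to sort it out.
1. Check the instance startup time (queried around 6 p.m. on the 29th, so there must have been an abnormal restart earlier that day)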
SQL> select startup_time,inst_id from gv$instance;
STARTUP_TIME INST_ID
------------------- ----------
2019-06-29 13:29:57 1
2019-06-29 13:54:17 2
2. Check the cluster status
[grid@jcsjsjk01 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHDG.dg
ONLINE ONLINE jcsjsjk01
ONLINE ONLINE jcsjsjk02
ora.DATA2.dg
ONLINE ONLINE jcsjsjk01
ONLINE OFFLINE jcsjsjk02
ora.DATADG.dg
ONLINE ONLINE jcsjsjk01
ONLINE ONLINE jcsjsjk02
ora.LISTENER.lsnr
ONLINE OFFLINE jcsjsjk01
ONLINE OFFLINE jcsjsjk02
ora.OCRDG.dg
ONLINE ONLINE jcsjsjk01
ONLINE ONLINE jcsjsjk02
ora.asm
ONLINE ONLINE jcsjsjk01 Started
ONLINE ONLINE jcsjsjk02 Started
ora.gsd
OFFLINE OFFLINE jcsjsjk01
OFFLINE OFFLINE jcsjsjk02
ora.net1.network
ONLINE ONLINE jcsjsjk01
ONLINE ONLINE jcsjsjk02
ora.ons
ONLINE ONLINE jcsjsjk01
ONLINE ONLINE jcsjsjk02
ora.registry.acfs
ONLINE ONLINE jcsjsjk01
ONLINE ONLINE jcsjsjk02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE INTERMEDIATE jcsjsjk02 Not All Endpoints Registered
ora.cvu
1 ONLINE ONLINE jcsjsjk02
ora.epmjc.db
1 ONLINE ONLINE jcsjsjk01 Open
2 ONLINE OFFLINE
ora.jcsjsjk01.vip
1 ONLINE ONLINE jcsjsjk01
ora.jcsjsjk02.vip
1 ONLINE ONLINE jcsjsjk02
ora.oc4j
1 ONLINE ONLINE jcsjsjk02
ora.scan1.vip
1 ONLINE ONLINE jcsjsjk02
The cluster state is wrong: the DATA2 disk group is not mounted on node 2, and database instance 2 is offline.
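A listing this long is easy to misread, so a small filter can help. The following is only a sketch and was not part of the original troubleshooting: the function name `crs_not_online` is hypothetical, and it assumes the `crsctl stat res -t` layout shown above (a resource name line starting with `ora.`, followed by TARGET/STATE lines, with an instance number in front for cluster resources). It prints every resource whose TARGET is ONLINE but whose STATE is something else.

```shell
# Hypothetical helper: reads `crsctl stat res -t` output on stdin and prints
# "resource state [server]" for entries with TARGET=ONLINE and STATE!=ONLINE.
crs_not_online() {
    awk '
        # A resource name line, e.g. "ora.DATA2.dg"
        /^ora\./ { res = $1; next }
        {
            # Cluster resources carry an instance number in the first column
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1
            if ($i == "ONLINE" && $(i + 1) != "" && $(i + 1) != "ONLINE") {
                line = res " " $(i + 1)
                if ($(i + 2) != "") line = line " " $(i + 2)
                print line
            }
        }
    '
}
```

It would be used as `crsctl stat res -t | crs_not_online`. Resources whose TARGET is OFFLINE, such as ora.gsd, are skipped on purpose because they are not expected to run.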
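3. Check the disk group state (the first listing is from node 2, where DATA2 is DISMOUNTED; the second is from node 1)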
GROUP_NUMBER NAME STATE TOTAL_MB FREE_MB OFFLINE_DISKS
------------ ------------------------------------------------------------ ---------------------- ---------- ---------- -------------
1 ARCHDG MOUNTED 9523197 3982493 0
0 DATA2 DISMOUNTED 0 0 0
3 DATADG MOUNTED 36556792 14891 0
4 OCRDG MOUNTED 10236 9840 0
GROUP_NUMBER NAME STATE TOTAL_MB FREE_MB OFFLINE_DISKS
------------ ------------------------------------------------------------ ---------------------- ---------- ---------- -------------
1 ARCHDG MOUNTED 9523197 3978710 0
2 DATA2 MOUNTED 42045440 161689 0
3 DATADG MOUNTED 36556792 14891 0
4 OCRDG MOUNTED 10236 9840 0
SQL> alter diskgroup DATA2 mount;
alter diskgroup DATA2 mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "14" is missing from group number "2"
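On node 2 the DATA2 disk group is not mounted, and mounting it manually fails: disk number 14 of disk group 2 cannot be found.
4. Check the disk state. Query the disks on node 1: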
SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,OS_MB,PATH from v$asm_disk where group_number=2 order by 2;
GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS MODE_STATUS OS_MB PATH
------------ ----------- -------------- ------------------------ -------------- ---------- --------------------------------------------------
2 0 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK01
2 1 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK02
2 2 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK03
2 3 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK04
2 4 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK05
2 5 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK06
2 6 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK07
2 7 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK08
2 8 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK09
2 9 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK10
2 10 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK11
2 11 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK12
2 12 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK13
2 13 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK14
2 14 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK15
2 15 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK16
2 16 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK17
2 17 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK18
2 18 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK19
2 19 CACHED MEMBER ONLINE 1116160 ORCL:ASMMACDISK20
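Node 2 is missing ORCL:ASMMACDISK15, which is disk number 14.
So the cause should be that ORCL:ASMMACDISK15 dropped off on node 2.
Disks named ORCL:**** are managed through ASMLib.
5. Rescan the disks
After rescanning, wait a few minutes; if the disk still has not appeared, reboot the server and scan again.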
echo '1'> /sys/class/fc_host/host5/issue_lip
echo '1'> /sys/class/fc_host/host6/issue_lip
That still did not help.
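The two commands above hard-code host5 and host6. As a sketch only, the same rescan can be written as a loop over every Fibre Channel host that exposes an `issue_lip` attribute. The function name `issue_lip_all` and its optional base-directory argument are my own additions (the argument exists so the loop can be tried against a scratch directory first); this is not what was run on site.

```shell
# Hypothetical helper: write 1 to issue_lip for every FC host under the base
# directory (default /sys/class/fc_host). Needs root on a real system.
issue_lip_all() {
    base=${1:-/sys/class/fc_host}
    for f in "$base"/host*/issue_lip; do
        # Skip the unexpanded pattern when no FC host is present
        [ -e "$f" ] || continue
        echo 1 > "$f"
        echo "LIP issued: $f"
    done
}
```

Calling `issue_lip_all` with no argument targets the real sysfs tree. A LIP resets the link, so it may briefly disturb I/O on that HBA; it is worth doing one host at a time on a busy system.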
6. Check the multipath configuration
oracleasm querydisk -v -p OCRVOL1 shows which device an ASMLib disk is actually bound to.
oracleasm listdisks also shows that node 2 is missing ORCL:ASMMACDISK15.
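Comparing the two nodes by eye is error-prone with twenty disks. A minimal sketch, assuming the output of `oracleasm listdisks` from each node has been saved to a file (the function name `missing_on_second` and the file arguments are hypothetical), prints the labels that exist on the first node but not on the second:

```shell
# Hypothetical helper: $1 and $2 are files holding `oracleasm listdisks`
# output from node 1 and node 2. Prints labels present only in $1.
missing_on_second() {
    _a=$(mktemp)
    _b=$(mktemp)
    # comm requires sorted input
    sort "$1" > "$_a"
    sort "$2" > "$_b"
    # -23 suppresses lines unique to file 2 and lines common to both
    comm -23 "$_a" "$_b"
    rm -f "$_a" "$_b"
}
```

For example, after saving the output on each node as node1.txt and node2.txt, `missing_on_second node1.txt node2.txt` lists the disks node 2 has lost.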
Below is the multipath WWID configuration.
Node 1:
mpathfb (3600b342a13a806fd193cd503fd0000d9) dm-33 MacroSAN,LU
mpathfc (3600b34232aa337ad388dd025cd0000d9) dm-37 MacroSAN,LU
mpathfd (3600b3421df3bb16da9fed23f2d0000d9) dm-32 MacroSAN,LU
mpathfe (3600b3421f7d38fad44afd0ca0d0000d9) dm-31 MacroSAN,LU
mpathfk (3600b342506bf3ebdb842d43fcd0000d9) dm-46 MacroSAN,LU
mpathfl (3600b3420c37375dd0aa5db41cd0000d9) dm-45 MacroSAN,LU
mpathfm (3600b342df76e211d37cddb941d0000d9) dm-47 MacroSAN,LU
mpathfn (3600b34259e1e1abd5415db2a9d0000d9) dm-48 MacroSAN,LU
mpathfo (3600b34228b7a62cd2a4dded63d0000d9) dm-44 MacroSAN,LU
mpathfp (3600b3427856a473dfe55d48b5d0000d9) dm-50 MacroSAN,LU
mpathfq (36005076305ffd7840000000000003035) dm-77 IBM,2107900
Node 2:
mpatheo (36005076305ffd7840000000000002171) dm-90 IBM,2107900
mpathf (36005076305ffd7840000000000002007) dm-7 IBM,2107900
mpathfa (3600b342ed61dbcbdacb5db433d0000d9) dm-37 MacroSAN,LU
mpathfb (3600b342a13a806fd193cd503fd0000d9) dm-34 MacroSAN,LU
mpathfc (3600b34232aa337ad388dd025cd0000d9) dm-36 MacroSAN,LU
mpathfd (3600b3421df3bb16da9fed23f2d0000d9) dm-35 MacroSAN,LU
mpathfe (36005076305ffd7840000000000002173) dm-38 IBM,2107900
mpathfk (3600b342506bf3ebdb842d43fcd0000d9) dm-48 MacroSAN,LU
mpathfl (3600b3420c37375dd0aa5db41cd0000d9) dm-45 MacroSAN,LU
mpathfm (3600b342df76e211d37cddb941d0000d9) dm-47 MacroSAN,LU
mpathfn (3600b34259e1e1abd5415db2a9d0000d9) dm-50 MacroSAN,LU
mpathfo (3600b34228b7a62cd2a4dded63d0000d9) dm-44 MacroSAN,LU
mpathfp (3600b3427856a473dfe55d48b5d0000d9) dm-46 MacroSAN,LU
mpathfq (36005076305ffd7840000000000003035) dm-77 IBM,2107900
mpathfr (36005076305ffd7840000000000003120) dm-80 IBM,2107900
mpathfu (3600b342de126e74d35b3dd23cd0000d9) dm-49 MacroSAN,LU
So multipath has that disk configured on both nodes.
But does the device actually exist, and has multipath mapped it?
[root@jcsjsjk01 log]# multipath -ll|grep 3600b3421f7d38fad44afd0ca0d0000d9
mpathfe (3600b3421f7d38fad44afd0ca0d0000d9) dm-31 MacroSAN,LU
[root@jcsjsjk01 log]# cat /etc/multipath/bindings |grep 3600b3421f7d38fad44afd0ca0d0000d9
mpathfe 3600b3421f7d38fad44afd0ca0d0000d9
[root@jcsjsjk02 ~]# multipath -ll|grep 3600b3421f7d38fad44afd0ca0d0000d9
[root@jcsjsjk02 ~]# cat /etc/multipath/bindings |grep 3600b3421f7d38fad44afd0ca0d0000d9
mpathdz 3600b3421f7d38fad44afd0ca0d0000d9
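So node 1 has a multipath device for 3600b3421f7d38fad44afd0ca0d0000d9, but on node 2 the WWID appears only in the bindings file and no multipath device has been created for it.
7. Restart multipathd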
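The two greps above answer two separate questions: is the WWID known in the bindings file, and is there an active map for it? As a sketch under stated assumptions, they can be folded into one check. The function name `wwid_status` is hypothetical, and the active maps are read from a file holding saved `multipath -ll` output rather than from the live command.

```shell
# Hypothetical helper.
# $1: WWID, $2: bindings file, $3: file holding saved `multipath -ll` output
wwid_status() {
    wwid=$1
    # An active map shows the WWID in parentheses, e.g. "mpathfe (3600...) dm-31"
    if grep -qF "($wwid)" "$3"; then mapped=yes; else mapped=no; fi
    # Bindings lines have the form "<alias> <wwid>"
    alias=$(awk -v w="$wwid" '$2 == w { print $1 }' "$2")
    echo "wwid=$wwid alias=${alias:-none} mapped=$mapped"
}
```

On a node in the state shown above, something like `multipath -ll > /tmp/ll.txt; wwid_status 3600b3421f7d38fad44afd0ca0d0000d9 /etc/multipath/bindings /tmp/ll.txt` would report an alias together with `mapped=no`, which is the combination that points at multipathd rather than at a missing binding.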
[root@jcsjsjk02 ~]# /etc/init.d/multipathd restart
ok
Stopping multipathd daemon: [ OK ]
Starting multipathd daemon: [ OK ]
[root@jcsjsjk02 ~]# multipath -ll|grep 3600b3421f7d38fad44afd0ca0d0000d9
mpathfe (3600b3421f7d38fad44afd0ca0d0000d9) dm-38 MacroSAN,LU
After the restart, multipath assembled the new path.
Then rescan with ASMLib:
oracleasm scandisks
A new disk clearly showed up (I forgot to copy the output).
Mount the disk group again:
SQL> alter diskgroup data2 mount;
Diskgroup altered.
OK. Check the database, and we are done.