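Moving (relocating) data files on a Data Guard (ADG) standby database
During a routine inspection of an Oracle RAC Data Guard environment that I maintain, I found a problem. This post documents how it was diagnosed and fixed.
Environment:
Primary | Standby |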
---|---|
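HP-UX 11.31 | HP-UX 11.31 |
Oracle 11.2.0.4 RAC database | Oracle 11.2.0.4 RAC database |
Reference: http://blog.chinaunix.net/uid-77311-id-5818675.html
1. Symptoms:
During the inspection, the archive log space on the primary was unusually full. That looked wrong, because archived logs are normally deleted once they have been backed up and should not accumulate like this. Since this database is part of an ADG configuration, I checked the ADG status and logs, and found that redo was not being shipped to the standby properly and that the logs contained errors.
2. Investigation:
1) First, check the state of the ADG processes and the archive apply progress. The sequence numbers on the source (primary) and target (standby) were far apart: the source had reached 27450 while the target was still at 27302, and the MRP0 process on the target was not running.
2) On the source, check the log apply status: none of the archived logs generated after December 1 had been applied on the target.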
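The queries behind these two checks are not shown here. A minimal sketch of how the same information could be obtained, assuming the standby is archive destination 2 on the primary (dest_id = 2 is an assumption; adjust it to match your LOG_ARCHIVE_DEST_n setting):

```sql
-- On the primary: last archived vs. last applied sequence per redo thread.
-- dest_id = 2 is an assumed value for the standby destination.
SELECT thread#,
       MAX(sequence#)                                    AS last_archived,
       MAX(CASE WHEN applied = 'YES' THEN sequence# END) AS last_applied
FROM   v$archived_log
WHERE  dest_id = 2
AND    resetlogs_change# = (SELECT resetlogs_change# FROM v$database)
GROUP  BY thread#
ORDER  BY thread#;

-- On the standby: is the managed recovery process running?
-- No rows returned means MRP0 is not started.
SELECT process, status, thread#, sequence#
FROM   v$managed_standby
WHERE  process LIKE 'MRP%';
```

A large difference between last_archived and last_applied, together with an empty result from the second query, would match the situation described above.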
3) Check the primary database log: the source database reports "archive failed" errors.
FAL[server, ARC3]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance ydjyorcl1 - Archival Error. Archiver continuing.
FAL[server, ARC0]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance ydjyorcl1 - Archival Error. Archiver continuing.
FAL[server, ARC1]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance ydjyorcl1 - Archival Error. Archiver continuing.
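4) Check the standby database log: it reports that the ARCH disk group is out of space. This is essentially the root of the problem: the ARCH disk group has run out of space.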
Creating archive destination file : +ARCH (1902850 blocks)
Errors in file /opt/u01/app/oracle/diag/rdbms/ydjydg/ydjydg1/trace/ydjydg1_ora_13970.trc:
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +ARCH
ORA-15041: diskgroup "ARCH" space exhausted
5) Check disk group usage.
Primary:
Standby: compared with the primary, neither the ARCH nor the DATA disk group has any free space left.
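The usage figures themselves are not reproduced here. As a rough sketch, disk group usage can be read from V$ASM_DISKGROUP on each side (the pct_used column is a derived value added for illustration):

```sql
-- Run on both primary and standby (database or ASM instance).
SELECT name,
       state,
       type,
       total_mb,
       free_mb,
       usable_file_mb,
       ROUND(100 * (1 - free_mb / NULLIF(total_mb, 0)), 1) AS pct_used
FROM   v$asm_diskgroup
ORDER  BY name;
```

The `lsdg` command in ASMCMD reports similar totals if you are already working in the ASM command line.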
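3. Cause:
Normally the primary and standby should use about the same amount of space; a gap this large should not occur. Based on the checks, the main causes were:
1) DATA filled up because a tablespace had been added on the primary a week earlier with its data files placed in the ARCH disk group, whereas on the standby, with standby_file_management set to AUTO, the files were created automatically in the DATA disk group.
Primary:
Standby: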
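To compare where the files ended up on each side, something like the following could be run on both databases. This is a sketch: the tablespace name FQCS is inferred from the data file names later in this post, not stated explicitly. With standby_file_management = AUTO, the standby chooses the location from db_file_name_convert if it is set, otherwise from db_create_file_dest, so those parameters are worth checking too.

```sql
-- Data files of the new tablespace and their locations.
-- 'FQCS' is an assumed tablespace name, inferred from the file names.
SELECT d.file#,
       d.name,
       ROUND(d.bytes / 1024 / 1024 / 1024, 1) AS size_gb
FROM   v$datafile   d
JOIN   v$tablespace t ON t.ts# = d.ts#
WHERE  t.name = 'FQCS'
ORDER  BY d.file#;

-- Parameters that decide where a standby creates new data files.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('standby_file_management',
                'db_create_file_dest',
                'db_file_name_convert');
```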
2) ARCH filled up mainly because the archived log retention period is long: currently one month.
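To see how much space the retained archived logs take per day, a query along these lines may help. It is only a sketch based on control file records, so it reflects what the database knows about, not necessarily every file physically present in the disk group.

```sql
-- Archived logs still on disk according to the control file, grouped by day.
-- standby_dest = 'NO' excludes records that only describe shipping to a remote standby.
SELECT TRUNC(completion_time)                                   AS day,
       COUNT(*)                                                 AS logs,
       ROUND(SUM(blocks * block_size) / 1024 / 1024 / 1024, 1)  AS size_gb
FROM   v$archived_log
WHERE  deleted = 'NO'
AND    standby_dest = 'NO'
AND    name IS NOT NULL
GROUP  BY TRUNC(completion_time)
ORDER  BY day;
```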
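4. Fix:
With the cause identified, the fix was to manually move the new tablespace's data files on the standby to the ARCH disk group.
1) Stop ADG log apply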
SQL> alter database recover managed standby database cancel;
2) Shut down the database; this must be done on both RAC nodes
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
# Check status
grid@ydjydb3[+ASM1]:/home/grid$ crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.DATA.dg
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.LISTENER.lsnr
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.OCRVOTING.dg
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.asm
ONLINE ONLINE ydjydb3 Started
ONLINE ONLINE ydjydb4 Started
ora.gsd
OFFLINE OFFLINE ydjydb3
OFFLINE OFFLINE ydjydb4
ora.net1.network
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.ons
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE ydjydb3
ora.cvu
1 ONLINE ONLINE ydjydb3
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE ydjydb3
ora.ydjydb3.vip
1 ONLINE ONLINE ydjydb3
ora.ydjydb4.vip
1 ONLINE ONLINE ydjydb4
ora.ydjydg.db
1 OFFLINE OFFLINE Instance Shutdown
2 OFFLINE OFFLINE Instance Shutdown
3) Start the instance in nomount state.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 1.0957E+11 bytes
Fixed Size 2218568 bytes
Variable Size 5.3419E+10 bytes
Database Buffers 5.6103E+10 bytes
Redo Buffers 43307008 bytes
4) Move the data files
ASMCMD> cp +data/ydjydg/datafile/FQCS.595.1023725157 +arch/ydjydg/datafile/FQCS.595.1023725157
copying +data/ydjydg/datafile/FQCS.595.1023725157 -> +arch/ydjydg/datafile/FQCS.595.1023725157
ASMCMD-8016: copy source '+data/ydjydg/datafile/FQCS.595.1023725157' and target '+arch/ydjydg/datafile/FQCS.595.1023725157' failed
ORA-15056: additional error message
ORA-15046: ASM file name '+arch/ydjydg/datafile/FQCS.595.1023725157' is not in single-file creation form
ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 415
ORA-06512: at line 3 (DBD ERROR: OCIStmtExecute)
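The copy fails because the target was given as a fully qualified ASM file name (ending in a file number and incarnation number), which only ASM itself may assign. Specifying a plain alias name without the trailing numbers as the target makes the copy work.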
ASMCMD> cp +data/ydjydg/datafile/FQCS.618.1025740023 +arch/ydjydg/datafile/fqcs19.dbf
copying +data/ydjydg/datafile/FQCS.618.1025740023 -> +arch/ydjydg/datafile/fqcs19.dbf
ASMCMD> cp +data/ydjydg/datafile/FQCS.620.1025740183 +arch/ydjydg/datafile/fqcs21.dbf
copying +data/ydjydg/datafile/FQCS.620.1025740183 -> +arch/ydjydg/datafile/fqcs21.dbf
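Listing the files shows that ASM did not literally copy the data files into the specified directory: it only stored an alias in the target directory, and placed the actual files in a different location.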
ASMCMD> ls -l
Type Redund Striped Time Sys Name
N fqcs01.dbf => +ARCH/ASM/DATAFILE/fqcs01.dbf.1072.1027602353
N fqcs02.dbf => +ARCH/ASM/DATAFILE/fqcs02.dbf.1859.1027602347
N fqcs03.dbf => +ARCH/ASM/DATAFILE/fqcs03.dbf.720.1027602777
N fqcs04.dbf => +ARCH/ASM/DATAFILE/fqcs04.dbf.1400.1027602837
N fqcs05.dbf => +ARCH/ASM/DATAFILE/fqcs05.dbf.2388.1027602847
N fqcs06.dbf => +ARCH/ASM/DATAFILE/fqcs06.dbf.1881.1027602885
N fqcs07.dbf => +ARCH/ASM/DATAFILE/fqcs07.dbf.3741.1027602893
N fqcs08.dbf => +ARCH/ASM/DATAFILE/fqcs08.dbf.1629.1027602923
N fqcs09.dbf => +ARCH/ASM/DATAFILE/fqcs09.dbf.3080.1027602931
N fqcs10.dbf => +ARCH/ASM/DATAFILE/fqcs10.dbf.2923.1027602817
N fqcs11.dbf => +ARCH/ASM/DATAFILE/fqcs11.dbf.3016.1027603145
N fqcs12.dbf => +ARCH/ASM/DATAFILE/fqcs12.dbf.3025.1027602857
N fqcs13.dbf => +ARCH/ASM/DATAFILE/fqcs13.dbf.3091.1027603179
N fqcs14.dbf => +ARCH/ASM/DATAFILE/fqcs14.dbf.3113.1027602891
N fqcs15.dbf => +ARCH/ASM/DATAFILE/fqcs15.dbf.3135.1027603215
N fqcs16.dbf => +ARCH/ASM/DATAFILE/fqcs16.dbf.3279.1027602927
N fqcs17.dbf => +ARCH/ASM/DATAFILE/fqcs17.dbf.3325.1027603257
N fqcs18.dbf => +ARCH/ASM/DATAFILE/fqcs18.dbf.3398.1027602969
N fqcs19.dbf => +ARCH/ASM/DATAFILE/fqcs19.dbf.947.1027603297
N fqcs20.dbf => +ARCH/ASM/DATAFILE/fqcs20.dbf.891.1027603009
N fqcs21.dbf => +ARCH/ASM/DATAFILE/fqcs21.dbf.877.1027603335
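The alias-to-file relationship shown in the listing can also be inspected from SQL. A minimal sketch, to be run while connected to the ASM instance; the disk group name ARCH and the fqcs prefix come from this post, everything else is standard V$ASM_ALIAS columns:

```sql
-- Each copied file appears twice: once as a user alias (system_created = 'N')
-- and once under its system-generated name (system_created = 'Y').
SELECT a.file_number,
       a.file_incarnation,
       a.name,
       a.system_created,
       a.alias_directory
FROM   v$asm_alias     a
JOIN   v$asm_diskgroup g ON g.group_number = a.group_number
WHERE  g.name = 'ARCH'
AND    LOWER(a.name) LIKE 'fqcs%'
ORDER  BY a.file_number, a.system_created;
```

Rows sharing the same file_number refer to the same physical file, which is why renaming the data file to the alias path in the next step works.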
5) Start the database in mount state, set the standby_file_management parameter to MANUAL, then rename the data file paths.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 1.0957E+11 bytes
Fixed Size 2218568 bytes
Variable Size 5.3419E+10 bytes
Database Buffers 5.6103E+10 bytes
Redo Buffers 43307008 bytes
Database mounted.
SQL> alter system set standby_file_management = MANUAL;
System altered.
SQL> alter database rename file '+data/ydjydg/datafile/FQCS.595.1023725157' to '+arch/ydjydg/datafile/fqcs01.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.596.1024163587' to '+arch/ydjydg/datafile/fqcs02.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.602.1025733207' to '+arch/ydjydg/datafile/fqcs03.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.603.1025733209' to '+arch/ydjydg/datafile/fqcs04.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.604.1025733211' to '+arch/ydjydg/datafile/fqcs05.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.605.1025733215' to '+arch/ydjydg/datafile/fqcs06.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.606.1025733217' to '+arch/ydjydg/datafile/fqcs07.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.607.1025733219' to '+arch/ydjydg/datafile/fqcs08.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.608.1025733223' to '+arch/ydjydg/datafile/fqcs09.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.609.1025733225' to '+arch/ydjydg/datafile/fqcs10.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.610.1025733227' to '+arch/ydjydg/datafile/fqcs11.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.611.1025733229' to '+arch/ydjydg/datafile/fqcs12.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.612.1025739745' to '+arch/ydjydg/datafile/fqcs13.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.613.1025739775' to '+arch/ydjydg/datafile/fqcs14.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.614.1025739797' to '+arch/ydjydg/datafile/fqcs15.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.615.1025739855' to '+arch/ydjydg/datafile/fqcs16.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.616.1025739883' to '+arch/ydjydg/datafile/fqcs17.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.617.1025739943' to '+arch/ydjydg/datafile/fqcs18.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.618.1025740023' to '+arch/ydjydg/datafile/fqcs19.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.619.1025740097' to '+arch/ydjydg/datafile/fqcs20.dbf';
alter database rename file '+data/ydjydg/datafile/FQCS.620.1025740183' to '+arch/ydjydg/datafile/fqcs21.dbf';
Database altered.
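Because the rename statements follow a regular pattern, they could also be generated instead of typed by hand. The following is only a sketch under two assumptions: the tablespace is named FQCS (inferred from the file names), and the aliases fqcs01.dbf, fqcs02.dbf, ... were created in ascending file# order. Check the generated pairs against the aliases actually created in the copy step before running any of them.

```sql
-- Run on the mounted standby; prints one ALTER DATABASE RENAME FILE per data file.
-- Assumptions: tablespace name 'FQCS'; alias numbering follows file# order.
SELECT 'alter database rename file ''' || d.name ||
       ''' to ''+arch/ydjydg/datafile/fqcs' ||
       LPAD(ROW_NUMBER() OVER (ORDER BY d.file#), 2, '0') ||
       '.dbf'';' AS stmt
FROM   v$datafile   d
JOIN   v$tablespace t ON t.ts# = d.ts#
WHERE  t.name = 'FQCS'
AND    UPPER(d.name) LIKE '+DATA/%'
ORDER  BY d.file#;
```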
6) Open the database and start log apply
SQL> alter database open;
SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
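The steps above leave standby_file_management at MANUAL. Whether to switch it back is a judgment call that this post does not cover; if automatic creation of new data files on the standby is wanted again, a sketch could look like this (the SCOPE and SID clauses are my assumption for a two-node RAC using an spfile):

```sql
-- Optional: restore automatic standby file management on all RAC instances.
ALTER SYSTEM SET standby_file_management = AUTO SCOPE = BOTH SID = '*';

-- Confirm the value on every instance.
SELECT inst_id, value
FROM   gv$parameter
WHERE  name = 'standby_file_management'
ORDER  BY inst_id;
```

Keep in mind that with AUTO the standby again picks the file location itself, so db_create_file_dest or db_file_name_convert would need to point at a disk group with enough space.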
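7) Check the tablespace's data file information: the control file (as seen through views such as v$datafile) now points to the relocated files.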
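The output of this check is not shown. One way to confirm the result, sketched here as an illustration, is to summarize the data files by disk group on the standby:

```sql
-- Data files per ASM disk group, taken from the leading '+NAME' part of each path.
SELECT SUBSTR(name, 1, INSTR(name, '/') - 1)         AS disk_group,
       COUNT(*)                                      AS files,
       ROUND(SUM(bytes) / 1024 / 1024 / 1024, 1)     AS size_gb
FROM   v$datafile
GROUP  BY SUBSTR(name, 1, INSTR(name, '/') - 1)
ORDER  BY disk_group;
```

After the rename, the relocated files should be counted under +arch rather than +DATA.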
8) Check ADG status
# Check database status
grid@ydjydb3[+ASM1]:/home/grid$ crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.DATA.dg
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.LISTENER.lsnr
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.OCRVOTING.dg
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.asm
ONLINE ONLINE ydjydb3 Started
ONLINE ONLINE ydjydb4 Started
ora.gsd
OFFLINE OFFLINE ydjydb3
OFFLINE OFFLINE ydjydb4
ora.net1.network
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
ora.ons
ONLINE ONLINE ydjydb3
ONLINE ONLINE ydjydb4
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE ydjydb3
ora.cvu
1 ONLINE ONLINE ydjydb3
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE ydjydb3
ora.ydjydb3.vip
1 ONLINE ONLINE ydjydb3
ora.ydjydb4.vip
1 ONLINE ONLINE ydjydb4
ora.ydjydg.db
1 ONLINE ONLINE ydjydb3 Open,Readonly
2 ONLINE ONLINE ydjydb4 Open,Readonly
# Check ADG status
SQL> select open_mode, switchover_status from v$database;
OPEN_MODE SWITCHOVER_STATUS
-------------------- --------------------
READ ONLY NOT ALLOWED
SQL> SELECT PID, PROCESS, STATUS, CLIENT_PROCESS, CLIENT_PID, THREAD#, SEQUENCE# SEQ#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY;
PID PROCESS STATUS CLIENT_P CLIENT_PID THREAD# SEQ# BLOCK# BLOCKS
-------- --------- ------------ -------- ---------------------------------------- ------- ---------- ---------- ----------
22297 ARCH CLOSING ARCH 22297 1 27451 4096 255
22299 ARCH CONNECTED ARCH 22299 0 0 0 0
22301 ARCH CONNECTED ARCH 22301 0 0 0 0
22303 ARCH CLOSING ARCH 22303 2 33555 1033216 127
22726 RFS IDLE LGWR 5737 1 27452 30379 1
22750 RFS IDLE ARCH 23597 0 0 0 0
22722 RFS IDLE LGWR 14249 2 33557 21120 9
22728 RFS IDLE ARCH 20786 0 0 0 0
22709 RFS IDLE UNKNOWN 13249 0 0 0 0
9 rows selected.
SQL> /
PID PROCESS STATUS CLIENT_P CLIENT_PID THREAD# SEQ# BLOCK# BLOCKS
-------- --------- ------------ -------- ---------------------------------------- ------- ---------- ---------- ----------
22297 ARCH CLOSING ARCH 22297 1 27451 4096 255
22299 ARCH CONNECTED ARCH 22299 0 0 0 0
22301 ARCH CONNECTED ARCH 22301 0 0 0 0
22303 ARCH CLOSING ARCH 22303 2 33555 1033216 127
22726 RFS IDLE LGWR 5737 1 27452 38812 1
22750 RFS IDLE ARCH 23597 0 0 0 0
22722 RFS IDLE LGWR 14249 2 33557 35045 3
22728 RFS IDLE ARCH 20786 0 0 0 0
22709 RFS IDLE UNKNOWN 13249 0 0 0 0
23111 MRP0 APPLYING_LOG N/A N/A 2 32693 1027 5753
23124 RFS RECEIVING UNKNOWN 19301 1 27450 315394 1024
11 rows selected.
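Refreshing again after a few minutes shows that MRP0 has started applying logs and that the RFS receiving process is running normally again.
Check the apply lag: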
SQL> select name,value,datum_time from v$dataguard_stats;
NAME VALUE DATUM_TIME
-------------------------------- ---------------------------------------------------------------- ------------------------------
transport lag +00 00:00:00 12/21/2019 13:44:45
apply lag +19 18:30:06 12/21/2019 13:44:45
apply finish time +00 02:16:34.324
estimated startup time 28
At this point, ADG is running normally again.