oracle goldengate extract抽取進程異常終止,報錯OGG-01973 The redo record indicates data loss on object
問題描述:
最近一段時間發現goldengate經常出現異常,extract進程終止,無法啓動,檢查日誌報錯如下:
可以看到錯誤ERROR OGG-01973 The redo record indicates data loss on object 94,785.
問題處理
通過查找oracle mos文檔,找到一遍關於這個錯誤的介紹
OGG v11.2 Extract abends with "OGG-01965 The redo record indicates data loss on object " or "OGG-01973 The redo record indicates data loss on object " (Doc ID 1483431.1)
APPLIES TO:
Oracle GoldenGate - Version 11.2.1.0.1 and later
Information in this document applies to any platform.
SYMPTOMS
2012-08-02 09:10:09 ERROR OGG-01965 The redo record indicates data loss on object xxx,xxx.
or
2012-18-02 05:14:03 ERROR OGG-01973 The redo record indicates data loss on object xxx,xxx
CHANGES
CAUSE
Redo log records indicate data missing for the object in question and thus the error
SOLUTION
Prior to the code change in bug 14158453 Extract would abend with errors similar to following as described in KM note 1465720.1 for
tables/partitions with NOLOGGING regardless of whether the tablespace/ database is in logging/force logging mode.
ERROR OGG-01960 Failed to validate table <OWNER>.<TABLENAME>. The table is created with the NOLOGGING option, which is not supported.
Extract may not be able to capture data from it.
And DBOPTIONS ALLOWNOLOGGING can be added in the extract to work-around the error with known limitations as described in 1465720.1
However post the code change in bug 14158453 the parameter DBOPTIONS ALLOWNOLOGGING is no longer valid
Extract no longer relies on the table metadata (logging/ nologging) and instead uses special records in the redo log to detect
missing data and when detected it would error as below indicating the object id of the table in question (same type of error is signaled when table data has been loaded via SQL*Loader with UNRECOVERABLE parameter):
2012-08-02 09:10:09 ERROR OGG-01965 The redo record indicates data loss on object xxx,xxx.
The table can be excluded from the extract, enable logging at the table/ partition/ tablespace level (or if table data is loaded via SQL*Loader do not use UNRECOVERABLE) and include it back in the extract
if it needs to be replicated across
If using DDL extraction following can be run to see the offending object name
SELECT metadata_text, fragmentNo, optime FROM <owner>.GGS_DDL_HIST WHERE seqno = (SELECT MIN(seqno) FROM <owner>.GGS_DDL_HIST
WHERE objectid = (SELECT objectId FROM <owner>.GGS_DDL_HIST_ALT WHERE altObjectId = <object id reported> AND rownum=1));
eg.,
SELECT metadata_text, fragmentNo, optime FROM ggs_owner.GGS_DDL_HIST WHERE seqno = (SELECT MIN(seqno) FROM ggs_owner.GGS_DDL_HIST
WHERE objectid = (SELECT objectId FROM ggs_owner.GGS_DDL_HIST_ALT WHERE altObjectId = xxxxxx AND rownum=1));
Also following note can be referred to prevent these nologging operations
The Gains and Pains of Nologging Operations (Doc ID 290161.1)
The error can also be seen for Compression Advisor related tables (DBMS_TABCOMP* in 11.2.0.3 & CMP<x>$<object_id> in 11.2.0.4) and can be safely excluded from replication
REFERENCES
BUG:14405239 - ALLOWNOLOGGING IS NOT WORKING IN LATEST OGG V11.2.1.0.1 MLR VERSIONS
這是一個ogg11.2版本的bug,我本機使用的ogg版本是12.2,應該不是這個原因造成的,根據報錯的object_id,找到是oracle2019用戶下面的zs_ac06表
查看錶是loging模式
檢查附加日誌沒有啓動,將附加日誌開啓,之後再啓動ogg進程,發現還是不行報錯。
實在是找不到原因了,網上有一些貼子說過濾掉這張表的同步,但是該表是業務需要使用的到沒法過濾,最後不死心在mos上看到另外一遍文章Classic Extract Fails With Error “The redo record indicates data loss on object nnnnn” (Doc ID 2316712.1)
APPLIES TO:
Oracle GoldenGate - Version 12.2.0.1.0 and later
Information in this document applies to any platform.
SYMPTOMS
Classic extract on 12.2 fails with ERROR OGG-01973 ...The redo record indicates data loss on object nnnnn
Note that, Table with same object id exists on database
SQL> select object_name,owner, object_type from dba_objects where object_id=nnnn;
CHANGES
none
CAUSE
Redo corruption, Activity logs may show messages like
2017-10-11 13:56:54.136 INFO |er.redo.ora.thread |Redo_Thread_2| 5875 oracle/redoorardr.c | DETECT ZERO LENGTH RECORD1 at 93185656!
2017-10-11 13:56:54.136 INFO |er.redo.ora.thread |Redo_Thread_2| 6867 oracle/redoorardr.c | Completed pdata buffering in sequence 1288, with nb = 1, tb = 1, lb = 182003.
2017-10-11 13:56:54.136 INFO |er.redo.ora.thread |Redo_Thread_2| 262 oracle/redoorapl.c | PL_get: len=104, vld=4, subscn=1, SCN = 2501.2438258045 (10744151465341), seqno=1288, rba=93185552
2017-10-11 13:56:54.136 ERROR|er.redo.ora.thread |Redo_Thread_2| 6993 oracle/redoorardr.c | Detected an SCN base increase greater than 1, last_record_scn = 2501.2438257951 (10744151465247) current_rechdr.scn = 2501.2438258045 (10744151465341)
2017-10-11 13:56:54.136 INFO |gglog.std.application |Redo_Thread_4| 69 ggapp/ggPrintLn.c | [rpt]:
2017-10-11 13:56:54 ERROR OGG-01973 The redo record indicates data loss on object 437,723.
SOLUTION
Tranlogoptions allowdataloss will work for 12.2 OGG versions. This is made internal parameter on 12.2 onwards
If customer is on OGG 12.2, use _allowdataloss and restart the extract.
Tranlogoptions _allowdataloss
版本和問題描述和我現在的問題一致,根據文檔的提示,在抽取進程上面加上Tranlogoptions _allowdataloss參數,之後再進行啓動,等了大概半個小時,總算看到抽取日誌有變動沒有報錯了,問題解決大功告成。