Skip a transaction in GoldenGate (OGG)



           

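We currently use OGG for active-active (A-A) synchronization, including DDL, between two Oracle OLTP databases. We just found that the Replicat process had abended, so let's analyze the cause.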

The ggserr.log log:

2012-10-31 17:09:05  WARNING OGG-00869  Oracle GoldenGate Delivery for Oracle, ricme.prm:  OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0.
2012-10-31 17:09:05  WARNING OGG-01004  Oracle GoldenGate Delivery for Oracle, ricme.prm:  Aborted grouped transaction on 'ICME.ICME_STUDENT', Database error 2292 (OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0).
2012-10-31 17:09:05  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, ricme.prm:  Repositioning to rba 84509907 in seqno 40.
2012-10-31 17:09:05  WARNING OGG-01154  Oracle GoldenGate Delivery for Oracle, ricme.prm:  SQL error 2292 mapping ICME.ICME_STUDENT to ICME.ICME_STUDENT OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0.
2012-10-31 17:09:05  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, ricme.prm:  Repositioning to rba 84509907 in seqno 40.
2012-10-31 17:09:05  ERROR   OGG-01296  Oracle GoldenGate Delivery for Oracle, ricme.prm:  Error mapping from ICME.ICME_STUDENT to ICME.ICME_STUDENT.
2012-10-31 17:09:05  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, ricme.prm:  PROCESS ABENDING.
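The position of the failing record can be pulled out of these log lines mechanically. The following is a minimal Python sketch, not part of OGG: the function name `summarize` and the shortened sample text are illustrative, and the regular expressions only assume the message wording seen in the excerpt above.

```python
import re

# Wording of the OGG-01003 message as it appears in the excerpt above.
REPOSITION = re.compile(r"Repositioning to rba (\d+) in seqno (\d+)")
# Any Oracle error code mentioned on a line, e.g. ORA-02292.
ORA_ERROR = re.compile(r"ORA-(\d{5})")


def summarize(log_text):
    """Return (seqno, rba, ora_codes) found in ggserr.log text."""
    seqno = rba = None
    ora_codes = set()
    for line in log_text.splitlines():
        match = REPOSITION.search(line)
        if match:
            # Keep the last repositioning message seen.
            rba, seqno = int(match.group(1)), int(match.group(2))
        ora_codes.update(ORA_ERROR.findall(line))
    return seqno, rba, sorted(ora_codes)


# Shortened copies of three lines from the log above.
sample = """\
2012-10-31 17:09:05  WARNING OGG-00869  Oracle GoldenGate Delivery for Oracle, ricme.prm:  OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292).
2012-10-31 17:09:05  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, ricme.prm:  Repositioning to rba 84509907 in seqno 40.
2012-10-31 17:09:05  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, ricme.prm:  PROCESS ABENDING.
"""

print(summarize(sample))
```

For the sample this reports seqno 40, RBA 84509907 and error code 02292, the same position that appears in the discard file and checkpoint.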

The log shows roughly which SQL statement failed. The parameter file of my Replicat group configures a DISCARDFILE, which recorded the record image:

[oracle@ggsdb dirrpt]$ vi ricme.dsc
OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0
Aborting transaction on dirdat/l2 beginning at seqno 40 rba 84509907
error at seqno 40 rba 84509907
Problem replicating ICME.ICME_STUDENT to ICME.ICME_STUDENT
Mapping problem with compressed key update record (target format)…
*
IC_CODE = 1114020AY
IC_CODE = 3
REMARK =
000000: bf a8 ba c5 d6 d8 b8 b4
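The REMARK value is shown only as a hex dump. A small Python sketch can turn it back into text. The encoding is an assumption: the post does not state the database character set, and GBK is used here only because the byte pairs fall in the GBK range.

```python
# Hex bytes copied from the REMARK dump line of the discard file.
raw = bytes.fromhex("bf a8 ba c5 d6 d8 b8 b4")

# Assumption: the column data is GBK-encoded (for example a ZHS16GBK database).
remark = raw.decode("gbk")

print(remark)
print(len(raw), "bytes ->", len(remark), "characters")
```

If the decoded text is unreadable, the real character set is probably different and the `decode` argument needs to change accordingly.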

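Having seen this SQL, I confirmed what had been changed and asked a colleague. Sure enough, it was an operator mistake: a student card number (IC_CODE) had been modified. That column has a trigger that cascades the change to many related tables, and there are foreign key constraints as well. On the replica the trigger is disabled, so the update hit the foreign key constraint and failed there. The colleague later changed the value back, so the data on the primary has been restored, which means I can skip this transaction.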

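First, find the RBA that Replicat has applied up to, so we can work out the RBA to start from next time. The RBA (relative byte address) is the actual byte position in the trail file at which Replicat will call fseek() and resume when it is next started. It should not be confused with the CSN (commit sequence number), which for Oracle is the SCN.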

GGSCI (ggsdb) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    ABENDED     RICME       00:00:00      00:29:41    

GGSCI (ggsdb) 5> info rep ricme

REPLICAT   RICME     Last Started 2012-10-31 17:23   Status ABENDED
Checkpoint Lag       00:00:00 (updated 00:29:47 ago)
Log Read Checkpoint  File dirdat/l2000040
                     2012-10-31 17:08:56.879106  RBA 84509907
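The checkpoint can also be read out of this output by a script. The sketch below is illustrative only: `read_checkpoint` is a made-up helper, and it assumes the trail file name is a two-character prefix followed by the sequence number, which matches `l2000040` and seqno 40 in this post but may differ in other OGG versions.

```python
import re


def read_checkpoint(info_text, prefix_len=2):
    """Extract (trail_file, seqno, rba) from `info replicat` output."""
    trail = re.search(r"Log Read Checkpoint\s+File\s+(\S+)", info_text).group(1)
    rba = int(re.search(r"RBA\s+(\d+)", info_text).group(1))
    # Assumption: file name = prefix (2 chars here) + zero-padded seqno.
    file_name = trail.rsplit("/", 1)[-1]
    seqno = int(file_name[prefix_len:])
    return trail, seqno, rba


# Output of `info rep ricme` as shown above.
info_output = """\
REPLICAT   RICME     Last Started 2012-10-31 17:23   Status ABENDED
Checkpoint Lag       00:00:00 (updated 00:29:47 ago)
Log Read Checkpoint  File dirdat/l2000040
                     2012-10-31 17:08:56.879106  RBA 84509907
"""

print(read_checkpoint(info_output))
```

The seqno and RBA obtained this way agree with the "Repositioning to rba 84509907 in seqno 40" message in ggserr.log.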

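From the output above we know that Replicat group ricme has applied up to RBA 84509907 in dirdat/l2000040. To skip this transaction we only need to start from the next record, but that is not simply the current RBA plus 1: the RBA must be a valid position in the OGG trail format. If an invalid address is given, an exception is recorded in ggserr.log after startup. We can use the logdump utility in the OGG installation directory to locate the "real" position of the next record.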

[oracle@ggsdb ogg11r2]$ ./logdump

Oracle GoldenGate Log File Dump Utility for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

Logdump 1 >open dirdat/l2000040
Current LogTrail is /oracle/ogg11r2/dirdat/l2000040
Logdump 2 >pos 84509907
Reading forward from RBA 84509907
Logdump 3 >n

2012/10/31 17:08:58.914.149 GGSPKUpdate          Len    69 RBA 84509907
Name: ICME.ICME_STUDENT
After  Image:                                             Partition 4   G  b
 0011 0000 000d 0000 0009 3131 3134 3032 3041 5900 | ..........1114020AY.
 0000 0500 0000 0133 0018 000c 0000 0008 bfa8 bac5 | .......3............
 d6d8 b8b4 001d 0015 0000 3230 3132 2d31 302d 3331 | ..........2012-10-31
 3a31 373a 3034 3a33 39                            | :17:04:39  

Logdump 4 >n

2012/10/31 17:08:58.914.149 FieldComp            Len    23 RBA 84510103
Name: ICME.ICME_PROJECT_SCORE
After  Image:                                             Partition 4   G  m
 0000 000a 0000 0000 0000 0252 1521 0001 0005 0000 | ...........R.!......
 0001 33                                           | ..3  

Logdump 5 >exit

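pos is short for position and moves to the point where Replicat would start; n is short for next. The first n displays the record currently being applied: you can see that it is an update, along with the table name and the image values. To skip it, enter n once more. The next record's RBA is 84510103, which is definitely not the previous RBA plus 1. We can now set the RBA that Replicat starts from to 84510103.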
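To make the "not simply plus 1" point concrete, the record header lines printed by logdump can be parsed and the following record's RBA looked up. This is a rough Python sketch: `next_record_rba` is a hypothetical helper, the regular expression assumes the header layout shown in the session above, and it does not check whether the following record belongs to the same transaction.

```python
import re

# Record header line as printed by logdump in the session above:
# date, time, record type, "Len <n>", "RBA <n>".
RECORD = re.compile(r"^\S+ \S+ (\w+)\s+Len\s+(\d+)\s+RBA\s+(\d+)", re.M)


def next_record_rba(dump_text, failed_rba):
    """Return the RBA of the record that follows failed_rba in the dump."""
    rbas = [int(rba) for _kind, _length, rba in RECORD.findall(dump_text)]
    index = rbas.index(failed_rba)  # raises ValueError if not in the dump
    return rbas[index + 1]


# Header and name lines copied from the logdump session (hex dumps omitted).
dump = """\
2012/10/31 17:08:58.914.149 GGSPKUpdate          Len    69 RBA 84509907
Name: ICME.ICME_STUDENT
2012/10/31 17:08:58.914.149 FieldComp            Len    23 RBA 84510103
Name: ICME.ICME_PROJECT_SCORE
"""

failed = 84509907
following = next_record_rba(dump, failed)
print(following, "gap in bytes:", following - failed)
```

The gap between the two records is 196 bytes even though the first record's Len is only 69, so the offset of the next record cannot be guessed from the current RBA alone.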

GGSCI (ggsdb) 1> alter replicat ricme, extrba 84510103
REPLICAT altered.
GGSCI (ggsdb) 3> start ricme

Sending START request to MANAGER ...
REPLICAT RICME starting
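If this has to be done more than once, the two GGSCI commands above can be generated rather than typed by hand. The helper below is only a sketch: `skip_commands` is a made-up name, it reproduces exactly the command forms used in this post, and how the commands are then fed to GGSCI is left open.

```python
def skip_commands(group, failed_rba, next_rba):
    """Build the GGSCI commands used above to restart past a failed record."""
    # Guard against repositioning backwards or onto the same record.
    if next_rba <= failed_rba:
        raise ValueError("next RBA must lie after the failed record")
    return [
        f"alter replicat {group}, extrba {next_rba}",
        f"start {group}",
    ]


for command in skip_commands("ricme", 84509907, 84510103):
    print(command)
```

The RBA passed as `next_rba` should still come from logdump, since only a real record boundary is a valid restart position.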

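Of course, if further transactions fail, you can keep stepping with next in the same way. If several consecutive transactions need to be skipped, though, there is another approach: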

start rep ricme skiptransaction

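Be aware that with this method you do not know in advance how many transactions are skipped. The skipped records are also written to the discard file, provided the parameter file configures reperror default, discard.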



---end---
