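We use OGG for active-active (A-A) replication, including DDL, between two Oracle OLTP databases. We just found that a Replicat process had abended, so here is an analysis of the cause.
The ggserr.log log: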
2012-10-31 17:09:05 WARNING OGG-00869 Oracle GoldenGate Delivery for Oracle, ricme.prm: OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0.
2012-10-31 17:09:05 WARNING OGG-01004 Oracle GoldenGate Delivery for Oracle, ricme.prm: Aborted grouped transaction on 'ICME.ICME_STUDENT', Database error 2292 (OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0).
2012-10-31 17:09:05 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, ricme.prm: Repositioning to rba 84509907 in seqno 40.
2012-10-31 17:09:05 WARNING OGG-01154 Oracle GoldenGate Delivery for Oracle, ricme.prm: SQL error 2292 mapping ICME.ICME_STUDENT to ICME.ICME_STUDENT OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0.
2012-10-31 17:09:05 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, ricme.prm: Repositioning to rba 84509907 in seqno 40.
2012-10-31 17:09:05 ERROR OGG-01296 Oracle GoldenGate Delivery for Oracle, ricme.prm: Error mapping from ICME.ICME_STUDENT to ICME.ICME_STUDENT.
2012-10-31 17:09:05 ERROR OGG-01668 Oracle GoldenGate Delivery for Oracle, ricme.prm: PROCESS ABENDING.
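When ggserr.log is long, it can help to reduce the lines above to the few facts that matter: the ORA error, the violated constraint, and the trail position Replicat went back to. The following is only a minimal sketch. The regular expressions are derived solely from the log lines shown in this post, and `summarize` is a hypothetical helper name, not part of OGG.

```python
import re

# Patterns are assumptions based only on the ggserr.log lines shown above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<code>OGG-\d{5})\s+(?P<msg>.*)$"
)
ORA_RE = re.compile(r"ORA-(\d{5})")
CONSTRAINT_RE = re.compile(r"integrity constraint \(([^)]+)\)")
REPOS_RE = re.compile(r"Repositioning to rba (\d+) in seqno (\d+)")


def summarize(lines):
    """Collect ORA codes, constraint names and (seqno, rba) pairs from log lines."""
    summary = {
        "ora_errors": set(),
        "constraints": set(),
        "positions": set(),
        "abended": False,
    }
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a ggserr.log-style line
        msg = m.group("msg")
        summary["ora_errors"].update(ORA_RE.findall(msg))
        summary["constraints"].update(CONSTRAINT_RE.findall(msg))
        for rba, seqno in REPOS_RE.findall(msg):
            summary["positions"].add((int(seqno), int(rba)))
        if "PROCESS ABENDING" in msg:
            summary["abended"] = True
    return summary


sample = [
    "2012-10-31 17:09:05 WARNING OGG-00869 Oracle GoldenGate Delivery for Oracle, "
    "ricme.prm: OCI Error ORA-02292: integrity constraint "
    "(ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292).",
    "2012-10-31 17:09:05 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, "
    "ricme.prm: Repositioning to rba 84509907 in seqno 40.",
    "2012-10-31 17:09:05 ERROR OGG-01668 Oracle GoldenGate Delivery for Oracle, "
    "ricme.prm: PROCESS ABENDING.",
]
result = summarize(sample)
print(result)
```

Sets are used because the same error and position are repeated across several log lines for a single failure.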
The log shows the rough SQL. The parameter file of my Replicat group also configures a DISCARDFILE, which recorded the record image:
[oracle@ggsdb dirrpt]$ vi ricme.dsc
OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0
Aborting transaction on dirdat/l2 beginning at seqno 40 rba 84509907
error at seqno 40 rba 84509907
Problem replicating ICME.ICME_STUDENT to ICME.ICME_STUDENT
Mapping problem with compressed key update record (target format)...
*
IC_CODE = 1114020AY
IC_CODE = 3
REMARK =
000000: bf a8 ba c5 d6 d8 b8 b4
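The REMARK value is dumped as raw bytes because it is not plain ASCII. A small sketch for turning such a hex dump back into text follows. It assumes the column data is GBK-encoded (for example a ZHS16GBK database character set), which this post does not state, and `decode_dump` is a hypothetical helper name.

```python
def decode_dump(hex_bytes, encoding="gbk"):
    """Decode a space-separated hex dump; the encoding is an assumption."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    # errors="replace" yields U+FFFD instead of raising if the guess is wrong.
    return raw.decode(encoding, errors="replace")


remark = decode_dump("bf a8 ba c5 d6 d8 b8 b4")
print(remark)
```

If the output contains replacement characters, the encoding guess is probably wrong and another character set should be tried.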
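Seeing this SQL, I confirmed what had been changed and asked a colleague, and it was indeed a mistake: he had modified a student's card number (IC_CODE). That column has a trigger on it that cascades the change to many related tables, and there are foreign key constraints as well. On the standby database the triggers are disabled, so the update hit the foreign key constraint and failed to apply there. The colleague later changed the value back, so the data on the primary database has been restored, which means I can skip this transaction.
First, find the RBA (relative byte address) that the Replicat process has reached, in order to locate the RBA at which to resume. The RBA is the actual byte position in the trail file that Replicat will fseek() to when it next starts. It should not be confused with the CSN (commit sequence number), which for Oracle is the SCN.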
GGSCI (ggsdb) 4> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
REPLICAT    ABENDED     RICME       00:00:00      00:29:41

GGSCI (ggsdb) 5> info rep ricme
REPLICAT   RICME     Last Started 2012-10-31 17:23   Status ABENDED
Checkpoint Lag       00:00:00 (updated 00:29:47 ago)
Log Read Checkpoint  File dirdat/l2000040
                     2012-10-31 17:08:56.879106  RBA 84509907
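From the output above we know that Replicat group ricme has applied up to RBA 84509907 in dirdat/l2000040. To skip the failing operation we only need to resume from the next record, but that is not as simple as adding 1 to the current RBA: the RBA must fall on a record boundary in the OGG trail format. If an invalid address is supplied, an exception is written to ggserr.log at startup. The logdump utility in the OGG installation directory can be used to locate the "real" position of the next record: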
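The checkpoint line combines three things: the trail prefix, the sequence number, and the RBA. As a rough sketch, the snippet below splits them apart. It assumes the file name ends in a six-digit sequence number, as dirdat/l2000040 does here (matching "dirdat/l2 ... seqno 40" in the discard file), and `parse_checkpoint` is a hypothetical helper name.

```python
import re


def parse_checkpoint(info_text, seq_digits=6):
    """Return (trail_prefix, seqno, rba) from 'info replicat' style output.

    seq_digits=6 is an assumption matching the trail file name in this post.
    """
    file_m = re.search(r"File\s+(\S+)", info_text)
    rba_m = re.search(r"RBA\s+(\d+)", info_text)
    if not file_m or not rba_m:
        raise ValueError("no read checkpoint found in the given text")
    path = file_m.group(1)
    return path[:-seq_digits], int(path[-seq_digits:]), int(rba_m.group(1))


info_output = """\
REPLICAT   RICME     Last Started 2012-10-31 17:23   Status ABENDED
Checkpoint Lag       00:00:00 (updated 00:29:47 ago)
Log Read Checkpoint  File dirdat/l2000040
                     2012-10-31 17:08:56.879106  RBA 84509907
"""
checkpoint = parse_checkpoint(info_output)
print(checkpoint)
```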
[oracle@ggsdb ogg11r2]$ ./logdump
Oracle GoldenGate Log File Dump Utility for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
Logdump 1 >open dirdat/l2000040
Current LogTrail is /oracle/ogg11r2/dirdat/l2000040
Logdump 2 >pos 84509907
Reading forward from RBA 84509907
Logdump 3 >n
2012/10/31 17:08:58.914.149 GGSPKUpdate Len 69 RBA 84509907
Name: ICME.ICME_STUDENT
After Image: Partition 4 G b
0011 0000 000d 0000 0009 3131 3134 3032 3041 5900 | ..........1114020AY.
0000 0500 0000 0133 0018 000c 0000 0008 bfa8 bac5 | .......3............
d6d8 b8b4 001d 0015 0000 3230 3132 2d31 302d 3331 | ..........2012-10-31
3a31 373a 3034 3a33 39 | :17:04:39
Logdump 4 >n
2012/10/31 17:08:58.914.149 FieldComp Len 23 RBA 84510103
Name: ICME.ICME_PROJECT_SCORE
After Image: Partition 4 G m
0000 000a 0000 0000 0000 0252 1521 0001 0005 0000 | ...........R.!......
0001 33 | ..3
Logdump 5 >exit
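pos is short for position and moves to the RBA where Replicat would start; n is short for next. The first n displays the record currently being applied: we can see that it is an update, along with the table name and the image values. To skip it we enter n once more, and the next record turns out to be at RBA 84510103, which is clearly not the previous RBA plus 1. Note that this moves past one record, not necessarily a whole transaction; running ghdr on in logdump displays each record header, including the TransInd field that marks transaction boundaries. With the next RBA known, we can alter the Replicat so that it starts from 84510103: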
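The lookup of the next record's RBA can also be done on saved logdump output. The sketch below is only an illustration: the header pattern is inferred from the two record headers shown in this post, and `record_rbas` and `next_rba` are hypothetical helper names.

```python
import re

# Header pattern inferred from lines such as:
# 2012/10/31 17:08:58.914.149 GGSPKUpdate Len 69 RBA 84509907
HEADER_RE = re.compile(r"^\S+ \S+\s+(\w+)\s+Len\s+(\d+)\s+RBA\s+(\d+)\s*$")


def record_rbas(logdump_text):
    """Return the RBA of every record header, in the order logdump printed them."""
    rbas = []
    for line in logdump_text.splitlines():
        m = HEADER_RE.match(line.strip())
        if m:
            rbas.append(int(m.group(3)))
    return rbas


def next_rba(logdump_text, failing_rba):
    """Return the RBA of the record after failing_rba, or None if not shown."""
    rbas = record_rbas(logdump_text)
    if failing_rba in rbas:
        idx = rbas.index(failing_rba)
        if idx + 1 < len(rbas):
            return rbas[idx + 1]
    return None


logdump_output = """\
2012/10/31 17:08:58.914.149 GGSPKUpdate Len 69 RBA 84509907
Name: ICME.ICME_STUDENT
After Image: Partition 4 G b
2012/10/31 17:08:58.914.149 FieldComp Len 23 RBA 84510103
Name: ICME.ICME_PROJECT_SCORE
After Image: Partition 4 G m
"""
resume_at = next_rba(logdump_output, 84509907)
print(resume_at)
```

The gap between the two RBAs is larger than the Len of 69 because each record also carries its own header in the trail, which is why the next position has to be read from logdump rather than computed.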
GGSCI (ggsdb) 1> alter replicat ricme, extrba 84510103
REPLICAT altered.
GGSCI (ggsdb) 3> start ricme
Sending START request to MANAGER ...
REPLICAT RICME starting
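If further operations fail, the same logdump procedure can be repeated to move to the next record. Alternatively, Replicat can be told to skip a transaction itself when it starts: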
start rep ricme skiptransaction
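With SKIPTRANSACTION, Replicat skips the first transaction after its startup position in the trail, so the number of records skipped is not known in advance. The skipped records are also written to the discard file, provided DISCARDFILE is configured in the parameter file.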
---end---