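GoldenGate REPLICAT process fails with ORA-26808: Apply process OGG$R_YBGSSJ died unexpectedly. ORA-01843: not a valid month
1、Problem description
Environment: GoldenGate 12.2.0.1 + Oracle 12.1.0.2
The Replicat on the target side reports the following error while applying changes: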
ORA-26808: Apply process OGG$R_YBGSSJ died unexpectedly.ORA-01843: not a valid month
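2、Resolution
1、No matching entry for this error turned up on MOS, and every restart got stuck at the same point. I was at a dead end until I remembered that some odd problems I had run into before were related to the mode in which a process was created. Checking this process showed that it was running in INTEGRATED mode.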
GGSCI (PBYKJHD1) 64> info r_ybgssj
REPLICAT R_YBGSSJ Last Started 2019-12-06 14:37 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:00 (updated 00:01:13 ago)
Process ID 28496
Log Read Checkpoint File ./dirdat/ybgs/sj000003319
2019-12-06 14:36:52.723981 RBA 19659053
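The check above can also be scripted. The following is a minimal sketch, assuming the `info` output has the line layout pasted in this post; the function name `parse_info_replicat` and the dictionary keys are hypothetical and not part of GoldenGate.

```python
import re


def parse_info_replicat(text):
    """Extract a few fields from GGSCI `info <replicat>` output (layout as pasted in this post)."""
    info = {"integrated": False}
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"REPLICAT\s+(\S+)\s+.*Status\s+(\S+)", line)
        if header:
            info["name"], info["status"] = header.group(1), header.group(2)
        elif line == "INTEGRATED":
            # A bare INTEGRATED line marks an integrated Replicat.
            info["integrated"] = True
        elif line.startswith("Log Read Checkpoint"):
            info["trail_file"] = line.split()[-1]
        else:
            rba = re.search(r"\bRBA\s+(\d+)$", line)
            if rba:
                info["rba"] = int(rba.group(1))
    return info


# Sample text copied from the output shown in this post.
SAMPLE_INFO = """\
REPLICAT R_YBGSSJ Last Started 2019-12-06 14:37 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:00 (updated 00:01:13 ago)
Process ID 28496
Log Read Checkpoint File ./dirdat/ybgs/sj000003319
2019-12-06 14:36:52.723981 RBA 19659053
"""

if __name__ == "__main__":
    print(parse_info_replicat(SAMPLE_INFO))
```

The trail file and RBA returned here are the values needed later when the process is recreated and repositioned.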
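2、Recreate the process in classic mode, then set its read position to the current trail file (the sequence number and RBA from the checkpoint above) and restart it.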
GGSCI (PBYKJHD1 as oggadmin@pbykj2/GD_SHENGJZ1) 6> delete r_ybgssj
GGSCI (PBYKJHD1 as oggadmin@pbykj2/GD_SHENGJZ1) 6> ADD REPLICAT r_ybgssj, EXTTRAIL ./dirdat/ybgs/sj, checkpointtable oggadmin.checkpointtable
GGSCI (PBYKJHD1 as oggadmin@pbykj2/GD_SHENGJZ1) 6> alter REPLICAT r_ybgssj extseqno 3319,extrba 19659053
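The `extseqno` value is simply the numeric part of the trail file name from the checkpoint. As a rough sketch, the command string can be derived like this, assuming trail files are named with a two-character prefix followed by the sequence digits, as in the file names shown in this post. The helper `alter_replicat_command` is hypothetical; it only formats text and does not talk to GGSCI.

```python
import os


def alter_replicat_command(group, trail_file, rba):
    """Build an `alter REPLICAT ... extseqno, extrba` command from a checkpoint position."""
    base = os.path.basename(trail_file)
    seq_digits = base[2:]  # assumption: two-character trail prefix, e.g. "sj"
    if len(base) <= 2 or not seq_digits.isdigit():
        raise ValueError(f"unexpected trail file name: {trail_file!r}")
    seqno = int(seq_digits)  # drops the leading zeros: "000003319" becomes 3319
    return f"alter REPLICAT {group} extseqno {seqno}, extrba {int(rba)}"


if __name__ == "__main__":
    print(alter_replicat_command("r_ybgssj", "./dirdat/ybgs/sj000003319", 19659053))
```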
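3、After switching to classic mode, the process ran normally and started applying data. Just as I was feeling pleased, it ABENDED again. The error this time showed that an insert into lc99 had failed with ORA-01843: not a valid month.
4、The same ORA-01843 message was already present in INTEGRATED mode, but with no other details there was nothing to troubleshoot. Now we know the table involved is lc99. Viewing the process's discard (.dsc) file with more ./dirrpt/r_ybgssj.dsc shows that the error occurred while inserting the row with AAZ801 = 5216684.
5、With the row identified, we first looked it up on the source side, intending to insert it manually on the target as a test.
In the record queried from the source, the alc026 column is 0000/0/0, and the column type is DATE. This was baffling: how could such an invalid date exist in a production database? Replicated to the target, the insert is bound to fail.
6、Deal with the OGG failure first: having found the row, insert it manually on the target, leaving alc026 empty (NULL).
7、Once the data has been fixed manually, change the trail position that r_ybgssj reads from so that it skips the current transaction and resumes replication. Use the logdump utility to find the RBA of the next record in the trail file, then apply it with the command alter REPLICAT r_ybgssj extrba, supplying that RBA each time.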
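One way to look at such a value more closely is to inspect the raw bytes that Oracle's DUMP() function reports for the column. The sketch below decodes them under the assumption that a stored DATE (Typ=12) uses the usual 7-byte layout: century + 100, year within century + 100, month, day, hour + 1, minute + 1, second + 1. The post does not show the actual bytes of alc026, so the "bad" sample is purely hypothetical, as are the function names.

```python
import datetime


def decode_oracle_date(raw):
    """Decode the 7 bytes DUMP() shows for a DATE column into (y, mo, d, h, mi, s)."""
    if len(raw) != 7:
        raise ValueError("expected exactly 7 bytes")
    century, year, month, day, hour, minute, second = raw
    full_year = (century - 100) * 100 + (year - 100)
    return (full_year, month, day, hour - 1, minute - 1, second - 1)


def is_valid_date(fields):
    """Return True if the decoded fields form a real calendar date.

    Python's datetime only covers years 1 to 9999, so legitimate BC dates
    in Oracle would also be reported as invalid by this check.
    """
    try:
        datetime.datetime(*fields)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    good = [120, 119, 12, 6, 15, 38, 1]   # a normal date
    bad = [100, 100, 0, 0, 1, 1, 1]       # hypothetical bytes for a 0000/0/0 value
    for raw in (good, bad):
        fields = decode_oracle_date(raw)
        print(fields, is_valid_date(fields))
```

A value with month 0 can end up stored when a client writes the bytes directly without validation; whether that happened here is something only the developers can confirm.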
GGSCI (PBYKJHD1 as oggadmin@pbykj2/GD_SHENGJZ1) 44> info r_ybgssj
REPLICAT R_YBGSSJ Last Started 2019-12-06 17:56 Status ABENDED
Checkpoint Lag 91:13:52 (updated 00:02:27 ago)
Log Read Checkpoint File ./dirdat/ybgs/sj000003319
2019-12-02 22:44:51.001075 RBA 19659053
GGSCI (PBYKJHD1 as oggadmin@pbykj2/GD_SHENGJZ1) 45> exit
[oracle@PBYKJHD1:/u01/app/ggs]$ logdump
Oracle GoldenGate Log File Dump Utility for Oracle
Version 12.2.0.1.170221 25977542
Copyright (C) 1995, 2017, Oracle and/or its affiliates. All rights reserved.
Logdump 289 >open ./dirdat/ybgs/sj000003319
Current LogTrail is /u01/app/ggs/dirdat/ybgs/sj000003321
Logdump 290 >pos 19659053
Reading forward from RBA 19659053
Logdump 291 >n
2019/11/10 12:20:22.697.973 Metadata Len 1254 RBA 19659053
Name: YBGS.YBGS_SJ.AC70
3040 0000 04e0 0100 0002 0001 0200 001e 0100 0006 | 0@..................
0100 0002 0001 0200 0004 0000 0000 0300 0002 0000 | ....................
0400 0002 0000 0340 0000 04b2 0010 0046 0034 0006 | .......@.......F.4..
4141 4130 3730 0040 0000 0003 0000 0003 0000 0003 | AAA070.@............
0000 0000 0000 0000 0000 0000 ffff ffff 0001 0000 | ....................
0000 0000 0000 0000 0022 000e 0000 0000 0000 0000 | ........."..........
0000 0000 0000 0046 0034 0006 4141 4330 3031 0040 | .......F.4..AAC001.@
Logdump 292 >n
2019/12/02 22:45:22.016.188 Insert Len 211 RBA 19660375
Name: YBGS.YBGS_SJ.AC70 (TDR Index: 6)
After Image: Partition 12 G b
0000 0004 ffff 0000 0001 0014 0000 0010 3132 3130 | ................1210
3030 3030 3032 3132 3330 3133 0002 000d 0000 0009 | 000002123013........
e6a2 81e6 9f8f e4bc 9f00 0300 04ff ff00 0000 0400 | ....................
15ff ff31 3930 302d 3031 2d30 313a 3030 3a30 303a | ...1900-01-01:00:00:
3030 0005 0005 0000 0001 3100 0600 1500 0032 3030 | 00........1......200
342d 3037 2d32 393a 3038 3a32 303a 3539 0007 000a | 4-07-29:08:20:59....
0000 0000 0000 0131 cc19 0008 0007 0000 0003 3431 | .......1..........41
Logdump 293 >exit
[oracle@PBYKJHD1:/u01/app/ggs]$ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 12.2.0.1.170221 25977542_FBO
Linux, x64, 64bit (optimized), Oracle 12c on May 8 2017 12:54:39
Operating system character set identified as UTF-8.
Copyright (C) 1995, 2017, Oracle and/or its affiliates. All rights reserved.
GGSCI (PBYKJHD1) 1> alter REPLICAT r_ybgssj extrba 19660375
2019-12-06 18:01:35 INFO OGG-06594 Replicat R_YBGSSJ has been altered through GGSCI. Even the start up position might be updated, duplicate suppression remains active in next startup. To override duplicate suppression, start R_YBGSSJ with NOFILTERDUPTRANSACTIONS option.
REPLICAT altered.
GGSCI (PBYKJHD1) 2> info r_ybgssj
REPLICAT R_YBGSSJ Initialized 2019-12-06 18:01 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
Log Read Checkpoint File ./dirdat/ybgs/sj000003319
First Record RBA 19660375
GGSCI (PBYKJHD1) 3> start r_ybgssj
Sending START request to MANAGER ...
REPLICAT R_YBGSSJ starting
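Reading the next RBA off the logdump output by eye is error-prone when many records are involved. A minimal sketch of automating it, assuming record header lines have the form shown above (timestamp, record type, `Len`, `RBA`); the names `record_headers` and `next_rba` are hypothetical helpers, not logdump features.

```python
import re

# Matches header lines such as:
# 2019/12/02 22:45:22.016.188 Insert Len 211 RBA 19660375
HEADER = re.compile(r"^(\d{4}/\d{2}/\d{2} \S+)\s+(\w+)\s+Len\s+(\d+)\s+RBA\s+(\d+)")


def record_headers(text):
    """Return one dict per record header found in saved logdump output."""
    headers = []
    for line in text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            headers.append({
                "timestamp": m.group(1),
                "type": m.group(2),
                "len": int(m.group(3)),
                "rba": int(m.group(4)),
            })
    return headers


def next_rba(text, current_rba):
    """Return the smallest record RBA greater than current_rba, or None if there is none."""
    later = [h["rba"] for h in record_headers(text) if h["rba"] > current_rba]
    return min(later) if later else None


# Header lines copied from the logdump session in this post.
SAMPLE_DUMP = """\
2019/11/10 12:20:22.697.973 Metadata Len 1254 RBA 19659053
Name: YBGS.YBGS_SJ.AC70
2019/12/02 22:45:22.016.188 Insert Len 211 RBA 19660375
Name: YBGS.YBGS_SJ.AC70 (TDR Index: 6)
"""

if __name__ == "__main__":
    print(next_rba(SAMPLE_DUMP, 19659053))
```

Moving the read position past a record means Replicat never applies it, so this only makes sense after the affected row has been handled manually on the target, as done in step 6.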
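8、After the first skip the process still failed on startup, reporting the same record. After moving the read RBA forward a second time from that position, the process returned to normal operation.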
GGSCI (PBYKJHD1) 7> info r_ybgssj
REPLICAT R_YBGSSJ Last Started 2019-12-06 18:01 Status ABENDED
Checkpoint Lag 00:00:00 (updated 00:02:31 ago)
Log Read Checkpoint File ./dirdat/ybgs/sj000003319
First Record RBA 19660375
GGSCI (PBYKJHD1) 8> exit
[oracle@PBYKJHD1:/u01/app/ggs]$ logdump
Oracle GoldenGate Log File Dump Utility for Oracle
Version 12.2.0.1.170221 25977542
Copyright (C) 1995, 2017, Oracle and/or its affiliates. All rights reserved.
Logdump 293 >open ./dirdat/ybgs/sj000003319
Current LogTrail is /u01/app/ggs/dirdat/ybgs/sj000003319
Logdump 294 >pos 19660375
Reading forward from RBA 19660375
Logdump 295 >n
2019/12/02 22:45:22.016.188 Insert Len 211 RBA 19660375
Name: YBGS.YBGS_SJ.AC70 (TDR Index: 6)
After Image: Partition 12 G b
0000 0004 ffff 0000 0001 0014 0000 0010 3132 3130 | ................1210
3030 3030 3032 3132 3330 3133 0002 000d 0000 0009 | 000002123013........
e6a2 81e6 9f8f e4bc 9f00 0300 04ff ff00 0000 0400 | ....................
15ff ff31 3930 302d 3031 2d30 313a 3030 3a30 303a | ...1900-01-01:00:00:
3030 0005 0005 0000 0001 3100 0600 1500 0032 3030 | 00........1......200
342d 3037 2d32 393a 3038 3a32 303a 3539 0007 000a | 4-07-29:08:20:59....
0000 0000 0000 0131 cc19 0008 0007 0000 0003 3431 | .......1..........41
Logdump 296 >n
2019/12/02 22:45:22.016.188 Insert Len 211 RBA 19660706
Name: YBGS.YBGS_SJ.AC70 (TDR Index: 6)
After Image: Partition 12 G m
0000 0004 ffff 0000 0001 0014 0000 0010 3132 3130 | ................1210
3030 3030 3032 3132 3330 3133 0002 000d 0000 0009 | 000002123013........
e6a2 81e6 9f8f e4bc 9f00 0300 04ff ff00 0000 0400 | ....................
15ff ff31 3930 302d 3031 2d30 313a 3030 3a30 303a | ...1900-01-01:00:00:
3030 0005 0005 0000 0001 3100 0600 1500 0032 3030 | 00........1......200
342d 3037 2d32 393a 3038 3a31 393a 3137 0007 000a | 4-07-29:08:19:17....
0000 0000 0000 0131 cc19 0008 0007 0000 0003 3431 | .......1..........41
Logdump 297 >exit
[oracle@PBYKJHD1:/u01/app/ggs]$ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 12.2.0.1.170221 25977542_FBO
Linux, x64, 64bit (optimized), Oracle 12c on May 8 2017 12:54:39
Operating system character set identified as UTF-8.
Copyright (C) 1995, 2017, Oracle and/or its affiliates. All rights reserved.
GGSCI (PBYKJHD1) 1> alter REPLICAT r_ybgssj extrba 19660706
2019-12-06 18:04:34 INFO OGG-06594 Replicat R_YBGSSJ has been altered through GGSCI. Even the start up position might be updated, duplicate suppression remains active in next startup. To override duplicate suppression, start R_YBGSSJ with NOFILTERDUPTRANSACTIONS option.
REPLICAT altered.
GGSCI (PBYKJHD1) 2> start r_ybgssj
Sending START request to MANAGER ...
REPLICAT R_YBGSSJ starting
GGSCI (PBYKJHD1) 3> info r_ybgssj
REPLICAT R_YBGSSJ Last Started 2019-12-06 18:04 Status RUNNING
Checkpoint Lag 91:19:27 (updated 00:00:05 ago)
Process ID 13898
Log Read Checkpoint File ./dirdat/ybgs/sj000003319
2019-12-02 22:45:40.999998 RBA 74788662
GGSCI (PBYKJHD1) 4> !
info r_ybgssj
REPLICAT R_YBGSSJ Last Started 2019-12-06 18:04 Status RUNNING
Checkpoint Lag 91:19:27 (updated 00:00:10 ago)
Process ID 13898
Log Read Checkpoint File ./dirdat/ybgs/sj000003319
2019-12-02 22:45:40.999998 RBA 74788662
GGSCI (PBYKJHD1) 5> !
info r_ybgssj
REPLICAT R_YBGSSJ Last Started 2019-12-06 18:04 Status RUNNING
Checkpoint Lag 91:11:31 (updated 00:00:08 ago)
Process ID 13898
Log Read Checkpoint File ./dirdat/ybgs/sj000003322
2019-12-02 22:54:26.999970 RBA 36812398
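The repeated `info` calls above show the checkpoint lag shrinking, which is the sign that the process is catching up. As a small sketch, the `Checkpoint Lag` values can be converted to seconds for comparison, assuming the `HH:MM:SS` format shown in the output (hours may exceed 24); `lag_seconds` is a hypothetical helper.

```python
def lag_seconds(lag):
    """Convert a GGSCI 'Checkpoint Lag' value such as '91:19:27' to seconds."""
    hours, minutes, seconds = (int(part) for part in lag.split(":"))
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    before = lag_seconds("91:19:27")
    after = lag_seconds("91:11:31")
    # A positive difference means the lag has decreased between the two checks.
    print(before - after)
```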
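9、OGG replication is now restored. Looking back at the original ORA-01843: not a valid month error, the open question is why the source could hold a date of 0000/0/0; I asked the developers to confirm. I also searched Baidu and found an article that describes this situation: https://blog.csdn.net/cnm123456001/article/details/100414273. According to it, such a value can be inserted successfully by way of a conversion.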