A case study: OGG fails to automatically delete trail files

Problem description:

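During a routine inspection I found that the /gg filesystem used by OGG (Oracle GoldenGate) had reached 85% usage. Investigation showed that the usage alert was caused by OGG not automatically deleting its trail files.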

[oracle@host01 ~]$ df -h

Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/ggvg-lv_gg  468G  375G   70G  85% /gg
/dev/sda1               194M   33M  152M  18% /boot


The /gg filesystem uses 375G, and the trail file directory (dirdat) alone accounts for 372G of it.

[oracle@host01 gg]$ cd /gg/goldengate/
[oracle@host01 goldengate]$ du -sh *
252K    dirchk
372G    dirdat
4.0K    dirdef
……

Listing the trail files in detail shows that files dating back to the 2nd (Dec 2) are still on disk:
[oracle@host01 dirdat]$ ll
total 389674828
-rw-rw-rw- 1 oracle oinstall 177660920 Aug 22  2017 ec005334
-rw-rw-rw- 1 oracle oinstall 199999595 Dec  2 10:01 fe022410
-rw-rw-rw- 1 oracle oinstall 199999835 Dec  2 10:18 fe022411
-rw-rw-rw- 1 oracle oinstall 199999908 Dec  2 10:35 fe022412
-rw-rw-rw- 1 oracle oinstall 199999956 Dec  2 10:52 fe022413
-rw-rw-rw- 1 oracle oinstall 199999814 Dec  2 11:09 fe022414
-rw-rw-rw- 1 oracle oinstall 199999798 Dec  2 11:25 fe022415
-rw-rw-rw- 1 oracle oinstall 199999882 Dec  2 11:42 fe022416
-rw-rw-rw- 1 oracle oinstall 199999537 Dec  2 11:59 fe022417
-rw-rw-rw- 1 oracle oinstall 199999477 Dec  2 12:16 fe022418
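With this many files, it can help to list only the trail files that are older than a given number of days instead of reading the full listing. The following is a minimal sketch, assuming GNU find; the function name `list_old_trails` and its arguments are made up for illustration. It only lists files and never deletes anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OGG): list trail files with a given
# two-letter prefix that were last modified more than DAYS days ago.
# Usage: list_old_trails DIR PREFIX DAYS
list_old_trails() {
  local dir=$1 prefix=$2 days=$3
  # -mmin works in minutes, so the cutoff is exactly DAYS * 24 hours.
  find "$dir" -maxdepth 1 -type f -name "${prefix}*" \
       -mmin +"$((days * 1440))" | sort
}
```

For example, `list_old_trails /gg/goldengate/dirdat fe 8` would print the fe trail files that are more than 8 days old.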

[oracle@host01 dirdat]$ cd ..
[oracle@host01 goldengate]$ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
GGSCI (host01) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     ESS_ONE     00:00:03      672:12:03   
REPLICAT    STOPPED     REP_ESS     00:00:00      672:11:53   
REPLICAT    STOPPED     REP_IT      00:00:00      20614:00:47 
REPLICAT    RUNNING     REP_NEW     00:00:00      00:00:02    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:00:23 
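Four of the five replicat groups in this output are stopped, some of them for a very long time. As a rough sketch, the stopped groups can be pulled out of the `info all` output with awk; the function name `stopped_groups` is hypothetical, and the column layout is assumed to match the output shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "info all" output on stdin and print each
# EXTRACT/REPLICAT group that is not running, with its "Time Since Chkpt".
# Banner and MANAGER lines are ignored because of the first-column filter.
stopped_groups() {
  awk '($1 == "REPLICAT" || $1 == "EXTRACT") &&
       ($2 == "STOPPED" || $2 == "ABENDED") { print $3, $5 }'
}
```

One possible use is `echo "info all" | ./ggsci | stopped_groups`. For each group it reports, `info replicat <group>, detail` in GGSCI shows which trail that group reads.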

Finding the problem

The mgr parameter file shows that automatic purging (PurgeOldExtracts) is configured. So why were the files not being deleted? Had mgr hung?
GGSCI (host01) 2> view param mgr
Port 7839
DynamicPortList 7840-7850
DynamicPortReassignDelay 5
PurgeOldExtracts ./dirdat/ec*, UseCheckpoints, MinKeepDays 8
PurgeOldExtracts ./dirdat/fe*, UseCheckpoints, MinKeepDays 8
-- PurgeOldExtracts ./dirdat2/e2*, UseCheckpoints, MinKeepDays 5
-- PurgeOldExtracts ./dirdat3/e3*, UseCheckpoints, MinKeepDays 5
-- PurgeOldExtracts ./dirdat4/e4*, UseCheckpoints, MinKeepDays 5
-- AutoRestart ER *, Retries 5, WaitMinutes 10, ResetMinutes 60

LagReportHours 1
LagInfoMinutes 3
LagCriticalMinutes 5

Attempted fix

I decided to restart mgr, stopping the running replicat process first.

GGSCI (host01) 6> stop REP_NEW
Sending STOP request to REPLICAT REP_NEW ...

STOP request pending end-of-transaction (936 records so far)..

GGSCI (host01) 7> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     ESS_ONE     00:00:03      672:17:31   
REPLICAT    STOPPED     REP_ESS     00:00:00      672:17:22   
REPLICAT    STOPPED     REP_IT      00:00:00      20614:06:16 
REPLICAT    STOPPED     REP_NEW     00:00:03      00:00:03    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:05:52 

GGSCI (host01) 8> stop mgr
Manager process is required by other GGS processes.
Are you sure you want to stop it (y/n)? y
Sending STOP request to MANAGER ...
Request processed.
Manager stopped.
GGSCI (host01) 9> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     STOPPED                                           
REPLICAT    STOPPED     ESS_ONE     00:00:03      672:17:52   
REPLICAT    STOPPED     REP_ESS     00:00:00      672:17:42   
REPLICAT    STOPPED     REP_IT      00:00:00      20614:06:37 
REPLICAT    STOPPED     REP_NEW     00:00:03      00:00:23    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:06:12 
GGSCI (host01) 10> start mgr
Manager started.
GGSCI (host01) 11> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     ESS_ONE     00:00:03      672:18:01   
REPLICAT    STOPPED     REP_ESS     00:00:00      672:17:51   
REPLICAT    STOPPED     REP_IT      00:00:00      20614:06:45 
REPLICAT    STOPPED     REP_NEW     00:00:03      00:00:32    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:06:21 
GGSCI (host01) 12> start REP_NEW
Sending START request to MANAGER ...
REPLICAT REP_NEW starting
GGSCI (host01) 13> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     ESS_ONE     00:00:03      672:18:13   
REPLICAT    STOPPED     REP_ESS     00:00:00      672:18:04   
REPLICAT    STOPPED     REP_IT      00:00:00      20614:06:58 
REPLICAT    RUNNING     REP_NEW     00:00:03      00:00:45    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:06:34 
GGSCI (host01) 14> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     ESS_ONE     00:00:03      672:18:20   
REPLICAT    STOPPED     REP_ESS     00:00:00      672:18:11   
REPLICAT    STOPPED     REP_IT      00:00:00      20614:07:05 
REPLICAT    RUNNING     REP_NEW     00:00:03      00:00:52    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:06:41 
After restarting mgr, the old trail files were still not deleted:
[oracle@host01 dirdat]$ ll
total 389776208
-rw-rw-rw- 1 oracle oinstall 177660920 Aug 22  2017 ec005334
-rw-rw-rw- 1 oracle oinstall 199999595 Dec  2 10:01 fe022410
-rw-rw-rw- 1 oracle oinstall 199999835 Dec  2 10:18 fe022411
-rw-rw-rw- 1 oracle oinstall 199999908 Dec  2 10:35 fe022412
-rw-rw-rw- 1 oracle oinstall 199999956 Dec  2 10:52 fe022413
-rw-rw-rw- 1 oracle oinstall 199999814 Dec  2 11:09 fe022414
-rw-rw-rw- 1 oracle oinstall 199999798 Dec  2 11:25 fe022415
-rw-rw-rw- 1 oracle oinstall 199999882 Dec  2 11:42 fe022416
-rw-rw-rw- 1 oracle oinstall 199999537 Dec  2 11:59 fe022417
-rw-rw-rw- 1 oracle oinstall 199999477 Dec  2 12:16 fe022418
-rw-rw-rw- 1 oracle oinstall 199999689 Dec  2 12:32 fe022419
-rw-rw-rw- 1 oracle oinstall 199999651 Dec  2 12:48 fe022420
-rw-rw-rw- 1 oracle oinstall 199999687 Dec  2 13:06 fe022421
-rw-rw-rw- 1 oracle oinstall 199999923 Dec  2 13:24 fe022422
-rw-rw-rw- 1 oracle oinstall 199999838 Dec  2 13:42 fe022423

Solving the problem

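Thinking it over, I remembered that other replicat processes which are no longer in use had been stopped earlier, and that they share the same trail files with the running one. Because PurgeOldExtracts is configured with UseCheckpoints, mgr most likely decided that the stopped (but not deleted) processes still needed these trail files, so it did not purge them. ESS_ONE and REP_ESS had been stopped for about 672 hours (28 days), which is consistent with the oldest fe trail file on disk dating from Dec 2. Since these stopped replicat processes are no longer needed, I simply deleted them.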
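The reasoning above can be written down as a small decision function. This is a simplified model of the two conditions in the mgr parameter file (UseCheckpoints and MinKeepDays), not OGG's actual implementation. The function name `can_purge` is hypothetical, and the checkpoint sequence numbers in the example are made up, since the real checkpoint positions of ESS_ONE and REP_ESS are not shown in this post.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not OGG code) of when a trail file
# may be purged under "UseCheckpoints, MinKeepDays N".
# Usage: can_purge SEQNO AGE_DAYS MIN_KEEP_DAYS CHECKPOINT_SEQNO...
# Returns 0 if the file may be purged, 1 otherwise.
can_purge() {
  local seqno=$1 age_days=$2 min_keep_days=$3
  shift 3
  local cp
  # MinKeepDays: the file must be at least this many days old.
  (( age_days >= min_keep_days )) || return 1
  # UseCheckpoints: every reader, including stopped ones, must have
  # checkpointed past this file. 10# forces base 10 for numbers with
  # leading zeros such as 022410.
  for cp in "$@"; do
    (( 10#$cp > 10#$seqno )) || return 1
  done
  return 0
}
```

With a single reader at sequence 023800, `can_purge 022410 20 8 023800` succeeds. Adding a stopped reader whose checkpoint is still at or before that file, for example `can_purge 022410 20 8 023800 022410`, makes it fail, which matches the behaviour seen here: one stale checkpoint is enough to keep every later file on disk.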
[oracle@host01 goldengate]$ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

GGSCI (host01) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     ESS_ONE     00:00:03      672:33:19   
REPLICAT    STOPPED     REP_ESS     00:00:00      672:33:10   
REPLICAT    STOPPED     REP_IT      00:00:00      20614:22:04 
REPLICAT    RUNNING     REP_NEW     00:00:03      00:00:02    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:21:40 

Log in to the database from GGSCI:
GGSCI (host01) 2> dblogin userid goldengate,password goldengate
Successfully logged into database.
Delete the two processes that are no longer in use:
GGSCI (host01) 3> delete ESS_ONE
Deleted REPLICAT ESS_ONE.
GGSCI (host01) 4> delete REP_ESS
Deleted REPLICAT REP_ESS.
GGSCI (host01) 5> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     REP_IT      00:00:00      20614:22:49 
REPLICAT    RUNNING     REP_NEW     00:00:03      00:00:00    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:22:25 
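Then restart mgr. (The deletion should have taken effect without a restart, but a quick look at df showed no change in space usage, so I restarted mgr again here.)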
[oracle@host01 goldengate]$ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
GGSCI (host01) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     REP_IT      00:00:00      20614:24:00 
REPLICAT    RUNNING     REP_NEW     00:00:03      00:00:01    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:23:36 
GGSCI (host01) 2> stop REP_NEW
Sending STOP request to REPLICAT REP_NEW ...
STOP request pending end-of-transaction (694 records so far)..
GGSCI (host01) 3> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     REP_IT      00:00:00      20614:24:12 
REPLICAT    STOPPED     REP_NEW     00:00:04      00:00:03    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:23:48 
GGSCI (host01) 4> stop mgr
Manager process is required by other GGS processes.
Are you sure you want to stop it (y/n)? y
Sending STOP request to MANAGER ...
Request processed.
Manager stopped.
GGSCI (host01) 5> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     STOPPED                                           
REPLICAT    STOPPED     REP_IT      00:00:00      20614:24:33 
REPLICAT    STOPPED     REP_NEW     00:00:04      00:00:24    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:24:08 
GGSCI (host01) 6> start mgr
Manager started.
GGSCI (host01) 7> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     REP_IT      00:00:00      20614:24:42 
REPLICAT    STOPPED     REP_NEW     00:00:04      00:00:34    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:24:18 
GGSCI (host01) 8> start REP_NEW
Sending START request to MANAGER ...
REPLICAT REP_NEW starting
GGSCI (host01) 9> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     REP_IT      00:00:00      20614:25:01 
REPLICAT    RUNNING     REP_NEW     00:00:51      00:00:01    
REPLICAT    STOPPED     REP_ORD     00:00:00      20614:24:37 

 

Trail files older than 8 days have now been deleted:
[oracle@host01 dirdat]$ ls -lrt|more
total 121587316
-rw-rw-rw- 1 oracle oinstall 177660920 Aug 22  2017 ec005334
-rw-rw-rw- 1 oracle oinstall 199999865 Dec 22 10:33 fe023785
-rw-rw-rw- 1 oracle oinstall 199999987 Dec 22 10:51 fe023786
-rw-rw-rw- 1 oracle oinstall 199999700 Dec 22 11:09 fe023787
-rw-rw-rw- 1 oracle oinstall 199999857 Dec 22 11:26 fe023788
-rw-rw-rw- 1 oracle oinstall 199999805 Dec 22 11:44 fe023789
-rw-rw-rw- 1 oracle oinstall 199999972 Dec 22 12:02 fe023790
-rw-rw-rw- 1 oracle oinstall 199999703 Dec 22 12:20 fe023791
-rw-rw-rw- 1 oracle oinstall 199999351 Dec 22 12:38 fe023792
-rw-rw-rw- 1 oracle oinstall 199999885 Dec 22 12:56 fe023793
-rw-rw-rw- 1 oracle oinstall 199999664 Dec 22 13:15 fe023794
-rw-rw-rw- 1 oracle oinstall 199999993 Dec 22 13:32 fe023795
-rw-rw-rw- 1 oracle oinstall 199999713 Dec 22 13:50 fe023796
-rw-rw-rw- 1 oracle oinstall 199999766 Dec 22 14:08 fe023797
-rw-rw-rw- 1 oracle oinstall 199999760 Dec 22 14:26 fe023798
-rw-rw-rw- 1 oracle oinstall 199999629 Dec 22 14:45 fe023799
-rw-rw-rw- 1 oracle oinstall 199999904 Dec 22 15:03 fe023800
[oracle@host01 dirdat]$ cd ..
[oracle@host01 goldengate]$ du -sh *
188K    dirchk
116G    dirdat
4.0K    dirdef

Verification
[oracle@host01 /]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg00-lv_root   9.9G  6.4G  3.0G  69% /
tmpfs                      32G  294M   32G   1% /dev/shm
/dev/mapper/vg00-lv_oracle 30G   18G   11G  65% /oracle
/dev/mapper/ggvg-lv_gg     468G  119G  326G  27% /gg
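To catch this kind of growth before it reaches the alert level, the same df check can be scripted. The following is a minimal sketch; the function name `over_threshold` and the threshold of 80 are arbitrary choices for illustration. It assumes single-line df entries (as produced by `df -P`) and mount points without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read df output on stdin and print the mount point
# and usage of every filesystem at or above LIMIT percent.
# Usage: df -P | over_threshold LIMIT
over_threshold() {
  local limit=$1
  awk -v limit="$limit" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)          # strip the trailing percent sign
    if (pct + 0 >= limit + 0) print $6, pct "%"
  }'
}
```

Run as `df -P | over_threshold 80`, this would have reported /gg at 85% before the cleanup and nothing for /gg afterwards.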
 
