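Problem description:
During a routine inspection I found that the /gg file system used by OGG (Oracle GoldenGate) had reached 85% usage. Investigation showed that OGG was not automatically purging its trail files, which is what triggered the disk usage alert.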
[oracle@host01 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ggvg-lv_gg 468G 375G 70G 85% /gg
/dev/sda1 194M 33M 152M 18% /boot
The /gg file system has 375G in use, and the trail file directory (dirdat) accounts for 372G of it:
[oracle@host01 gg]$ cd /gg/goldengate/
[oracle@host01 goldengate]$ du -sh *
252K dirchk
372G dirdat
4.0K dirdef
……
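To see which trail prefix is taking the space inside dirdat, a per-prefix total can help. The following is a minimal sketch and not part of the original session: the function name `trail_usage_by_prefix` is hypothetical, and it assumes trail files are named with a two-character prefix followed by a sequence number, like the ec* and fe* files in this environment.

```shell
# Sum file counts and sizes per two-character trail prefix (e.g. "ec", "fe").
# Usage: trail_usage_by_prefix /gg/goldengate/dirdat
trail_usage_by_prefix() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # Emit "<prefix> <size in bytes>" for every regular file.
        printf '%s %s\n' "$(basename "$f" | cut -c1-2)" "$(wc -c < "$f")"
    done |
    awk '{ bytes[$1] += $2; files[$1]++ }
         END { for (p in bytes) printf "%s %d files %.0f bytes\n", p, files[p], bytes[p] }' |
    sort
}
```

The output is one line per prefix, which makes it easy to tell whether a single trail is responsible for almost all of the usage.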
Looking at the trail files in detail, files dated as far back as the 2nd (Dec 2) are still present:
[oracle@host01 dirdat]$ ll
total 389674828
-rw-rw-rw- 1 oracle oinstall 177660920 Aug 22 2017 ec005334
-rw-rw-rw- 1 oracle oinstall 199999595 Dec 2 10:01 fe022410
-rw-rw-rw- 1 oracle oinstall 199999835 Dec 2 10:18 fe022411
-rw-rw-rw- 1 oracle oinstall 199999908 Dec 2 10:35 fe022412
-rw-rw-rw- 1 oracle oinstall 199999956 Dec 2 10:52 fe022413
-rw-rw-rw- 1 oracle oinstall 199999814 Dec 2 11:09 fe022414
-rw-rw-rw- 1 oracle oinstall 199999798 Dec 2 11:25 fe022415
-rw-rw-rw- 1 oracle oinstall 199999882 Dec 2 11:42 fe022416
-rw-rw-rw- 1 oracle oinstall 199999537 Dec 2 11:59 fe022417
-rw-rw-rw- 1 oracle oinstall 199999477 Dec 2 12:16 fe022418
[oracle@host01 dirdat]$ cd ..
[oracle@host01 goldengate]$ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
GGSCI (host01) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED ESS_ONE 00:00:03 672:12:03
REPLICAT STOPPED REP_ESS 00:00:00 672:11:53
REPLICAT STOPPED REP_IT 00:00:00 20614:00:47
REPLICAT RUNNING REP_NEW 00:00:00 00:00:02
REPLICAT STOPPED REP_ORD 00:00:00 20614:00:23
Finding the problem
The mgr parameter file shows that automatic purging is configured. So why are the files not being purged? Has mgr hung?
GGSCI (host01) 2> view param mgr
Port 7839
DynamicPortList 7840-7850
DynamicPortReassignDelay 5
PurgeOldExtracts ./dirdat/ec*, UseCheckpoints, MinKeepDays 8
PurgeOldExtracts ./dirdat/fe*, UseCheckpoints, MinKeepDays 8
-- PurgeOldExtracts ./dirdat2/e2*, UseCheckpoints, MinKeepDays 5
-- PurgeOldExtracts ./dirdat3/e3*, UseCheckpoints, MinKeepDays 5
-- PurgeOldExtracts ./dirdat4/e4*, UseCheckpoints, MinKeepDays 5
-- AutoRestart ER *, Retries 5, WaitMinutes 10, ResetMinutes 60
LagReportHours 1
LagInfoMinutes 3
LagCriticalMinutes 5
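Several of the PurgeOldExtracts lines above are commented out with `--`, so it is worth confirming which rules are actually active. The sketch below is an illustration only: the function name `active_purge_rules` is hypothetical, and the parameter file path in the usage note (dirprm/mgr.prm under the GoldenGate home) is an assumption about this installation.

```shell
# Print "<trail pattern> <MinKeepDays>" for each active PurgeOldExtracts rule.
# Lines commented out with "--" are skipped because their first field is "--".
active_purge_rules() {
    awk 'tolower($1) == "purgeoldextracts" {
        pattern = $2; sub(/,$/, "", pattern)
        days = "-"                      # "-" means no MinKeepDays on this rule
        for (i = 3; i <= NF; i++)
            if (tolower($i) ~ /^minkeepdays/) { days = $(i + 1); sub(/,$/, "", days) }
        print pattern, days
    }' "$1"
}
```

Run against the file shown above, for example `active_purge_rules /gg/goldengate/dirprm/mgr.prm`, this should list only the ec* and fe* rules, each with a MinKeepDays of 8.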
Attempted fix
I decided to restart mgr, stopping the running replicat process first.
GGSCI (host01) 6> stop REP_NEW
Sending STOP request to REPLICAT REP_NEW ...
STOP request pending end-of-transaction (936 records so far)..
GGSCI (host01) 7> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED ESS_ONE 00:00:03 672:17:31
REPLICAT STOPPED REP_ESS 00:00:00 672:17:22
REPLICAT STOPPED REP_IT 00:00:00 20614:06:16
REPLICAT STOPPED REP_NEW 00:00:03 00:00:03
REPLICAT STOPPED REP_ORD 00:00:00 20614:05:52
GGSCI (host01) 8> stop mgr
Manager process is required by other GGS processes.
Are you sure you want to stop it (y/n)? y
Sending STOP request to MANAGER ...
Request processed.
Manager stopped.
GGSCI (host01) 9> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER STOPPED
REPLICAT STOPPED ESS_ONE 00:00:03 672:17:52
REPLICAT STOPPED REP_ESS 00:00:00 672:17:42
REPLICAT STOPPED REP_IT 00:00:00 20614:06:37
REPLICAT STOPPED REP_NEW 00:00:03 00:00:23
REPLICAT STOPPED REP_ORD 00:00:00 20614:06:12
GGSCI (host01) 10> start mgr
Manager started.
GGSCI (host01) 11> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED ESS_ONE 00:00:03 672:18:01
REPLICAT STOPPED REP_ESS 00:00:00 672:17:51
REPLICAT STOPPED REP_IT 00:00:00 20614:06:45
REPLICAT STOPPED REP_NEW 00:00:03 00:00:32
REPLICAT STOPPED REP_ORD 00:00:00 20614:06:21
GGSCI (host01) 12> start REP_NEW
Sending START request to MANAGER ...
REPLICAT REP_NEW starting
GGSCI (host01) 13> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED ESS_ONE 00:00:03 672:18:13
REPLICAT STOPPED REP_ESS 00:00:00 672:18:04
REPLICAT STOPPED REP_IT 00:00:00 20614:06:58
REPLICAT RUNNING REP_NEW 00:00:03 00:00:45
REPLICAT STOPPED REP_ORD 00:00:00 20614:06:34
GGSCI (host01) 14> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED ESS_ONE 00:00:03 672:18:20
REPLICAT STOPPED REP_ESS 00:00:00 672:18:11
REPLICAT STOPPED REP_IT 00:00:00 20614:07:05
REPLICAT RUNNING REP_NEW 00:00:03 00:00:52
REPLICAT STOPPED REP_ORD 00:00:00 20614:06:41
After restarting mgr, the old trail files were still not purged:
[oracle@host01 dirdat]$ ll
total 389776208
-rw-rw-rw- 1 oracle oinstall 177660920 Aug 22 2017 ec005334
-rw-rw-rw- 1 oracle oinstall 199999595 Dec 2 10:01 fe022410
-rw-rw-rw- 1 oracle oinstall 199999835 Dec 2 10:18 fe022411
-rw-rw-rw- 1 oracle oinstall 199999908 Dec 2 10:35 fe022412
-rw-rw-rw- 1 oracle oinstall 199999956 Dec 2 10:52 fe022413
-rw-rw-rw- 1 oracle oinstall 199999814 Dec 2 11:09 fe022414
-rw-rw-rw- 1 oracle oinstall 199999798 Dec 2 11:25 fe022415
-rw-rw-rw- 1 oracle oinstall 199999882 Dec 2 11:42 fe022416
-rw-rw-rw- 1 oracle oinstall 199999537 Dec 2 11:59 fe022417
-rw-rw-rw- 1 oracle oinstall 199999477 Dec 2 12:16 fe022418
-rw-rw-rw- 1 oracle oinstall 199999689 Dec 2 12:32 fe022419
-rw-rw-rw- 1 oracle oinstall 199999651 Dec 2 12:48 fe022420
-rw-rw-rw- 1 oracle oinstall 199999687 Dec 2 13:06 fe022421
-rw-rw-rw- 1 oracle oinstall 199999923 Dec 2 13:24 fe022422
-rw-rw-rw- 1 oracle oinstall 199999838 Dec 2 13:42 fe022423
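Rather than scanning a long `ll` listing by eye, the files that are already past the retention window can be listed directly. This is a minimal sketch under stated assumptions: `old_trails` is a hypothetical helper, it only lists files and deletes nothing, and it relies on the file modification time, which is not necessarily the exact criterion mgr applies.

```shell
# List files in a directory that match a name pattern and are older than N days.
# Usage: old_trails <dir> <pattern> <days>
old_trails() {
    # find counts age in whole 24-hour periods, so "+8" matches files
    # whose age is at least 9 full days.
    find "$1" -type f -name "$2" -mtime +"$3" | sort
}
```

For example, `old_trails /gg/goldengate/dirdat 'fe*' 8 | wc -l` gives a rough count of fe* trail files that the MinKeepDays 8 rule would no longer protect.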
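Solving the problem
Thinking it over: some time ago I stopped other replicat processes that are no longer used (ESS_ONE and REP_ESS), and they share trail files with the running one. Because the purge rules use UseCheckpoints, mgr presumably decides that those stopped but not deleted processes still need the trail files, so it does not purge them. Since the stopped replicat processes are no longer needed, the simplest fix is to delete them.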
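The reasoning above suggests a quick check: any group that has been stopped for longer than the retention window is a candidate for holding back the purge. The sketch below is illustrative only. `stale_stopped_groups` is a hypothetical name, it assumes the `info all` column layout shown in this post, and it does not tell you which trail each group reads (`info replicat <group>, detail` in GGSCI shows that).

```shell
# Read GGSCI "info all" output on stdin and print STOPPED or ABENDED groups
# whose "Time Since Chkpt" (HH:MM:SS, hours may exceed 24) is over min_hours.
# Usage: stale_stopped_groups <min_hours>
stale_stopped_groups() {
    awk -v min="$1" '
        $1 == "REPLICAT" || $1 == "EXTRACT" {
            if ($2 != "STOPPED" && $2 != "ABENDED") next
            split($NF, t, ":")            # last column is Time Since Chkpt
            if (t[1] + 0 > min + 0) print $3, t[1] + 0
        }'
}
```

With MinKeepDays 8 the threshold is 8 × 24 = 192 hours, so something like `echo "info all" | ./ggsci | stale_stopped_groups 192` would print each stale group with its hours since the last checkpoint, assuming ggsci accepts commands on standard input in this environment.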
[oracle@host01 goldengate]$ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
GGSCI (host01) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED ESS_ONE 00:00:03 672:33:19
REPLICAT STOPPED REP_ESS 00:00:00 672:33:10
REPLICAT STOPPED REP_IT 00:00:00 20614:22:04
REPLICAT RUNNING REP_NEW 00:00:03 00:00:02
REPLICAT STOPPED REP_ORD 00:00:00 20614:21:40
Log in to the database from GGSCI:
GGSCI (host01) 2> dblogin userid goldengate,password goldengate
Successfully logged into database.
Delete the two replicat processes that are no longer used:
GGSCI (host01) 3> delete ESS_ONE
Deleted REPLICAT ESS_ONE.
GGSCI (host01) 4> delete REP_ESS
Deleted REPLICAT REP_ESS.
GGSCI (host01) 5> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED REP_IT 00:00:00 20614:22:49
REPLICAT RUNNING REP_NEW 00:00:03 00:00:00
REPLICAT STOPPED REP_ORD 00:00:00 20614:22:25
Then restart mgr. (The deletion should take effect immediately, but a quick df showed no change in used space, so I restarted mgr again.)
[oracle@host01 goldengate]$ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
GGSCI (host01) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED REP_IT 00:00:00 20614:24:00
REPLICAT RUNNING REP_NEW 00:00:03 00:00:01
REPLICAT STOPPED REP_ORD 00:00:00 20614:23:36
GGSCI (host01) 2> stop REP_NEW
Sending STOP request to REPLICAT REP_NEW ...
STOP request pending end-of-transaction (694 records so far)..
GGSCI (host01) 3> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED REP_IT 00:00:00 20614:24:12
REPLICAT STOPPED REP_NEW 00:00:04 00:00:03
REPLICAT STOPPED REP_ORD 00:00:00 20614:23:48
GGSCI (host01) 4> stop mgr
Manager process is required by other GGS processes.
Are you sure you want to stop it (y/n)? y
Sending STOP request to MANAGER ...
Request processed.
Manager stopped.
GGSCI (host01) 5> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER STOPPED
REPLICAT STOPPED REP_IT 00:00:00 20614:24:33
REPLICAT STOPPED REP_NEW 00:00:04 00:00:24
REPLICAT STOPPED REP_ORD 00:00:00 20614:24:08
GGSCI (host01) 6> start mgr
Manager started.
GGSCI (host01) 7> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED REP_IT 00:00:00 20614:24:42
REPLICAT STOPPED REP_NEW 00:00:04 00:00:34
REPLICAT STOPPED REP_ORD 00:00:00 20614:24:18
GGSCI (host01) 8> start REP_NEW
Sending START request to MANAGER ...
REPLICAT REP_NEW starting
GGSCI (host01) 9> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED REP_IT 00:00:00 20614:25:01
REPLICAT RUNNING REP_NEW 00:00:51 00:00:01
REPLICAT STOPPED REP_ORD 00:00:00 20614:24:37
The trail files older than 8 days have now been purged:
[oracle@host01 dirdat]$ ls -lrt|more
total 121587316
-rw-rw-rw- 1 oracle oinstall 177660920 Aug 22 2017 ec005334
-rw-rw-rw- 1 oracle oinstall 199999865 Dec 22 10:33 fe023785
-rw-rw-rw- 1 oracle oinstall 199999987 Dec 22 10:51 fe023786
-rw-rw-rw- 1 oracle oinstall 199999700 Dec 22 11:09 fe023787
-rw-rw-rw- 1 oracle oinstall 199999857 Dec 22 11:26 fe023788
-rw-rw-rw- 1 oracle oinstall 199999805 Dec 22 11:44 fe023789
-rw-rw-rw- 1 oracle oinstall 199999972 Dec 22 12:02 fe023790
-rw-rw-rw- 1 oracle oinstall 199999703 Dec 22 12:20 fe023791
-rw-rw-rw- 1 oracle oinstall 199999351 Dec 22 12:38 fe023792
-rw-rw-rw- 1 oracle oinstall 199999885 Dec 22 12:56 fe023793
-rw-rw-rw- 1 oracle oinstall 199999664 Dec 22 13:15 fe023794
-rw-rw-rw- 1 oracle oinstall 199999993 Dec 22 13:32 fe023795
-rw-rw-rw- 1 oracle oinstall 199999713 Dec 22 13:50 fe023796
-rw-rw-rw- 1 oracle oinstall 199999766 Dec 22 14:08 fe023797
-rw-rw-rw- 1 oracle oinstall 199999760 Dec 22 14:26 fe023798
-rw-rw-rw- 1 oracle oinstall 199999629 Dec 22 14:45 fe023799
-rw-rw-rw- 1 oracle oinstall 199999904 Dec 22 15:03 fe023800
[oracle@host01 dirdat]$ cd ..
[oracle@host01 goldengate]$ du -sh *
188K dirchk
116G dirdat
4.0K dirdef
Verification
[oracle@host01 /]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-lv_root 9.9G 6.4G 3.0G 69% /
tmpfs 32G 294M 32G 1% /dev/shm
/dev/mapper/vg00-lv_oracle 30G 18G 11G 65% /oracle
/dev/mapper/ggvg-lv_gg 468G 119G 326G 27% /gg
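The same df check can be turned into a small threshold test so that a recurrence is noticed before the next manual inspection. This is a sketch, not the monitoring actually used here: `mounts_over_threshold` is a hypothetical name, the 80% limit in the usage note is an arbitrary example, and mount points containing spaces are not handled.

```shell
# Read df output on stdin and print mount points at or above a usage limit.
# Usage: df -P | mounts_over_threshold <percent>
mounts_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5; sub(/%$/, "", pct)      # fifth column is the usage percentage
        if (pct + 0 >= limit + 0) print $6, pct "%"
    }'
}
```

Using `df -P` keeps every file system on one line even when the device name is long, so `df -P | mounts_over_threshold 80` prints one line per file system that needs attention and nothing when all are below the limit.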