Background: one of our company's production servers raised an alert because a FullGC took more than 1 second.
2019-09-16T11:01:25.287+0800: 9566486.997: [GC (Allocation Failure) 2019-09-16T11:01:25.288+0800: 9566486.997: [ParNew: 1683966K->4564K(1887488K), 0.0238815 secs] 3151345K->1472607K(3984640K), 0.0245668 secs] [Times: user=0.09 sys=0.00, real=0.02 secs]
2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2019-09-16T11:01:25.330+0800: 9566487.039: [CMS-concurrent-mark-start]
2019-09-16T11:01:26.007+0800: 9566487.716: [CMS-concurrent-mark: 0.666/0.677 secs] [Times: user=0.95 sys=0.09, real=0.67 secs]
2019-09-16T11:01:26.007+0800: 9566487.717: [CMS-concurrent-preclean-start]
2019-09-16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-preclean: 0.015/0.016 secs] [Times: user=0.02 sys=0.01, real=0.02 secs]
2019-09-16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2019-09-16T11:01:31.115+0800: 9566492.825: [CMS-concurrent-abortable-preclean: 2.679/5.092 secs] [Times: user=3.46 sys=0.32, real=5.09 secs]
2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark) [YG occupancy: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829: [GC (CMS Final Remark) 2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew: 422286K->3266K(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K), 0.0268209 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]2019-09-16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]2019-09-16T11:01:33.011+0800: 9566494.720: [scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs] [Times: user=2.01 sys=0.00, real=1.93 secs]
2019-09-16T11:01:33.059+0800: 9566494.768: [CMS-concurrent-sweep-start]
2019-09-16T11:01:34.235+0800: 9566495.945: [CMS-concurrent-sweep: 1.056/1.176 secs] [Times: user=1.26 sys=0.00, real=1.18 secs]
2019-09-16T11:01:34.236+0800: 9566495.945: [CMS-concurrent-reset-start]
2019-09-16T11:01:34.259+0800: 9566495.968: [CMS-concurrent-reset: 0.023/0.023 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2019-09-16T11:03:37.270+0800: 9566618.979: [GC (Allocation Failure) 2019-09-16T11:03:37.270+0800: 9566618.979: [ParNew: 1681090K->4156K(1887488K), 0.0252135 secs] 2533550K->856782K(3984640K), 0.0259454 secs] [Times: user=0.10 sys=0.00, real=0.02 secs]
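The CMS collector is based on the mark-sweep algorithm, and its work proceeds in four steps:
Initial mark (CMS initial mark)
Concurrent mark (CMS concurrent mark)
Remark (CMS remark)
Concurrent sweep (CMS concurrent sweep)
Of these, initial mark and remark are "Stop The World" pauses.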
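To make the phases concrete, here is a minimal Python sketch, assuming JDK 8 CMS log lines like the ones above, that tags a log line with the CMS phase it belongs to and whether that phase stops the world. `PHASES` and `classify` are hypothetical names for illustration, and the matching is plain substring lookup; the real log also shows preclean, abortable-preclean and reset phases, which are concurrent and fall into the catch-all entry.

```python
# Hypothetical lookup table: substrings of JDK 8 CMS log lines mapped to
# (phase name, stops the world?). Checked in order; the first match wins.
PHASES = [
    ("CMS Initial Mark", ("initial mark", True)),
    ("CMS Final Remark", ("remark", True)),
    ("CMS-remark", ("remark", True)),  # the entry that closes the remark pause
    ("CMS-concurrent-mark", ("concurrent mark", False)),
    ("CMS-concurrent-sweep", ("concurrent sweep", False)),
    ("CMS-concurrent", ("other concurrent phase", False)),  # preclean, reset, ...
]


def classify(line):
    """Return (phase, is_stw) for a CMS log line, or None for non-CMS lines such as a plain ParNew."""
    for marker, info in PHASES:
        if marker in line:
            return info
    return None


for sample in [
    "[GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)]",
    "[CMS-concurrent-mark-start]",
    "[CMS-concurrent-sweep-start]",
]:
    print(classify(sample))
```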
The alert said: FullGC took 1953 ms. From the log above, can you work out where 1953 comes from? Take a moment to think about it first.
Before answering, a related side question about jstat -gcutil:
$jstat -gcutil 1837 1000
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.used substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
    S0     S1      E      O      P      YGC      YGCT   FGC     FGCT       GCT
  1.24   0.00   7.22   4.64      �   103180  3024.968     9    6.840  3031.809
  1.24   0.00  16.24   4.64      �   103180  3024.968     9    6.840  3031.809
  1.24   0.00  24.95   4.64      �   103180  3024.968     9    6.840  3031.809
  1.24   0.00  25.43   4.64      �   103180  3024.968     9    6.840  3031.809
  1.24   0.00  25.55   4.64      �   103180  3024.968     9    6.840  3031.809
  1.24   0.00  25.97   4.64      �   103180  3024.968     9    6.840  3031.809
  1.24   0.00  26.37   4.64      �   103180  3024.968     9    6.840  3031.809
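According to the output above, FGC is 9. Yet since the application started I had only seen 3 Full GCs, which were triggered by running jmap -dump:live,format=b,file=heap-dump.bin <pid> to dump the heap, plus three CMS cycles. (The � in the P column is how this jstat prints NaN: as the warnings say, the counters it expects for that column could not be resolved on this JVM.)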
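After some digging I found a reasonably convincing explanation: the FGC column here counts JVM stop-the-world pauses, and a single CMS cycle stops the world twice, at Initial Mark and at Remark. So FGC = CMS*2 + FullGC-dump, here 3*2 + 3 = 9.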
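The counting rule above fits in a one-line model. This is a hedged sketch of the explanation, not how jstat itself is implemented; `expected_jstat_fgc` is a hypothetical name, and the inputs are the numbers from this incident (3 CMS cycles, 3 jmap-triggered Full GCs).

```python
def expected_jstat_fgc(cms_cycles, other_full_gcs):
    """Hypothetical model of jstat's FGC counter under CMS.

    Each CMS cycle contributes two stop-the-world pauses (initial mark and remark);
    each ordinary full GC, such as one forced by jmap -dump:live, contributes one.
    """
    return cms_cycles * 2 + other_full_gcs


# This incident: 3 CMS cycles plus 3 full GCs triggered by jmap -dump:live.
print(expected_jstat_fgc(3, 3))  # 9, the FGC value jstat reported
```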
Back to the original question: how long did the CMS "FullGC" take? In the GC log we can see:
2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
In this line, the part
[1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs]
means the CMS Initial Mark "Stop The World" pause took 0.0139600 secs.
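Reading that number off by hand gets tedious across many cycles. A minimal Python sketch, assuming the JDK 8 CMS initial-mark layout shown above, extracts it with a regular expression; `INITIAL_MARK` and `initial_mark_pause` are hypothetical names.

```python
import re

# Assumed JDK 8 layout:
# "CMS-initial-mark: <old used>K(<old capacity>K)] <heap used>K(<heap capacity>K), <secs> secs]"
INITIAL_MARK = re.compile(r"CMS-initial-mark: \d+K\(\d+K\)\] \d+K\(\d+K\), ([\d.]+) secs\]")


def initial_mark_pause(line):
    """Return the initial-mark stop-the-world pause in seconds, or None if the line does not match."""
    match = INITIAL_MARK.search(line)
    return float(match.group(1)) if match else None


line = ("2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) "
        "[1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] "
        "[Times: user=0.02 sys=0.00, real=0.02 secs]")
print(initial_mark_pause(line))  # 0.01396
```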
That gives us the initial mark time. What about CMS remark?
2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark) [YG occupancy: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829: [GC (CMS Final Remark) 2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew: 422286K->3266K(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K), 0.0268209 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]2019-09-16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]2019-09-16T11:01:33.011+0800: 9566494.720: [scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs] [Times: user=2.01 sys=0.00, real=1.93 secs]
Here we see [GC (CMS Final Remark)..., but no remark duration is printed on that line. Why is that?
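This comes down to how CMS remark is designed. Briefly:
- Before remarking, CMS has to scan the young generation. Why?
- Because some old-generation objects may be reachable only through references held by young-generation objects, so the young generation effectively acts as part of the GC roots. (To be expanded later.)
- When the young generation holds many objects, scanning it is expensive, so ideally a young collection runs just before the scan; that is the ParNew entry nested inside the Final Remark line above. In fact, even before that, CMS runs CMS-concurrent-abortable-preclean. What is CMS-concurrent-abortable-preclean? (To be expanded later.)
- So [GC (CMS Final Remark)... only announces that remark is about to start; the actual remark work happens on the [Rescan (parallel) line.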
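Since the remark work is spread over the [Rescan (parallel) ... line, it can help to itemize it. A minimal Python sketch, assuming the JDK 8 CMS log format shown here, pulls each sub-phase timing and the overall remark pause out of that line; `remark_breakdown` is a hypothetical helper, and the sub-phase names are copied from this log, so other JVM versions or flags may print different ones.

```python
import re

# Sub-phases printed inside the remark pause in this log, as "[<name>, <secs> secs]".
SUB_PHASE = re.compile(
    r"\[(Rescan \(parallel\) |weak refs processing|class unloading"
    r"|scrub symbol table|scrub string table), ([\d.]+) secs\]"
)
# The overall remark pause closes the line: "CMS-remark: ...K(...K)] ...K(...K), <secs> secs]".
REMARK_TOTAL = re.compile(r"CMS-remark: \d+K\(\d+K\)\] \d+K\(\d+K\), ([\d.]+) secs\]")


def remark_breakdown(line):
    """Return ({sub-phase name: seconds}, total remark pause in seconds)."""
    phases = {name.strip(): float(secs) for name, secs in SUB_PHASE.findall(line)}
    total = float(REMARK_TOTAL.search(line).group(1))
    return phases, total


line = ("2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]"
        "2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]"
        "2019-09-16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]"
        "2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]"
        "2019-09-16T11:01:33.011+0800: 9566494.720: [scrub string table, 0.0030630 secs]"
        "[1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs]")
phases, total = remark_breakdown(line)
slowest = max(phases, key=phases.get)
print(slowest, phases[slowest], total)  # weak refs processing 1.6619564 1.9392917
```

On this log, weak refs processing alone takes 1.6619564 s of the 1.9392917 s pause. The itemized sub-phases add up to somewhat less than the total; judging by the timestamps (the Final Remark entry starts at 11:01:31.119 and the sweep starts at 11:01:33.059), the total appears to also cover the young collection that runs first.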
[1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs]
This gives us the CMS Remark "Stop The World" pause: 1.9392917 secs.
Total: CMS Initial Mark (0.0139600 secs) + CMS Remark (1.9392917 secs) = 1953.2517 ms, which matches the 1953 ms in the alert.
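The same sum can be computed straight from the two log entries. This is a minimal Python sketch assuming the JDK 8 CMS log format shown in this post; `cms_pause_ms` is a hypothetical helper, and `Decimal` is used so that 0.0139600 + 1.9392917 is added exactly rather than with binary floating-point error.

```python
import re
from decimal import Decimal

# In each STW entry, the pause is the first ", <secs> secs]" after the CMS marker
# (JDK 8 CMS format assumed).
INITIAL = re.compile(r"CMS-initial-mark: .*?, ([\d.]+) secs\]")
REMARK = re.compile(r"CMS-remark: .*?, ([\d.]+) secs\]")


def cms_pause_ms(initial_line, remark_line):
    """Sum the initial-mark and remark STW pauses and return milliseconds as a Decimal."""
    initial = Decimal(INITIAL.search(initial_line).group(1))
    remark = Decimal(REMARK.search(remark_line).group(1))
    return (initial + remark) * 1000


initial_line = ("[GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] "
                "1473371K(3984640K), 0.0139600 secs]")
remark_line = ("[scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] "
               "1471476K(3984640K), 1.9392917 secs]")
print(cms_pause_ms(initial_line, remark_line).normalize())  # 1953.2517
```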