GC log analysis: CMS Full GC duration

Background: on our production servers, an alert fires whenever a Full GC takes longer than 1 second.

2019-09-16T11:01:25.287+0800: 9566486.997: [GC (Allocation Failure) 2019-09-16T11:01:25.288+0800: 9566486.997: [ParNew: 1683966K->4564K(1887488K), 0.0238815 secs] 3151345K->1472607K(3984640K), 0.0245668 secs] [Times: user=0.09 sys=0.00, real=0.02 secs]
2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2019-09-16T11:01:25.330+0800: 9566487.039: [CMS-concurrent-mark-start]
2019-09-16T11:01:26.007+0800: 9566487.716: [CMS-concurrent-mark: 0.666/0.677 secs] [Times: user=0.95 sys=0.09, real=0.67 secs] 
2019-09-16T11:01:26.007+0800: 9566487.717: [CMS-concurrent-preclean-start]
2019-09-16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-preclean: 0.015/0.016 secs] [Times: user=0.02 sys=0.01, real=0.02 secs] 
2019-09-16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2019-09-16T11:01:31.115+0800: 9566492.825: [CMS-concurrent-abortable-preclean: 2.679/5.092 secs] [Times: user=3.46 sys=0.32, real=5.09 secs]
2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark) [YG occupancy: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829: [GC (CMS Final Remark) 2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew: 422286K->3266K(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K), 0.0268209 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]2019-09-16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]2019-09-16T11:01:33.011+0800: 9566494.720: [scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs] [Times: user=2.01 sys=0.00, real=1.93 secs]
2019-09-16T11:01:33.059+0800: 9566494.768: [CMS-concurrent-sweep-start]
2019-09-16T11:01:34.235+0800: 9566495.945: [CMS-concurrent-sweep: 1.056/1.176 secs] [Times: user=1.26 sys=0.00, real=1.18 secs] 
2019-09-16T11:01:34.236+0800: 9566495.945: [CMS-concurrent-reset-start]
2019-09-16T11:01:34.259+0800: 9566495.968: [CMS-concurrent-reset: 0.023/0.023 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
2019-09-16T11:03:37.270+0800: 9566618.979: [GC (Allocation Failure) 2019-09-16T11:03:37.270+0800: 9566618.979: [ParNew: 1681090K->4156K(1887488K), 0.0252135 secs] 2533550K->856782K(3984640K), 0.0259454 secs] [Times: user=0.10 sys=0.00, real=0.02 secs]

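The CMS collector is built on the "mark-sweep" algorithm and runs in four steps:

Initial mark (CMS initial mark)
Concurrent mark (CMS concurrent mark)
Remark (CMS remark)
Concurrent sweep (CMS concurrent sweep)

Of these, initial mark and remark are the two "Stop The World" steps.

The alert said: FullGC duration 1953 ms. Can you work out where 1953 comes from using the log above? Take a moment to think about it first.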

First, a short digression about jstat -gcutil:

$jstat -gcutil 1837 1000
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.used substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  1.24   0.00   7.22   4.64      103180 3024.968     9    6.840 3031.809
  1.24   0.00  16.24   4.64      103180 3024.968     9    6.840 3031.809
  1.24   0.00  24.95   4.64      103180 3024.968     9    6.840 3031.809
  1.24   0.00  25.43   4.64      103180 3024.968     9    6.840 3031.809
  1.24   0.00  25.55   4.64      103180 3024.968     9    6.840 3031.809
  1.24   0.00  25.97   4.64      103180 3024.968     9    6.840 3031.809
  1.24   0.00  26.37   4.64      103180 3024.968     9    6.840 3031.809

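From the output above, FGC is 9. Yet as far as I can tell, the application has had only 3 real Full GCs since startup, all triggered by running jmap -dump:live,format=b,file=heap-dump.bin <pid> to dump the heap, plus 3 CMS cycles.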

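After some digging I found a fairly plausible explanation: the FGC column here counts stop-the-world pauses of the old-generation collector rather than Full GCs proper, and one CMS cycle has two such pauses, one in Initial Mark and one in Remark. So FGC = CMS*2 + FullGC-dump = 3*2 + 3 = 9.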
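
The counting rule can be written down as a one-line calculation. This is only a sketch of the explanation quoted above, which is an observed behaviour rather than a documented guarantee; the function name expected_jstat_fgc is made up for illustration.

```python
def expected_jstat_fgc(cms_cycles, stw_full_gcs):
    """FGC as jstat would show it IF each CMS cycle counts as two pauses.

    Assumption (from the explanation in the text, not from jstat docs):
    every CMS cycle adds 2 (initial mark + remark) and every real
    stop-the-world Full GC adds 1.
    """
    pauses_per_cms_cycle = 2  # initial mark + remark
    return cms_cycles * pauses_per_cms_cycle + stw_full_gcs


# 3 CMS cycles plus 3 Full GCs caused by jmap -dump:live
print(expected_jstat_fgc(cms_cycles=3, stw_full_gcs=3))
```

With the numbers from this service (3 CMS cycles, 3 dump-triggered Full GCs) the result agrees with the FGC column.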

Back to the original question, the CMS FGC duration. In the GC log we can see:

2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]

In that line, the fragment

[1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs]

means that the CMS Initial Mark "Stop The World" pause lasted 0.0139600 secs.
That gives us the initial mark time. What about CMS remark?
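
The fields of that fragment can also be pulled out mechanically. Below is a minimal sketch that assumes the JDK 8 style CMS log format shown above; INITIAL_MARK and parse_initial_mark are hypothetical names, not part of any existing tool. The first pair of numbers is old-generation occupancy and capacity, the second pair is whole-heap occupancy and capacity.

```python
import re

# [1 CMS-initial-mark: <old used>K(<old cap>K)] <heap used>K(<heap cap>K), <pause> secs]
INITIAL_MARK = re.compile(
    r"\[1 CMS-initial-mark: (\d+)K\((\d+)K\)\] (\d+)K\((\d+)K\), (\d+\.\d+) secs\]"
)


def parse_initial_mark(line):
    """Return occupancy figures (KB) and the STW pause (seconds), or None."""
    m = INITIAL_MARK.search(line)
    if m is None:
        return None
    old_used, old_cap, heap_used, heap_cap = (int(g) for g in m.groups()[:4])
    return {
        "old_used_kb": old_used,
        "old_capacity_kb": old_cap,
        "heap_used_kb": heap_used,
        "heap_capacity_kb": heap_cap,
        "pause_secs": float(m.group(5)),
    }


line = "[1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs]"
info = parse_initial_mark(line)
print(info["pause_secs"], "secs with old gen at", info["old_used_kb"], "KB")
```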

2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark) [YG occupancy: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829: [GC (CMS Final Remark) 2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew: 422286K->3266K(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K), 0.0268209 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]2019-09-16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]2019-09-16T11:01:33.011+0800: 9566494.720: [scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs] [Times: user=2.01 sys=0.00, real=1.93 secs]

Here we see [GC (CMS Final Remark)..., yet that line does not report the remark pause time. Why?

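This comes down to how CMS remark is designed. In short:

  1. Remark first has to scan the young generation. Why?
  2. Because old-generation objects can be reachable through references held by young-generation objects, so the young generation acts as part of the root set. (To be expanded later.)
  3. Scanning the young generation is expensive when it holds many objects, so it pays to run a young collection right before the scan; that is the ParNew entry nested inside [GC (CMS Final Remark) in the log. Even before that, CMS runs CMS-concurrent-abortable-preclean. What is CMS-concurrent-abortable-preclean? (To be expanded later.)
  4. So [GC (CMS Final Remark)... only announces that remark is about to start. The remark work itself begins on the [Rescan (parallel) line, and the total pause is printed near the end of that line: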
[1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs]

From this we get the CMS Remark "Stop The World" pause: 1.9392917 secs.
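
The same log line also prints a time for each remark sub-phase, which shows where the pause goes. The sketch below assumes the JDK 8 CMS format seen in this log; SUBPHASE and remark_subphases are hypothetical names chosen for illustration.

```python
import re

# Sub-phases are printed as "[<name>, <secs> secs]"; Rescan has a space before the comma.
SUBPHASE = re.compile(r"\[([A-Za-z ()]+?) ?, (\d+\.\d+) secs\]")

REMARK_LINE = (
    "2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]"
    "2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]"
    "2019-09-16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]"
    "2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]"
    "2019-09-16T11:01:33.011+0800: 9566494.720: [scrub string table, 0.0030630 secs]"
    "[1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs]"
    " [Times: user=2.01 sys=0.00, real=1.93 secs]"
)


def remark_subphases(line):
    """Map each remark sub-phase name to its duration in seconds."""
    return {name: float(secs) for name, secs in SUBPHASE.findall(line)}


phases = remark_subphases(REMARK_LINE)
slowest = max(phases, key=phases.get)
for name, secs in phases.items():
    print(f"{name:22s} {secs:.7f}")
print("slowest:", slowest)
```

In this log, weak refs processing alone takes 1.6619564 secs of the 1.9392917 secs remark pause.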
Total: CMS Initial Mark (0.0139600 secs) + CMS Remark (1.9392917 secs) = 1.9532517 secs = 1953.2517 ms, which matches the 1953 ms in the alert.
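
The sum can be reproduced straight from the log text. This is a rough sketch, assuming the log contains a single CMS cycle in the JDK 8 format shown above; STW_PAUSE and cms_stw_millis are hypothetical names, and Decimal is used only to keep the arithmetic exact.

```python
import re
from decimal import Decimal

# Captures the phase name and the pause printed after the heap figures.
STW_PAUSE = re.compile(
    r"\[1 CMS-(initial-mark|remark): \d+K\(\d+K\)\] \d+K\(\d+K\), (\d+\.\d+) secs\]"
)

LOG_EXCERPT = """\
[GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs]
[scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs]
"""


def cms_stw_millis(log_text):
    """Sum the initial-mark and remark pauses of one CMS cycle, in milliseconds."""
    pauses = {phase: Decimal(secs) for phase, secs in STW_PAUSE.findall(log_text)}
    return (pauses["initial-mark"] + pauses["remark"]) * 1000


total_ms = cms_stw_millis(LOG_EXCERPT)
print(int(total_ms), "ms")
```

If the log held several CMS cycles, later matches would overwrite earlier ones in this sketch, so the text would need to be split per cycle first.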
