GC日志分析 CMS FullGC时长

背景:公司线上服务器,FullGC耗时超过1秒报警。

2019-09-16T11:01:25.287+0800: 9566486.997: [GC (Allocation Failure) 2019-09-16T11:01:25.288+0800: 9566486.997: [ParNew: 1683966K->4564K(1887488K), 0.0238815 secs] 31513
45K->1472607K(3984640K), 0.0245668 secs] [Times: user=0.09 sys=0.00, real=0.02 secs] 
2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] [Times: user=0.02 sys=0.0
0, real=0.02 secs] 
2019-09-16T11:01:25.330+0800: 9566487.039: [CMS-concurrent-mark-start]
2019-09-16T11:01:26.007+0800: 9566487.716: [CMS-concurrent-mark: 0.666/0.677 secs] [Times: user=0.95 sys=0.09, real=0.67 secs] 
2019-09-16T11:01:26.007+0800: 9566487.717: [CMS-concurrent-preclean-start]
2019-09-16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-preclean: 0.015/0.016 secs] [Times: user=0.02 sys=0.01, real=0.02 secs] 
2019-09-16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2019-09-16T11:01:31.115+0800: 9566492.825: [CMS-concurrent-abortable-preclean: 2.679/5.092 secs] [Times: user=3.46 sys=0.32, real=5.09 
secs] 
2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark) [YG occupancy: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829: [GC (CMS Final Remark) 
2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew: 422286K->3266K(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K), 0.0268209 secs] [Times: user=0.07 sys=0.00, 
real=0.02 secs] 
2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]2019-09-
16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]2019-09-16T11:01:33.0
11+0800: 9566494.720: [scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs] [Times: user=2.01 sys=0.00, real=1.93 s
ecs] 
2019-09-16T11:01:33.059+0800: 9566494.768: [CMS-concurrent-sweep-start]
2019-09-16T11:01:34.235+0800: 9566495.945: [CMS-concurrent-sweep: 1.056/1.176 secs] [Times: user=1.26 sys=0.00, real=1.18 secs] 
2019-09-16T11:01:34.236+0800: 9566495.945: [CMS-concurrent-reset-start]
2019-09-16T11:01:34.259+0800: 9566495.968: [CMS-concurrent-reset: 0.023/0.023 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
2019-09-16T11:03:37.270+0800: 9566618.979: [GC (Allocation Failure) 2019-09-16T11:03:37.270+0800: 9566618.979: [ParNew: 1681090K->4156K(1887488K), 0.0252135 secs] 25335
50K->856782K(3984640K), 0.0259454 secs] [Times: user=0.10 sys=0.00, real=0.02 secs] 

CMS收集器,是基于“标记-清除”算法实现的,运作过程分为4个步骤,包括:

初始标记(CMS initial Mark)
并发标记(CMS concurrent Mark)
重新标记(CMS Remark)
并发清除(CMS concurrent sweep)

其中,初始标记、重新标记这两个步骤是“Stop The World”。

报警内容提示:FullGC 耗时时长 1953ms,根据以上日志,你可以计算出1953的数值么?可以先思考一下。

这里想先引申一个问题:jstat -gcutil

$jstat -gcutil 1837 1000
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.used substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT   
  1.24   0.00   7.22   4.64103180 3024.968     9    6.840 3031.809
  1.24   0.00  16.24   4.64103180 3024.968     9    6.840 3031.809
  1.24   0.00  24.95   4.64103180 3024.968     9    6.840 3031.809
  1.24   0.00  25.43   4.64103180 3024.968     9    6.840 3031.809
  1.24   0.00  25.55   4.64103180 3024.968     9    6.840 3031.809
  1.24   0.00  25.97   4.64103180 3024.968     9    6.840 3031.809
  1.24   0.00  26.37   4.64103180 3024.968     9    6.840 3031.809

根据上图,我们可以看到FGC次数是9次,而我看应用启动之后的FullGC次数只有3次,执行jmap -dump:live,format=b,file=heap-dump.bin <pid> dump文件引发的,与三次CMS。

最后查资料发现一个比较合理的说法,这边的FGC统计的是JVM STW的次数,而一次CMS会在Initial Mark 与Remark 有两次STW。所以最终的FGC = CMS*2 + FullGC-dump。

回归到上面的问题:CMS-FGC时长:GC日志中我们可以看到:

2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] [Times: user=0.02 sys=0.0
0, real=0.02 secs]

其中

[1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs]

意思就是,CMS Initial Mark 初始标记“Stop The World” 0.0139600 secs
初始化标记我们已经获得时间了,那么CMS remark呢。

2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark) [YG occupancy: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829: [GC (CMS Final Remark) 
2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew: 422286K->3266K(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K), 0.0268209 secs] [Times: user=0.07 sys=0.00, 
real=0.02 secs] 
2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]2019-09-
16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]2019-09-16T11:01:33.0
11+0800: 9566494.720: [scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs] [Times: user=2.01 sys=0.00, real=1.93 s
ecs]

这边我们看到[GC (CMS Final Remark)...,但是这一行却没有标记耗时,这是为什么呢?

这边就涉及到CMS重新标记的设计了,简单说:

  1. 重新标记之前因为会需要扫描年轻代,为什么会需要扫描年轻代?
  2. 因为年老代对象,会存在被年轻代GC Root可达的情况。(后期待补充)
  3. 扫描年轻代的话,会因为年轻代对象比较多,而耗时较高,所以在扫描之前,最好是可以进行一次年轻代的回收。实际上,在这以前还会做一次CMS-concurrent-abortable-preclean。什么是CMS-concurrent-abortable-preclean?(后期待补充)
  4. 所以,[GC (CMS Final Remark)...仅仅只是代表了,我即将要做重新标记了,而真正重新标记是发生在[Rescan (parallel)这一行中了。
[1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs]

那么这里我们可以获取到,CMS Remark 重新标记“Stop The World” 1.9392917 secs
总计:CMS Initial Mark(0.0139600 secs) + CMS Remark(1.9392917 secs)=1953.2517ms

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章