记一次真实的JVM性能调优过程

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"背景","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最近对负责的项目进行了一次性能优化,其中包括对 JVM 参数的调整,算是进行了一次简单的 JVM 调优,JVM 参数调整之后,服务的整体性能有 5% 左右的提升,还算不错。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"先介绍一下项目的基本情况:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"项目是一个高 QPS 压力的 web 服务,单机 QPS 一直维持在 1.5K 以上,由于旧机器的”拖累”,配置的堆大小是 8G,其中 young 区是 4G,垃圾回收器用的是 parNew + CMS。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"旧状","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先是查看当前 GC 的情况,主要是使用 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"jstat","attrs":{}}],"attrs":{}},{"type":"text","text":" 查看 GC 的概况,再查看 gc log,分析单次 gc 的详细状况。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"jstat -gcutil pid 1000","attrs":{}}],"attrs":{}},{"type":"text","text":" 每隔一秒打印一次 gc 统计信息。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/48/482f59488d6ff8b4bf897a71d2867c01.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以看到,单次 gc 平均耗时是 60ms 左右,还算可以接受,但 YGC 非常频繁,基本上每秒一次,有的时候还会一秒两次,在一秒两次的时候,服务对业务响应时长的压力就会变得很大。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接着查看 gc log,打印 gc log 需要在 JVM 启动参数里添加以下参数:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"-XX:+PrintGCDateStamps","attrs":{}}],"attrs":{}},{"type":"text","text":":打印 gc 发生的时间戳。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"-XX:+PrintTenuringDistribution","attrs":{}}],"attrs":{}},{"type":"text","text":":打印 gc 发生时的分代信息。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"-XX:+PrintGCApplicationStoppedTime","attrs":{}}],"attrs":{}},{"type":"text","text":":打印 gc 停顿时长","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"-XX:+PrintGCApplicationConcurrentTime","attrs":{}}],"attrs":{}},{"type":"text","text":":打印 gc 间隔的服务运行时长","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"-XX:+PrintGCDetails","attrs":{}}],"attrs":{}},{"type":"text","text":":打印 gc 详情,包括 gc 前/内存等。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"-Xloggc:../gclogs/gc.log.date","attrs":{}}],"attrs":{}},{"type":"text","text":":指定 gc log 的路径","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"看到的 gc log 形如:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a2/a2e58d625d5ead0c18f3e9c6596b0e0e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"单次 GC 方面并不能直接看出问题,但可以看到 gc 前有很多次 18ms 左右的停顿。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"分析和调整","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"YGC 频繁","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"直接查看 gc log 并不直观,我们可以借用一些可视化工具来帮助我们分析, ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"[gceasy](https://gceasy.io/)","attrs":{}}],"attrs":{}},{"type":"text","text":" 是个挺不错的网站,我们把 gc log 上传上去后, gceasy 可以帮助我们生成各个维度的图表帮助分析。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查看 gceasy 生成的报告,发现我们服务的 gc 吞吐量是 95%,它指的是 JVM 运行业务代码的时长占 JVM 总运行时长的比例,这个比例确实有些低了,运行 100 分钟就有 5 分钟在执行 gc。幸好这些 GC 中绝大多数都是 YGC,单次时长可控且分布平均,这使得我们服务还能平稳运行。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解决这个问题要么是减少对象的创建,要么就增大 young 区。前者不是一时半会儿都解决的,需要查找代码里可能有问题的点,分步优化。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而后者虽然改一下配置就行,但以我们对 GC 最直观的印象来说,增大 young 区,YGC 的时长也会迅速增大。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其实这点不必太过担心,我们知道 YGC 的耗时是由 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"GC 标记 + GC 复制","attrs":{}}],"attrs":{}},{"type":"text","text":" 组成的,相对于 GC 复制,GC 标记是非常快的。而 young 区内大多数对象的生命周期都非常短,如果将 young 区增大一倍,GC 标记的时长会提升一倍,但到 GC 发生时被标记的对象大部分已经死亡, GC 复制的时长肯定不会提升一倍,所以我们可以放心增大 young 区大小。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于低内存旧机器都被换掉了,我把堆大小调整到了 12G,young 区保留为 8G。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"分代调整","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了 GC 太频繁之外,GC 后各分代的平均大小也需要调整。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/04/04d1975979779f267253a6f1e161c0d9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们知道 GC 的提升机制,每次 GC 后,JVM 存活代数大于 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"MaxTenuringThreshold","attrs":{}}],"attrs":{}},{"type":"text","text":" 的对象提升到老年代。当然,JVM 还有动态年龄计算的规则:按照年龄从小到大对其所占用的大小进行累积,当累积的某个年龄大小超过了 survivor 区的一半时,取这个年龄和 MaxTenuringThreshold 中更小的一个值,作为新的晋升年龄阈值,但看各代总的内存大小,是达不到 survivor 区的一半的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/84/8439f6fbff6f7b9a9a66ab82c58af9fd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以这十五个分代内的对象会一直在两个 survivor 区之间来回复制,再观察各分代的平均大小,可以看到,四代以上的对象已经有一半都会保留到老年区了,所以可以将这些对象直接提升到老年代,以减少对象在两个 survivor 区之间复制的性能开销。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以我把 MaxTenuringThreshold 的值调整为 4,将存活超过四代的对象直接提升到老年代。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"偏向锁停顿","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"还有一个问题是 gc log 里有很多 18ms 左右的停顿,有时候连续有十多条,虽然每次停顿时长不长,但连续多次累积的时间也非常可观。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这个问题在我之前的文章有讲过,这里就不赘述了。1.8 之后 JVM 对锁进行了优化,添加了偏向锁的概念,避免了很多不必要的加锁操作,但偏向锁一旦遇到锁竞争,取消锁需要进入 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"safe point","attrs":{}}],"attrs":{}},{"type":"text","text":",导致 STW。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解决方式很简单,JVM 启动参数里添加 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"-XX:-UseBiasedLocking","attrs":{}}],"attrs":{}},{"type":"text","text":" 即可。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"结果","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"调整完 JVM 参数后先是对服务进行压测,发现性能确实有提升,也没有发生严重的 GC 问题,之后再把调整好的配置放到线上机器进行灰度,同时收集 gc log,再次进行分析。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于 young 区大小翻倍了,所以 YGC 的频率减半了,GC 的吞量提升到了 97.75%。 平均 GC 时长略有上升,从 60ms 左右提升到了 66ms,还是挺符合预期的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于 CMS 在进行 GC 时也会清理 young 区,CMS 的时长也受到了影响,CMS 的最终标记和并发清理阶段耗时增加了,也比较正常。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外我还统计了对业务的影响,之前因为 GC 导致超时的请求大大减少了。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"小结","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"总之,这是一次挺成功的 GC 调整,让我对 GC 有了更深的理解,但由于没有深入到 old 区,之前学习到的 CMS 相关的知识还没有复习到。不过性能优化并不是一朝一夕的事,需要时刻关注问题,及时做出调整。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我把自己的调优经验整理成了一本PDF,也收集了一些关于调优的书籍的PDF,都分享给大伙,需要的同学可以点击","attrs":{}},{"type":"link","attrs":{"href":"https://u.wechat.com/MMP-94sa6p25rlWtZmr06ys","title":"","type":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"JVM调优笔记与学习书籍","attrs":{}}]},{"type":"text","text":"领取。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"关于本文有什么疑问可以在下面留言交流,如果您觉得本文对您有帮助,欢迎关注。","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章