A Hard-Won Memory Leak Investigation: BeanUtils Was the Culprit

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Symptoms"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Running jstat -gcutil pid 5000 showed that full GCs (FGC) were numerous and frequent. Old-generation occupancy was already around 70%, and full GC could no longer reclaim memory; on our side the full GC threshold is set to 70% of the old generation. Because about 30% of the old generation was still free, overall memory was relatively stable and so was CPU, but there was a serious latent risk: memory kept growing and was never released."}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"[service@ZQ-SE-331-V05 ~]$ jstat -gcutil 1087 5000\n S0 S1 E O M CCS YGC YGCT FGC FGCT GCT \n 0.00 55.09 88.41 72.10 92.64 85.22 9223 1169.442 435 168.866 1338.307\n 57.54 0.00 82.24 72.31 92.64 85.22 9224 1169.542 436 168.877 1338.418\n 0.00 63.75 5.33 72.50 92.64 85.22 9225 1169.642 436 168.877 1338.519\n 0.00 63.75 34.02 72.50 92.64 85.22 9225 1169.642 436 168.877 1338.519\n 0.00 63.75 59.26 72.50 92.64 85.22 9225 1169.642 436 168.877 1338.519\n 0.00 63.75 81.37 72.50 92.64 85.22 9225 1169.642 436 168.877 1338.519\n 55.60 0.00 11.75 72.71 92.64 85.22 9226 1169.742 436 168.877 1338.619\n 55.60 0.00 40.07 72.71 92.64 85.22 9226 1169.742 436 168.877 1338.619\n 55.60 0.00 67.86 72.70 92.64 85.22 9226 1169.742 437 169.541 1339.284\n 0.00 56.04 4.21 72.59 92.64 85.22 9227 1169.838 437 169.541 1339.379\n 0.00 56.04 30.01 71.73 92.64 85.22 9227 1169.838 438 169.649 1339.487\n 0.00 56.04 57.75 71.73 92.64 85.22 9227 1169.838 438 169.649 1339.487\n 0.00 56.04 79.01 71.73 92.64 85.22 9227 1169.838 438 169.649 1339.487\n 55.39 0.00 2.54 71.92 92.64 85.22 9228 1169.988 438 169.649 1339.638\n 55.39 0.00 24.70 71.92 92.64 85.22 9228 1169.988 438 169.649 1339.638\n 55.39 0.00 47.89 71.92 92.64 85.22 9228 1169.988 438 169.649 1339.638\n 55.39 0.00 82.01 71.89 92.64 85.22 9228 1169.988 439 170.207 1340.196"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"My first guess was a memory leak. I used "},{"type":"codeinline","content":[{"type":"text","text":"jmap -histo/-histo:live pid >> log"}]},{"type":"text","text":" to export histograms before and after full GC and compared them, which revealed an object with a very large instance count, "},{"type":"codeinline","content":[{"type":"text","text":"CarnivalOneDayInfo"}]},{"type":"text","text":", on the order of 20 million instances and still growing."}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"num #instances #bytes class name\n----------------------------------------------\n 1: 28906840 1387528320 java.util.HashMap\n 2: 38675870 1237627840 java.util.HashMap$Node\n 3: 18631826 745273040 xxx.CarnivalOneDayInfo\n\n num #instances #bytes class name\n----------------------------------------------\n 1: 31092889 1492458672 java.util.HashMap\n 2: 35749328 1143978496 java.util.HashMap$Node\n 3: 20355334 814213360 xxx.CarnivalOneDayInfo"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Investigation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I went straight to the code related to CarnivalOneDayInfo (the Carnival activity). Earlier testing had already hinted at this problem, so I quickly narrowed it down: when the per-minute MinuteJob checks the timed activities of all online players, the logic clones a CarnivalOneDayInfo. The initial diagnosis was therefore that CarnivalOneDayInfo was being cloned over and over."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our backend colleague Y reviewed the logic, commented out the clone locally, and watched the histogram again: the instance count stopped growing, which confirmed the diagnosis above. But in the activity code every activity check clones a copy, so why did only the Carnival activity leak? And the cloned objects are all temporaries that a full GC should reclaim, so how could they leak?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So the first step was to find out who holds references to "},{"type":"codeinline","content":[{"type":"text","text":"CarnivalOneDayInfo"}]},{"type":"text","text":". That requires dumping a heap snapshot and inspecting it with a tool such as MAT (Eclipse Memory Analyzer). Production had many players online and a large heap, so a dump would take a long time and stop the world (STW). I connected to the development server instead, confirmed that it had the same CarnivalOneDayInfo leak, and ran "},{"type":"codeinline","content":[{"type":"text","text":"jmap -dump:live,format=b,file=2388_heap.bin 
2388"}]},{"type":"text","text":" to export a heap snapshot of the development server's Java process."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open the dump in MAT. For anyone familiar with MAT, the steps are:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Select dominator_tree"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Search for CarnivalOneDayInfo"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"List Objects"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Choose With incoming references to see who holds references to it"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This immediately showed that the instances were held through the following chain:"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"queue-executor-handler-5\n\njava.lang.ThreadLocal$ThreadLocalMap @ 0x8104eec0\n\njava.lang.ThreadLocal$ThreadLocalMap$Entry[64] @ 0x866710f0\n\njava.lang.ThreadLocal$ThreadLocalMap$Entry @ 0x86671608\n\njava.util.IdentityHashMap @ 0x86671628"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"java.lang.Object[]"}]},{"type":"text","text":" is the last link in the chain. Inspecting this field shows that it holds a large number of "},{"type":"codeinline","content":[{"type":"text","text":"CarnivalOneDayInfo"}]},{"type":"text","text":" and "},{"type":"codeinline","content":[{"type":"text","text":"HashSet"}]},{"type":"text","text":" instances. From this it is clear that "},{"type":"codeinline","content":[{"type":"text","text":"CarnivalOneDayInfo"}]},{"type":"text","text":" is held directly by a ThreadLocal of the logic thread."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"The hard road of analysis"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How could a ThreadLocal on the logic thread hold CarnivalOneDayInfo? Judging from the code and the design it seemed inconceivable. My first step was to search the IDE for every usage of ThreadLocal, which turned up:"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"logback"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"protobuf"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The business code's own ThreadLocal"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The BeanUtils ThreadLocal"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After going through them, running a few simple tests, and reading the relevant code, I had no real leads. BeanUtils and logback looked somewhat suspicious, but neither seemed to have much to do with "},{"type":"codeinline","content":[{"type":"text","text":"CarnivalOneDayInfo"}]},{"type":"text","text":". I then focused on MAT, hoping to find the name of the ThreadLocal there, but that proved futile: the dump only shows reference addresses. Our colleague Z took a debugging approach instead and set breakpoints in "},{"type":"codeinline","content":[{"type":"text","text":"ThreadLocal"}]},{"type":"text","text":"'s set and "},{"type":"codeinline","content":[{"type":"text","text":"setInitialValue"}]},{"type":"text","text":", then ran the game. The breakpoints mattered most around MinuteJob, where messages are posted to the logic thread to check activity state; on every call to "},{"type":"codeinline","content":[{"type":"text","text":"checkTimeActivity"}]},{"type":"text","text":" he inspected each "},{"type":"codeinline","content":[{"type":"text","text":"ThreadLocal$ThreadLocalMap$Entry"}]},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"He watched specifically for an IdentityHashMap to appear, since the MAT analysis above pointed to that map, and that finally pinned down the call stack."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Call stack"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MinuteJob -> iterate over all online players and post a message to the logic thread -> ActivityManager#checkTimeActivity"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Iterate over all personal activities -> CarnivalActivityInfo#checkActivityState"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CarnivalActivityInfo diff = playerInfo.clone() // a copy is cloned here"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"checkIsSameState -> activityBaseInfo.getCarnivalDaysMap().equals(carnivalDaysMap) // compared with equals"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compare each CarnivalOneDayInfo in CarnivalDaysMap in turn"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Call CarnivalOneDayInfo -> BaseCarnivalOneDayInfo # 
equals"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BeanUtils.isDirty"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"The code"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"// Generic parameters were lost in extraction; Set<String> is an assumed element type.\nprivate static Set<String> getDirtyNamesByBean(AugmentedBean augmentedBean) {\n    // dirtyNames is the ThreadLocal; each thread gets its own IdentityHashMap\n    IdentityHashMap<AugmentedBean, Set<String>> dirtyNamesMap = dirtyNames.get();\n    if (dirtyNamesMap == null) {\n        dirtyNamesMap = new IdentityHashMap<>();\n        dirtyNames.set(dirtyNamesMap);\n    }\n    // Every bean that is queried becomes a key here and is never removed\n    return dirtyNamesMap.computeIfAbsent(augmentedBean, k -> Sets.newHashSet());\n}\n\npublic static boolean isDirty(AugmentedBean augmentedBean) {\n    return getDirtyNamesByBean(augmentedBean).size() > 0;\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is easy to see from the code above that an IdentityHashMap is created here and set into the ThreadLocal, exactly matching the earlier analysis. Every cloned bean passed to isDirty is added to the map as a key and never removed, so the logic thread keeps it alive. With that, the problem was fully pinned down: BeanUtils was to blame."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Retrospective and summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I had in fact strongly suspected BeanUtils, but did not look closely enough; in hindsight it matches the analysis above exactly. The approach for analyzing a memory leak:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compare histograms before and after full GC to identify which objects have instance counts that keep growing and are clearly too large."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Analyze the code; if that pinpoints the problem directly, so much the better."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If it does not, you need to find out who holds references to the object, which requires a heap dump."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Do not dump in production; reproduce the leak on a development server (memory leaks are usually easy to reproduce) and dump there."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Analyze the dump with MAT: List Objects, With incoming references, to find the referrers."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, work out why that reference exists. If reading the code resolves it, so much the better."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If not, use a debugger: set breakpoints in the code that references the objects found in the previous step and confirm the thread's call stack."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This also explains why only the Carnival activity leaked: it is the only one that calls the equals method of the BaseXx class generated by BeanUtils. BeanUtils was originally our solution for cloning objects, used for rollback, diffing, and incremental updates. The approach was later abandoned because the cost of cloning grows with object complexity, but much of the generated code was left behind, and this bug came from calling a method in that abandoned generated code. In the next release we will clean out all of the abandoned generated code."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Follow-up fix."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once the BeanUtils problem was known, fixing the leak was easy. I wrote a BeanShell script that posts a message to each logic thread and calls BeanUtils.clean to clear all of the ThreadLocals."}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"import x.BeanUtils;\nimport y.HandlerModule;\n\n// Post one cleanup task to each of the 16 handler queues so that\n// BeanUtils.clean() runs on the thread that owns the ThreadLocal\nfor (int i = 1; i <= 16; i++) {\n    HandlerModule.instance.addQueueTask(i, new Runnable() {\n        public void run() {\n            BeanUtils.clean();\n        }\n    });\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Checking with jstat again: after a full GC, CarnivalOneDayInfo dropped from tens of millions of instances to the low millions, and old-generation occupancy fell from about 70% to about 30%."}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"$ jstat -gcutil 1087 5000\n\n74.73 0.00 16.02 
72.48 92.61 85.04 10156 1313.117 575 271.355 1584.472\n 74.73 0.00 34.71 72.48 92.61 85.04 10156 1313.117 575 271.355 1584.472\n 74.73 0.00 54.42 72.48 92.61 85.04 10156 1313.117 575 271.355 1584.472\n 74.73 0.00 73.29 72.48 92.61 85.04 10156 1313.117 575 271.355 1584.472\n 74.73 0.00 89.41 72.48 92.61 85.04 10156 1313.117 575 271.355 1584.472\n 0.00 71.54 9.25 72.74 92.64 85.06 10157 1313.303 576 272.188 1585.492\n 0.00 71.54 28.30 72.73 92.64 85.06 10157 1313.303 577 272.188 1585.492\n 0.00 71.54 55.85 72.73 92.64 85.06 10157 1313.303 577 272.463 1585.766\n 0.00 71.54 78.05 72.73 92.64 85.06 10157 1313.303 577 272.463 1585.766\n 69.21 0.00 1.70 70.98 92.64 85.06 10158 1313.438 578 273.320 1586.758\n 69.21 0.00 19.97 63.09 92.64 85.06 10158 1313.438 578 273.320 1586.758\n 69.21 0.00 39.82 53.33 92.64 85.06 10158 1313.438 578 273.320 1586.758\n 69.21 0.00 59.75 41.61 92.64 85.06 10158 1313.438 578 273.320 1586.758\n 69.21 0.00 75.12 31.79 92.64 85.06 10158 1313.438 578 273.320 1586.758\n 69.21 0.00 94.13 31.79 92.64 85.06 10158 1313.438 578 273.320 1586.758\n 0.00 86.02 15.60 32.07 92.64 85.06 10159 1313.761 578 273.320 1587.081\n 0.00 86.02 94.86 32.07 92.64 85.06 10159 1313.761 578 273.320 1587.081\n\n [service@ZQ-SE-331-V05 config]$ jmap -histo 1087 | grep CarnivalOneDayInfo\n 10: 1408627 56345080 xxx.CarnivalOneDayInfo"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The longer-term plan is to refactor the activity code. Even where equals is still needed, we will no longer use the equals of the BeanUtils-generated classes, which avoids the BeanUtils ThreadLocal leak."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On memory in general: before going live, estimate overall memory usage accurately from concurrent online users, DAU, and so on, for example the actual footprint of one player and of global static data such as leaderboards. These figures can be obtained with code and tools. With them you can quickly tell whether you are looking at a memory leak or at a workload that really needs that much memory."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Three things after reading ❤️"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you found this article helpful, I would like to ask three small favors:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Like and share; your likes and comments are what keep me writing."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Follow the official account 『 "},{"type":"text","marks":[{"type":"strong"}],"text":"java爛豬皮"},{"type":"text","text":" 』, which shares original content from time to time."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Stay tuned for upcoming articles 🚀"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/34/34172ad7f3cc8e0f28bd1fc6ca2d2b68.png","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
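The leak pattern described in this article can be reproduced in isolation. The following is only a minimal sketch under stated assumptions, not the project's actual BeanUtils: the class name `DirtyTracker`, the `trackedCount` helper, and the use of plain `Object` keys are all hypothetical, and `clean()` is assumed to behave like the article's `BeanUtils.clean` by dropping the current thread's map. It shows that a read-only query implemented with `computeIfAbsent` on a `ThreadLocal`-held `IdentityHashMap` retains every temporary object it is asked about until the thread cleans up.

```java
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

public class DirtyTracker {
    // Hypothetical stand-in for the BeanUtils ThreadLocal described in the article.
    private static final ThreadLocal<IdentityHashMap<Object, Set<String>>> DIRTY_NAMES =
            ThreadLocal.withInitial(IdentityHashMap::new);

    // Same shape as isDirty in the article: a query that still inserts the bean as a key.
    public static boolean isDirty(Object bean) {
        return !DIRTY_NAMES.get().computeIfAbsent(bean, k -> new HashSet<>()).isEmpty();
    }

    // Number of beans the current thread keeps reachable through the map.
    public static int trackedCount() {
        return DIRTY_NAMES.get().size();
    }

    // Assumed equivalent of BeanUtils.clean(): drop the current thread's map.
    public static void clean() {
        DIRTY_NAMES.remove();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            isDirty(new Object()); // temporary object, like each cloned CarnivalOneDayInfo
        }
        System.out.println("retained before clean: " + trackedCount());
        clean();
        System.out.println("retained after clean: " + trackedCount());
    }
}
```

Because the map lives as long as the thread, a long-running pool thread such as the logic thread never releases these keys on its own. That is why the cleanup has to run on the owning thread, as the BeanShell script in the article does.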