A JVM crash usually produces a core dump (core.pid) and an hs_err_pidXXXX.log file.
Open hs_err_pidXXXX.log and you will typically see something like this:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007fb7006c6f31, pid=8864, tid=140421610395392
#
# JRE version: 6.0_20-b02
# Java VM: Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode linux-amd64 )
# Problematic frame:
# C [libzip.so+0xaf31]
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
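The "C" at the start of the problematic frame (C [libzip.so+0xaf31]) means the crash happened while executing native code, here at offset 0xaf31 inside libzip.so.
Search the log for C [libzip.so+0xaf31].
Further down, the log shows the Java stack of the crashing thread: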
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J java.util.zip.ZipFile.getEntry(JLjava/lang/String;Z)J
J sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;
J sun.misc.URLClassPath$JarLoader.findResource(Ljava/lang/String;Z)Ljava/net/URL;
j sun.misc.URLClassPath$1.next()Z+42
j sun.misc.URLClassPath$1.hasMoreElements()Z+1
j java.net.URLClassLoader$3$1.run()Ljava/lang/Object;+7
v ~StubRoutines::call_stub
j java.security.AccessController.doPrivileged(Ljava/security/PrivilegedAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0
j java.net.URLClassLoader$3.next()Z+24
j java.net.URLClassLoader$3.hasMoreElements()Z+1
j sun.misc.CompoundEnumeration.next()Z+33
j sun.misc.CompoundEnumeration.hasMoreElements()Z+1
j org.apache.hadoop.mapred.JobConf.findContainingJar(Ljava/lang/Class;)Ljava/lang/String;+42
j org.apache.hadoop.mapred.JobConf.setJarByClass(Ljava/lang/Class;)V+1
j org.apache.hadoop.mapreduce.Job.setJarByClass(Ljava/lang/Class;)V+5
j com.panguso.recommend.mapred.usermodel.task.UserModelTask.run()V+53
j com.panguso.recommend.common.mission.AbstractMission.unitJob()V+198
j com.panguso.recommend.common.mission.AbstractMission.access$100(Lcom/panguso/recommend/common/mission/AbstractMission;)V+1
j com.panguso.recommend.common.mission.AbstractMission$1.run()V+39
j java.util.TimerThread.mainLoop()V+221
j java.util.TimerThread.run()V+1
v ~StubRoutines::call_stub
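
The manual steps above (read the signal, check the marker on the problematic frame, then walk the Java frames down to your own code) can be sketched as a small parser. This is only a minimal sketch: HsErrSummary and its methods are hypothetical names, not a JDK tool, and it only understands the line shapes shown in the excerpt above (POSIX signal names, a "# Problematic frame:" header, and a "Java frames:" section).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls out the parts of an hs_err log used in this post. */
final class HsErrSummary {
    String signal;            // e.g. "SIGBUS"; null if no signal line was found
    String problematicFrame;  // e.g. "C [libzip.so+0xaf31]"; null if not found
    final List<String> javaFrames = new ArrayList<>(); // J/j frames, top of stack first

    // Assumption: header lines look like "# SIGBUS (0x7) at pc=...".
    private static final Pattern SIGNAL = Pattern.compile("^#\\s*(SIG[A-Z0-9]+)\\s*\\(");
    private static final Pattern JAVA_FRAME = Pattern.compile("^[Jj]\\s+(\\S.*)$");
    private static final Pattern VM_FRAME = Pattern.compile("^[Vv]\\s+\\S.*$");

    static HsErrSummary parse(List<String> lines) {
        HsErrSummary s = new HsErrSummary();
        boolean expectFrame = false;   // previous line was "# Problematic frame:"
        boolean inJavaFrames = false;  // currently inside the "Java frames:" section
        for (String raw : lines) {
            String line = raw.trim();
            if (expectFrame) {
                // Drop the leading '#' so only "C [lib+offset]" remains.
                s.problematicFrame = line.replaceFirst("^#\\s*", "");
                expectFrame = false;
                continue;
            }
            if (line.startsWith("# Problematic frame:")) { expectFrame = true; continue; }
            if (line.startsWith("Java frames:")) { inJavaFrames = true; continue; }
            Matcher sig = SIGNAL.matcher(line);
            if (s.signal == null && sig.find()) { s.signal = sig.group(1); continue; }
            if (inJavaFrames) {
                Matcher frame = JAVA_FRAME.matcher(line);
                if (frame.matches()) {
                    s.javaFrames.add(frame.group(1));
                } else if (!VM_FRAME.matcher(line).matches()) {
                    inJavaFrames = false; // blank or unrelated line ends the section
                }
            }
        }
        return s;
    }

    /** True if the problematic frame carries the "C" (native code) marker. */
    boolean crashedInNativeCode() {
        return problematicFrame != null && problematicFrame.matches("C\\s.*");
    }

    /** First Java frame that starts with none of the given prefixes, or null. */
    String firstFrameOutside(String... prefixes) {
        for (String f : javaFrames) {
            boolean skip = false;
            for (String p : prefixes) {
                if (f.startsWith(p)) { skip = true; break; }
            }
            if (!skip) return f;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HsErrSummary <hs_err_pidXXXX.log>");
            return;
        }
        // ISO-8859-1 decodes any byte sequence, so odd bytes in the log cannot abort the read.
        HsErrSummary s = parse(Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1));
        System.out.println("signal:            " + s.signal);
        System.out.println("problematic frame: " + s.problematicFrame);
        System.out.println("native code:       " + s.crashedInNativeCode());
        System.out.println("first non-JDK:     " + s.firstFrameOutside("java.", "sun."));
    }
}
```

Skipping the java. and sun. prefixes is just one way to jump past JDK frames to the first library or application frame; pass whatever prefixes suit your own stack.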
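Following the stack down to my own code, I found nothing wrong with it, so I turned to the jar. I then remembered that the jar had been replaced while the program was running. My conclusion was that the replacement changed the file underneath the running program, so the JVM could no longer find the code it expected, and crashed. That fits the SIGBUS in libzip.so: this signal is typically raised when a memory-mapped file is truncated or rewritten while a process is still reading it.
Confirmation: looking for similar reports, I found one at https://forums.oracle.com/forums/thread.jspa?threadID=1540064. The replies confirm that this problem occurs when the JVM can no longer access the relevant jar or the original class.
At this point the problem is located and the cause is clear.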
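
If a jar has to be updated while a process may still be using it, one commonly used precaution is to avoid rewriting the file's bytes in place: stage the new jar next to the old one and rename it over the target. The sketch below is not from the original post. JarSwap and the ".new" suffix are hypothetical. It relies on ATOMIC_MOVE replacing an existing target, which the Files.move documentation leaves implementation specific (on Linux it is a rename). A running JVM keeps reading the old file until it is restarted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical deployment helper; an assumption for illustration only. */
final class JarSwap {
    /**
     * Puts the contents of newJar at target without touching the bytes of the
     * existing target file: the copy is staged beside it, then renamed over it.
     */
    static void replace(Path newJar, Path target) throws IOException {
        // Stage in the same directory so the rename stays on one filesystem.
        Path staged = target.resolveSibling(target.getFileName() + ".new");
        try {
            Files.copy(newJar, staged, StandardCopyOption.REPLACE_EXISTING);
            // Throws AtomicMoveNotSupportedException if the filesystem cannot do this atomically.
            Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(staged); // nothing left to delete after a successful move
        }
    }
}
```

A deploy step would call JarSwap.replace(Paths.get("build/app.jar"), Paths.get("lib/app.jar")), where both paths are placeholders.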
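To sum up: after a JVM crash, skip the core.pid file and go straight to hs_err_pidXXXX.log. Use the error information in it to narrow down the cause, decide on the likely problem points, then analyze and verify them.
See also: http://www.oracle.com/technetwork/java/javase/crashes-137240.html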