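Background
I recently tracked down a local JMX connection problem; these are my notes.
After the application starts, our startup script checks its state through JMX, so JMX has to be enabled dynamically on the running process.
Opening the JMX port of a Java process dynamically
The code below makes the target process load management-agent dynamically. Once JMX is enabled, the JMX URL can be read from the agent property com.sun.management.jmxremote.localConnectorAddress.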
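import com.sun.tools.attach.AgentInitializationException;
import com.sun.tools.attach.AgentLoadException;
import com.sun.tools.attach.AttachNotSupportedException;
import com.sun.tools.attach.VirtualMachine;
import java.io.File;
import java.io.IOException;
import java.util.Properties;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxLocalConnector {
    private JMXConnector connector;
    private VirtualMachine virtualmachine;

    public MBeanServerConnection connect(String pid) throws IOException {
        String address = attachJmx(pid);
        JMXServiceURL serviceURL = new JMXServiceURL(address);
        connector = JMXConnectorFactory.connect(serviceURL);
        return connector.getMBeanServerConnection();
    }

    private String attachJmx(String pid) throws IOException {
        try {
            virtualmachine = VirtualMachine.attach(pid);
        } catch (AttachNotSupportedException e) {
            throw new IOException(e);
        }
        // Locate management-agent.jar under the target JVM's java.home
        // (this jar ships with JDK 8 and earlier): try <java.home>/jre/lib first, then <java.home>/lib.
        String javaHome = virtualmachine.getSystemProperties().getProperty("java.home");
        String agentPath = javaHome + File.separator + "jre" + File.separator + "lib" + File.separator
                + "management-agent.jar";
        File file = new File(agentPath);
        if (!file.exists()) {
            agentPath = javaHome + File.separator + "lib" + File.separator + "management-agent.jar";
            file = new File(agentPath);
            if (!file.exists()) {
                throw new IOException("Management agent not found");
            }
        }
        agentPath = file.getCanonicalPath();
        // Load the agent into the target JVM; this starts the local JMX connector.
        try {
            virtualmachine.loadAgent(agentPath, "com.sun.management.jmxremote");
        } catch (AgentLoadException e) {
            throw new IOException(e);
        } catch (AgentInitializationException agentinitializationexception) {
            throw new IOException(agentinitializationexception);
        }
        // The agent publishes the local connector address as an agent property.
        Properties properties = virtualmachine.getAgentProperties();
        String address = (String) properties.get("com.sun.management.jmxremote.localConnectorAddress");
        virtualmachine.detach();
        return address;
    }
}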
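Once connect returns an MBeanServerConnection, the state check itself is an ordinary MBean attribute read. The article does not show which MBean the startup script queries, so the sketch below assumes the standard Runtime MXBean and its Uptime attribute as a stand-in; the class name JmxStateCheck is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class JmxStateCheck {
    /** Reads the JVM uptime in milliseconds from the standard Runtime MXBean. */
    static long uptimeMillis(MBeanServerConnection conn) {
        try {
            // "java.lang:type=Runtime" is an assumption; a real check would use the application's own MBean.
            ObjectName runtime = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
            return (Long) conn.getAttribute(runtime, "Uptime");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The in-process platform MBeanServer stands in for the connection returned by connect(pid).
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        System.out.println("Uptime (ms): " + uptimeMillis(conn));
    }
}
```

With the connector from this article, the call would be uptimeMillis(new JmxLocalConnector().connect(pid)).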
Why does the JMX connection fail?
Connecting to the target process dynamically with the code above threw the following exception:
java.rmi.ConnectException: Connection refused to host: 11.164.235.11; nested exception is:
java.net.ConnectException: Connection refused
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:130)
at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:227)
at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:179)
at com.sun.proxy.$Proxy0.newClient(Unknown Source)
at javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2430)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:308)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:229)
at com.test.jmx.JmxLocalConnector.connect(JmxLocalConnector.java:28)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at java.net.Socket.<init>(Socket.java:434)
at java.net.Socket.<init>(Socket.java:211)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:148)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
... 13 more
The local machine's IP is 11.164.234.171. Why is RMI connecting to an external IP, 11.164.235.11?
Debugging showed that management-agent had loaded successfully, and that the value of localConnectorAddress was:
jmx:rmi://127.0.0.1/stub/rO0ABXN9AAAAAQAlamF2YXgubWFuYWdlbWVudC5yZW1vdGUucm1pLlJNSVNlcnZlcnhyABdqYXZhLmxhbmcucmVmbGVjdC5Qcm94eeEn2iDMEEPLAgABTAABaHQAJUxqYXZhL2xhbmcvcmVmbGVjdC9JbnZvY2F0aW9uSGFuZGxlcjt4cHNyAC1qYXZhLnJtaS5zZXJ2ZXIuUmVtb3RlT2JqZWN0SW52b2NhdGlvbkhhbmRsZXIAAAAAAAAAAgIAAHhyABxqYXZhLnJtaS5zZXJ2ZXIuUmVtb3RlT2JqZWN002G0kQxhMx4DAAB4cHc4AAtVbmljYXN0UmVmMgAADTExLjE2NC4yMzUuMTEAAIfoCEScYyGQodFlwEdFAAABawK/zE6AAQB4
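Why does the URL show 127.0.0.1 while the actual connection goes to 11.164.235.11? Does something go wrong while connecting?
Debugging further showed that JMX takes the string after stub, Base64-decodes it, and then deserializes it with ObjectInputStream.readObject to obtain the RMIServer object it connects through.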
//javax.management.remote.rmi.RMIConnector.findRMIServer(JMXServiceURL, Map<String, Object>)
//--------------------------------------------------------------------
// Private stuff - RMIServer creation
//--------------------------------------------------------------------
private RMIServer findRMIServer(JMXServiceURL directoryURL,
        Map<String, Object> environment)
        throws NamingException, IOException {
    final boolean isIiop = RMIConnectorServer.isIiopURL(directoryURL,true);
    if (isIiop) {
        // Make sure java.naming.corba.orb is in the Map.
        environment.put(EnvHelp.DEFAULT_ORB,resolveOrb(environment));
    }
    String path = directoryURL.getURLPath();
    int end = path.indexOf(';');
    if (end < 0) end = path.length();
    if (path.startsWith("/jndi/"))
        return findRMIServerJNDI(path.substring(6,end), environment, isIiop);
    else if (path.startsWith("/stub/"))
        return findRMIServerJRMP(path.substring(6,end), environment, isIiop);
    else if (path.startsWith("/ior/")) {
        if (!IIOPHelper.isAvailable())
            throw new IOException("iiop protocol not available");
        return findRMIServerIIOP(path.substring(5,end), environment, isIiop);
    } else {
        final String msg = "URL path must begin with /jndi/ or /stub/ " +
            "or /ior/: " + path;
        throw new MalformedURLException(msg);
    }
}

private RMIServer findRMIServerJRMP(String base64, Map<String, ?> env, boolean isIiop)
        throws IOException {
    // could forbid "iiop:" URL here -- but do we need to?
    final byte[] serialized;
    try {
        serialized = base64ToByteArray(base64);
    } catch (IllegalArgumentException e) {
        throw new MalformedURLException("Bad BASE64 encoding: " +
            e.getMessage());
    }
    final ByteArrayInputStream bin = new ByteArrayInputStream(serialized);
    final ClassLoader loader = EnvHelp.resolveClientClassLoader(env);
    final ObjectInputStream oin =
        (loader == null) ?
            new ObjectInputStream(bin) :
            new ObjectInputStreamWithLoader(bin, loader);
    final Object stub;
    try {
        stub = oin.readObject();
    } catch (ClassNotFoundException e) {
        throw new MalformedURLException("Class not found: " + e);
    }
    return (RMIServer)stub;
}
Running the string through this code shows that
rO0ABXN9AAAAAQAlamF2YXgubWFuYWdlbWVudC5yZW1vdGUucm1pLlJNSVNlcnZlcnhyABdqYXZhLmxhbmcucmVmbGVjdC5Qcm94eeEn2iDMEEPLAgABTAABaHQAJUxqYXZhL2xhbmcvcmVmbGVjdC9JbnZvY2F0aW9uSGFuZGxlcjt4cHNyAC1qYXZhLnJtaS5zZXJ2ZXIuUmVtb3RlT2JqZWN0SW52b2NhdGlvbkhhbmRsZXIAAAAAAAAAAgIAAHhyABxqYXZhLnJtaS5zZXJ2ZXIuUmVtb3RlT2JqZWN002G0kQxhMx4DAAB4cHc4AAtVbmljYXN0UmVmMgAADTExLjE2NC4yMzUuMTEAAIfoCEScYyGQodFlwEdFAAABawK/zE6AAQB4
is converted to:
RMIServerImpl_Stub[UnicastRef2 [liveRef: [endpoint:[11.164.235.11:26449](remote),objID:[-5ddae53d:16b0887d710:-7fff, 7209064096623493021]]]]
So the RMI server's IP really is 11.164.235.11.
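The host can also be spotted without deserializing the stub at all, because the serialized form stores it as plain text. The sketch below is not the JDK's code path: it assumes java.util.Base64 is an acceptable substitute for the JDK-internal base64ToByteArray, and simply lists the printable ASCII runs in the decoded bytes, much like the Unix strings tool. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

final class StubPeek {
    /** Returns the printable ASCII runs (at least minLen long) inside the Base64 stub of a JMX URL. */
    static List<String> printableRuns(String jmxUrl, int minLen) {
        int start = jmxUrl.indexOf("/stub/");
        if (start < 0) {
            throw new IllegalArgumentException("URL has no /stub/ part: " + jmxUrl);
        }
        // The stub ends at the first ';' if there is one, otherwise at the end of the URL.
        String base64 = jmxUrl.substring(start + "/stub/".length());
        int end = base64.indexOf(';');
        if (end >= 0) {
            base64 = base64.substring(0, end);
        }
        byte[] raw = Base64.getDecoder().decode(base64);

        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : raw) {
            if (b >= 0x20 && b < 0x7f) {
                current.append((char) b);
            } else {
                if (current.length() >= minLen) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        if (current.length() >= minLen) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Pass the localConnectorAddress value as the first argument.
        printableRuns(args[0], 6).forEach(System.out::println);
    }
}
```

Run against a localConnectorAddress value, the output should include class names from the serialized proxy and the host string embedded in the stub, which is the address RMI will actually dial regardless of the 127.0.0.1 shown in the URL.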
So the question now becomes: why does the JMX URL obtained after the JVM dynamically loads management-agent point to an external IP?
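Locating the IP string with a heap dump
Debugging how management-agent is loaded would probably be painful, so I looked for another way in.
The investigation above shows that once management-agent has started, the external IP 11.164.235.11 is present in the JVM's memory, so a heap dump can be used to locate it.
Take a heap dump, then analyze it with jvisualvm.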
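The article does not say how the dump was taken; jmap or jvisualvm itself are the usual choices. As a rough sketch of a programmatic alternative, the HotSpot diagnostic MXBean can write a dump of the JVM it runs in. The class name HeapDumper and the output file name are hypothetical, and dumping a different process this way would need a proxy to that bean over the JMX connection, which is not shown here.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

final class HeapDumper {
    /** Writes a heap dump of the current JVM to the given .hprof file, which must not exist yet. */
    static void dump(Path target, boolean liveObjectsOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.toAbsolutePath().toString(), liveObjectsOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical default file name; pass another path as the first argument.
        Path target = Paths.get(args.length > 0 ? args[0] : "jmx-debug.hprof");
        dump(target, true);
        System.out.println("Heap dump written to " + target.toAbsolutePath());
    }
}
```

The resulting .hprof file can be opened in jvisualvm in the same way as a dump produced by any other tool.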
Use OQL to search for every String equal to 11.164.235.11:
select s from java.lang.String s where s.toString().equals("11.164.235.11")
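This returns several results.
Opening them one by one and inspecting their referrers shows that one of the referencing fields is named localHost.
So a reasonable guess: is something wrong with how the local hostname is resolved?
Locating the local hostname resolution problem
Run the hostname command to get the machine name, then ping it: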
$hostname
web-app201641.we42
$ping web-app201641.we42
PING web-app201641.we42 (11.164.235.11) 56(84) bytes of data.
The local hostname resolves to 11.164.235.11, but the machine's actual IP is 11.164.234.171:
$ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 11.164.234.171 netmask 255.255.255.0 broadcast 11.164.234.255
At this point the cause was fairly clear. Checking /etc/hosts confirmed it; the file contained this entry:
11.164.235.11 web-app201641.we42
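After removing this incorrect hosts entry, the JMX connection finally succeeded.
Why was the incorrect hosts entry there? Reportedly it was left over from a machine migration.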
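The hostname, ping and ifconfig steps above can be approximated from Java, which has the advantage of showing what the JVM itself resolves. This is only a sketch under the assumption that the address found in the localHost field comes from the same lookup as InetAddress.getLocalHost(); the class name HostnameCheck is hypothetical.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

final class HostnameCheck {
    /** Returns true if the IP literal is a loopback address or is bound to a local network interface. */
    static boolean isLocalAddress(String ipLiteral) {
        try {
            InetAddress addr = InetAddress.getByName(ipLiteral);
            return addr.isLoopbackAddress() || NetworkInterface.getByInetAddress(addr) != null;
        } catch (UnknownHostException | SocketException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Resolve the local hostname the way the JVM does (this consults /etc/hosts on most systems).
        InetAddress resolved = InetAddress.getLocalHost();
        String ip = resolved.getHostAddress();
        System.out.println(resolved.getHostName() + " resolves to " + ip);
        if (!isLocalAddress(ip)) {
            System.out.println("WARNING: " + ip + " is not bound to any local interface; check /etc/hosts");
        }
    }
}
```

On the machine from this article, such a check would have been expected to flag 11.164.235.11, since only 11.164.234.171 was configured on eth0.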
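Summary
How a dynamic JMX connection works:
Make the target VirtualMachine load management-agent dynamically
Read the JMX connection address from the agent properties: com.sun.management.jmxremote.localConnectorAddress
The string after stub in the JMX URL is Base64-decoded into a byte[] and then deserialized into an RMIServer with ObjectInputStream
JMX actually connects over RMI
Keys to the investigation:
Identifying the wrong IP the connection was going to
Taking a heap dump
Using OQL to find the IP string in the heap dump, then inspecting its referrers for clues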
Links
VisualVM
Object Query Language (OQL)