How to Check Whether ActiveMQ Is Dead or Alive

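Recently, to revive production and the economy, the government has been encouraging people to spend more and has rolled out a series of subsidy policies. In short:

  • Buy, buy, buy
  • Go, go, go
  • Eat, eat, eat
  • Play, play, play

The topic of this article, however, is: how do you use a script to detect whether ActiveMQ is dead or alive, and restart it when it is not?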

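Defining dead and alive

  1. Completely dead: the process (pid) is gone.
  2. The process is still there but cannot be used normally, for example it cannot publish messages. There are many possible causes, such as running out of heap memory: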
    2020-06-01 23:37:42,480 | INFO  | Ignoring no space left exception, java.io.IOException: Java heap space | org.apache.activemq.util.DefaultIOExceptionHandler | ActiveMQ Journal Checkpoint Worker
    java.io.IOException: Java heap space
    	at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:40)[activemq-client-5.15.9.jar:5.15.9]
    	at org.apache.activemq.store.kahadb.MessageDatabase$CheckpointRunner.run(MessageDatabase.java:451)[activemq-kahadb-store-5.15.9.jar:5.15.9]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_65]
    	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)[:1.8.0_65]
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)[:1.8.0_65]
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)[:1.8.0_65]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_65]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_65]
    	at java.lang.Thread.run(Thread.java:745)[:1.8.0_65]
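
The first case (the process is gone) can be checked without talking to the broker at all. The following is a minimal sketch and not part of the original setup: the helper name is_pid_alive is made up, and the pid file path data/activemq.pid is an assumption about where bin/activemq writes its pid on your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the pid stored in the given pid file
# belongs to a running process that we are allowed to signal.
is_pid_alive() {
    local pidfile=$1 pid
    [[ -r "$pidfile" ]] || return 1          # no readable pid file: treat as dead
    pid=$(<"$pidfile")
    [[ "$pid" =~ ^[0-9]+$ ]] || return 1     # file does not contain a pid: treat as dead
    kill -0 "$pid" 2>/dev/null               # signal 0 only tests for existence
}

# Usage (the pid file location is an assumption; adjust it to your installation):
#   if is_pid_alive data/activemq.pid; then echo "process is running"; else echo "process is gone"; fi
```

Note that kill -0 also fails when the process belongs to another user, so the check should run as the user that owns ActiveMQ. It says nothing about the second case, where the process exists but the broker is unusable.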
    

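How to detect this state

In practice, ActiveMQ easily ends up in a "hung" state, where the process is alive but the broker is unusable. Below are the approaches I tried.

Using ActiveMQ's built-in health-check endpoint

This uses JMX through the HTTP interface provided by Jolokia.

Many people online suggest this approach: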

curl -u username:password -s http://localhost:8161/api/jolokia/exec/org.apache.activemq:type=Broker,brokerName=localhost,service=Health/healthStatus | jq -r '.value'
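
To use this in a script, the returned value has to be compared against the healthy status. A rough sketch follows. The function names are made up, the credentials are placeholders, and it assumes that a healthy broker reports exactly the string Good, which you should confirm against the output of your own broker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: query the Health MBean through Jolokia and print .value.
# Credentials and URL are placeholders; --max-time keeps curl from hanging.
fetch_health_status() {
    curl -u username:password -s --max-time 10 \
        "http://localhost:8161/api/jolokia/exec/org.apache.activemq:type=Broker,brokerName=localhost,service=Health/healthStatus" \
        | jq -r '.value'
}

# Assumption: a healthy broker reports exactly "Good".
status_is_healthy() {
    [[ "$1" == "Good" ]]
}

# Usage:
#   status=$(fetch_health_status)
#   status_is_healthy "$status" || echo "health endpoint reports: $status"
```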

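However, in my case clients were failing with a Session is closed error when connecting, while this endpoint still reported a healthy status.

So I switched to a brute-force check.

Checking directly by sending a message

ActiveMQ ships with a command-line tool, so you can use it directly: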

bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2

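This sends 2 test messages to the check_test_queue queue. If the broker is healthy, the command succeeds and exits; otherwise you get an error like the following: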

 INFO | Connecting to URL: failover://tcp://localhost:61616 as user: xxx
 INFO | Producing messages to check_test_queue
 INFO | Using persistent messages
 INFO | Sleeping between sends 0 ms
 INFO | Running 1 parallel threads
 INFO | Successfully connected to tcp://localhost:61616
 WARN | Transport (tcp://localhost:61616) failed , attempting to automatically reconnect: {}
java.io.IOException: Wire format negotiation timeout: peer did not send his wire format.
	at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:99)[activemq-client-5.15.9.jar:5.15.9]
	at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:668)[activemq-client-5.15.9.jar:5.15.9]
	at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)[activemq-client-5.15.9.jar:5.15.9]
	at org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)[activemq-client-5.15.9.jar:5.15.9]
	at org.apache.activemq.transport.ResponseCorrelator.request(ResponseCorrelator.java:86)[activemq-client-5.15.9.jar:5.15.9]
	at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1392)[activemq-client-5.15.9.jar:5.15.9]
	at org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent(ActiveMQConnection.java:1486)[activemq-client-5.15.9.jar:5.15.9]
	at org.apache.activemq.ActiveMQConnection.start(ActiveMQConnection.java:527)[activemq-client-5.15.9.jar:5.15.9]
	at org.apache.activemq.console.command.ProducerCommand.runTask(ProducerCommand.java:61)[activemq-console-5.15.9.jar:5.15.9]
	at org.apache.activemq.console.command.AbstractCommand.execute(AbstractCommand.java:63)[activemq-console-5.15.9.jar:5.15.9]
	at org.apache.activemq.console.command.ShellCommand.runTask(ShellCommand.java:154)[activemq-console-5.15.9.jar:5.15.9]
	at org.apache.activemq.console.command.AbstractCommand.execute(AbstractCommand.java:63)[activemq-console-5.15.9.jar:5.15.9]
	at org.apache.activemq.console.command.ShellCommand.main(ShellCommand.java:104)[activemq-console-5.15.9.jar:5.15.9]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.8.0_65]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)[:1.8.0_65]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.8.0_65]
	at java.lang.reflect.Method.invoke(Method.java:497)[:1.8.0_65]
	at org.apache.activemq.console.Main.runTaskClass(Main.java:262)[activemq.jar:5.15.9]
	at org.apache.activemq.console.Main.main(Main.java:115)[activemq.jar:5.15.9]

Because the command hangs in that case, add a timeout so that it exits:

timeout 10 bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2

That is the main content.
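
With GNU timeout, a command that was cut off at the limit exits with status 124, so a script can tell a hang apart from other outcomes. A small sketch, with a made-up helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the exit status of `timeout 10 <command>` to a label.
classify_check() {
    case "$1" in
        0)   echo "ok" ;;        # the command finished normally
        124) echo "timeout" ;;   # GNU timeout stopped the command at the limit
        *)   echo "error" ;;     # the command exited with some other failure
    esac
}

# Usage:
#   timeout 10 bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2
#   classify_check "$?"
```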

Appendix

A simple check script:

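#!/usr/bin/env bash
# cron does not start in the ActiveMQ directory, so switch to the directory this script lives in
cd "$(dirname "$0")" || exit 1

# Send messages with the command-line producer; if they are not sent within 10 seconds, treat the broker as down
timeout 10 bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2
code=$?
d=$(date)

if [[ "$code" -ne 0 ]]; then
	echo "$d: MQ status check failed"
	bin/activemq restart
else
	echo "$d: MQ is OK"
fi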
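
A single failed send might be a transient hiccup. One possible refinement, which the script above does not do, is to retry a few times before restarting. The sketch below uses a made-up helper, retry_check, that runs any command up to a given number of times:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; succeed as soon as one attempt succeeds.
retry_check() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (same producer command as in the check script):
#   retry_check 3 5 timeout 10 bin/activemq producer --user xxx --password xxx \
#       --destination check_test_queue --messageCount 2 || bin/activemq restart
```

The trade-off is a slower reaction: with 3 attempts, a 10-second timeout and a 5-second pause, a real outage takes up to about 40 seconds to trigger the restart.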

Then use crontab to run the check every 5 minutes:

*/5 * * * * /apache-activemq-5.15.9/check-mq.sh >> /apache-activemq-5.15.9/check.log 2>&1