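Recently, to help production and the economy recover, the government has been encouraging people to spend more and has rolled out a series of subsidy policies for that purpose. In short:
- Buy, buy, buy
- Travel, travel, travel
- Eat, eat, eat
- Play, play, play
The topic of this post, however, is something else: how do you use a script to detect whether ActiveMQ is dead or alive, and then restart it?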
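Defining "dead" and "alive"
- Completely dead: even the pid is gone
- The process is still there but cannot be used normally, for example it cannot publish messages. This can have many causes, such as running out of heap memory: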
2020-06-01 23:37:42,480 | INFO | Ignoring no space left exception, java.io.IOException: Java heap space | org.apache.activemq.util.DefaultIOExceptionHandler | ActiveMQ Journal Checkpoint Worker
java.io.IOException: Java heap space
at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:40)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.store.kahadb.MessageDatabase$CheckpointRunner.run(MessageDatabase.java:451)[activemq-kahadb-store-5.15.9.jar:5.15.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_65]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)[:1.8.0_65]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)[:1.8.0_65]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)[:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_65]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_65]
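The first case (the pid is gone) is the easy one to test for. Here is a minimal sketch of such a check, assuming the broker's pid is recorded in a pid file; the function name pid_is_alive is hypothetical and is not part of ActiveMQ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the pid recorded in the given
# pid file belongs to a process that currently exists.
pid_is_alive() {
    local pid_file="$1" pid
    [ -r "$pid_file" ] || return 1          # no readable pid file: treat as dead
    pid=$(tr -d '[:space:]' < "$pid_file")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;            # empty or non-numeric content
    esac
    # Signal 0 delivers nothing; it fails if the process is gone
    # (or if we are not allowed to signal it).
    kill -0 "$pid" 2>/dev/null
}
```

It could be called as pid_is_alive data/activemq.pid, where data/activemq.pid is an assumed pid file location that should be checked against your own installation. Note that this only covers the first case: a broker that has run out of heap still passes this check.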
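How to detect this state
In practice, ActiveMQ falls into a "zombie" state (process alive but unresponsive) quite easily. Below are the approaches I tried.
Using ActiveMQ's built-in health check endpoint
This uses JMX through the Jolokia-based HTTP interface.
Many people online also suggest this approach: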
curl -u username:password -s "http://localhost:8161/api/jolokia/exec/org.apache.activemq:type=Broker,brokerName=localhost,service=Health/healthStatus" | jq -r '.value'
However, in my case clients were getting a "Session is closed" error when connecting, yet this endpoint still reported a normal status.
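For reference, the Jolokia call can be wrapped in a small pass/fail check. This is a minimal sketch: the function names are hypothetical, and treating the string "Good" as the healthy value is an assumption that should be verified against your broker's actual output. As noted, a passing result here did not guarantee that clients could actually send messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the status string extracted by
# `jq -r '.value'` is exactly "Good". The "Good" value is an assumption.
health_is_good() {
    [ "$1" = "Good" ]
}

# Hypothetical wrapper around the curl call; takes "user:password".
# Network errors and empty replies end up as a non-"Good" status.
query_health() {
    local creds="$1" status
    local url="http://localhost:8161/api/jolokia/exec/org.apache.activemq:type=Broker,brokerName=localhost,service=Health/healthStatus"
    status=$(curl -u "$creds" -s --max-time 5 "$url" | jq -r '.value') || return 1
    health_is_good "$status"
}
```

Usage would look like: query_health "username:password" || echo "health endpoint reports a problem".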
So I resorted to a brute-force check.
Detecting directly by sending a message
ActiveMQ ships with a command-line tool, so you can use it directly:
bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2
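This sends 2 test messages to the check_test_queue queue. If the broker is healthy, the command eventually succeeds and exits; otherwise, an error like the following appears: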
INFO | Connecting to URL: failover://tcp://localhost:61616 as user: xxx
INFO | Producing messages to check_test_queue
INFO | Using persistent messages
INFO | Sleeping between sends 0 ms
INFO | Running 1 parallel threads
INFO | Successfully connected to tcp://localhost:61616
WARN | Transport (tcp://localhost:61616) failed , attempting to automatically reconnect: {}
java.io.IOException: Wire format negotiation timeout: peer did not send his wire format.
at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:99)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:668)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.transport.ResponseCorrelator.request(ResponseCorrelator.java:86)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1392)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent(ActiveMQConnection.java:1486)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.ActiveMQConnection.start(ActiveMQConnection.java:527)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.ProducerCommand.runTask(ProducerCommand.java:61)[activemq-console-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.AbstractCommand.execute(AbstractCommand.java:63)[activemq-console-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.ShellCommand.runTask(ShellCommand.java:154)[activemq-console-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.AbstractCommand.execute(AbstractCommand.java:63)[activemq-console-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.ShellCommand.main(ShellCommand.java:104)[activemq-console-5.15.9.jar:5.15.9]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.8.0_65]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)[:1.8.0_65]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.8.0_65]
at java.lang.reflect.Method.invoke(Method.java:497)[:1.8.0_65]
at org.apache.activemq.console.Main.runTaskClass(Main.java:262)[activemq.jar:5.15.9]
at org.apache.activemq.console.Main.main(Main.java:115)[activemq.jar:5.15.9]
Because the command hangs in that case, add a timeout so that it exits:
timeout 10 bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2
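With GNU coreutils, timeout exits with status 124 when the time limit is reached, so the exit status can tell a hung producer apart from one that failed on its own. A minimal sketch of that distinction follows; the function name classify_probe_exit and its labels are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turns the exit status of `timeout 10 <probe>` into a label.
classify_probe_exit() {
    case "$1" in
        0)   echo "ok" ;;       # probe finished successfully within the limit
        124) echo "hung" ;;     # GNU timeout's status when the limit was hit
        *)   echo "failed" ;;   # probe exited on its own with an error
    esac
}
```

It would be called right after the timeout command, as classify_probe_exit $?. For the purpose of this post both "hung" and "failed" mean the broker needs attention, so a plain non-zero test is enough; the label is mainly useful in logs.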
OK, that is the main content.
Appendix
A simple check script:
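#!/bin/bash
# Run from the directory containing this script (the ActiveMQ install directory)
# so that bin/activemq also resolves when started by cron
cd "$(dirname "$0")" || exit 1
# Send messages with the command-line producer; if they are not sent within 10 seconds, treat the broker as down
timeout 10 bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2
code=$?
d=$(date)
if [[ "$code" -ne 0 ]]; then
    echo "$d: MQ status check failed"
    bin/activemq restart
else
    echo "$d: MQ OK"
fi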
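One possible refinement, which is not part of the script above: a single failed send may be a transient hiccup, so the probe could be retried a few times before deciding to restart. A minimal sketch, with the hypothetical function name probe_with_retries:

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs the given command up to <attempts> times and
# succeeds as soon as one run succeeds.
probe_with_retries() {
    local attempts="$1" i=1
    shift
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        # Pause briefly, but only if another attempt will follow.
        if [ "$i" -le "$attempts" ]; then
            sleep 1
        fi
    done
    return 1
}
```

In the check script, the producer line would then become something like: probe_with_retries 3 timeout 10 bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2. The trade-off is that a truly dead broker is detected up to about 30 seconds later.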
Then use crontab to run the check every 5 minutes:
*/5 * * * * /apache-activemq-5.15.9/check-mq.sh >> /apache-activemq-5.15.9/check.log 2>&1