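Timeouts When Starting RocketMQ Locally Without a Hostname Mapping

Problem Description

While following the official RocketMQ documentation to bring up a local test environment, I ran into a timeout error.

Local OS: CentOS Linux release 8.5.2111

First, change into the RocketMQ installation directory, for example ~/opt/rocketmq-all-5.2.0-bin-release.

Start the NameServer with the following command: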

$ sh bin/mqnamesrv

The command runs slowly, but it eventually reports that the NameServer started successfully, with the following output:

Java HotSpot(TM) 64-Bit Server VM warning: Using the DefNew young collector with the CMS collector is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
The Name Server boot success. serializeType=JSON, address 0.0.0.0:9876

The jps command also shows the corresponding process:

$ jps
13730 NamesrvStartup

Start the Broker and Proxy with the following command:

$ sh bin/mqbroker -n localhost:9876 --enable-proxy

This command takes a very long time. Roughly 90 seconds pass before it prints the following line:

Sat Feb 24 19:48:03 CST 2024 rocketmq-proxy startup successfully

The log file ~/logs/rocketmqlogs/proxy.log also records that the broker started successfully:

2024-02-24 19:47:53 INFO main - The broker[broker-a, 192.168.88.135:10911] boot success. serializeType=JSON and name server is localhost:9876

Note: broker-a in the log is the value of the brokerName parameter configured in broker.conf, as shown below:

brokerClusterName = DefaultCluster
brokerName = broker-a # the default brokerName from the configuration file
brokerId = 0
deleteWhen = 04
fileReservedTime = 48
brokerRole = ASYNC_MASTER
flushDiskType = ASYNC_FLUSH

Run jps again to confirm that the processes are up:

$ jps
jps
13730 NamesrvStartup
14410 ProxyStartup

Everything looks normal so far, and neither ~/logs/rocketmqlogs/namesrv.log nor ~/logs/rocketmqlogs/proxy.log shows any obvious error.

Creating a topic, however, fails:

$ sh bin/mqadmin updatetopic -n localhost:9876 -t TestTopic -c DefaultCluster

After about 40 seconds the command prints the following error:

org.apache.rocketmq.tools.command.SubCommandException: UpdateTopicSubCommand command failed
        at org.apache.rocketmq.tools.command.topic.UpdateTopicSubCommand.execute(UpdateTopicSubCommand.java:198)
        at org.apache.rocketmq.tools.command.MQAdminStartup.main0(MQAdminStartup.java:164)
        at org.apache.rocketmq.tools.command.MQAdminStartup.main(MQAdminStartup.java:114)
Caused by: org.apache.rocketmq.remoting.exception.RemotingTimeoutException: invokeSync call the addr[127.0.0.1:9876] timeout
        at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:549)
        at org.apache.rocketmq.client.impl.MQClientAPIImpl.getBrokerClusterInfo(MQClientAPIImpl.java:1961)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.examineBrokerClusterInfo(DefaultMQAdminExtImpl.java:577)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.examineBrokerClusterInfo(DefaultMQAdminExt.java:318)
        at org.apache.rocketmq.tools.command.CommandUtil.fetchMasterAddrByClusterName(CommandUtil.java:94)
        at org.apache.rocketmq.tools.command.topic.UpdateTopicSubCommand.execute(UpdateTopicSubCommand.java:171)
        ... 2 more

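The error message suggests that 127.0.0.1:9876 cannot be reached, but I verified that the address is reachable. Repeated attempts still produced the same error.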
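The post does not say how the connectivity was verified. One possible check, given here as a minimal sketch, is a plain TCP connect with a short timeout. The class name PortCheck and the method canConnect are hypothetical names for illustration, not part of RocketMQ.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or otherwise unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 127.0.0.1:9876 is the NameServer address from the error message.
        System.out.println("reachable: " + canConnect("127.0.0.1", 9876, 3000));
    }
}
```

If this prints true while mqadmin still reports a timeout, the problem is more likely on the client side than in the NameServer.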

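I then repeated the test on a Windows machine, where, oddly, everything worked. I also noticed that on Windows the broker uses the hostname as its brokerName, as the following log line shows:

# zhangsan is the hostname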
The broker[zhangsan, 20.5.133.188:10911] boot success. serializeType=JSON and name server is localhost:9876

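That raised a question: could the problem be that the CentOS machine's hostname is not mapped to 127.0.0.1 in /etc/hosts?

After adding that mapping, everything worked as expected.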
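To see whether name resolution is slow on a given machine, one option is to time a lookup of the machine's own hostname before and after editing /etc/hosts. The sketch below assumes the hostname is passed as the first argument, for example java HostnameCheck "$(hostname)". HostnameCheck and describe are hypothetical names. That slow name resolution is behind the delay is also an assumption: the post only establishes that the /etc/hosts mapping fixes it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {
    /** Resolves the given name and returns a one-line report including the elapsed time. */
    static String describe(String name) {
        long start = System.nanoTime();
        String result;
        try {
            result = InetAddress.getByName(name).getHostAddress();
        } catch (UnknownHostException e) {
            result = "unresolved";
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return name + " -> " + result + " (" + elapsedMs + " ms)";
    }

    public static void main(String[] args) {
        // Pass the machine's hostname as the first argument; falls back to localhost.
        String name = args.length > 0 ? args[0] : "localhost";
        System.out.println(describe(name));
    }
}
```

A lookup that takes seconds, or ends as unresolved, before the mapping is added and returns almost instantly afterwards would be consistent with the behaviour described here.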

Tracing the Cause

Following the stack trace into the RocketMQ source code shows that the error comes from a timeout check in NettyRemotingClient.invokeSync():

@Override
public RemotingCommand invokeSync(String addr, final RemotingCommand request, long timeoutMillis)
    throws InterruptedException, RemotingConnectException, RemotingSendRequestException, RemotingTimeoutException {
    long beginStartTime = System.currentTimeMillis();
    final Channel channel = this.getAndCreateChannel(addr);
    String channelRemoteAddr = RemotingHelper.parseChannelRemoteAddr(channel);
    if (channel != null && channel.isActive()) {
        long left = timeoutMillis; // the default timeout is 5000 ms
        try {
            long costTime = System.currentTimeMillis() - beginStartTime;
            left -= costTime;
            if (left <= 0) { // obtaining the channel already used up the whole timeout, so throw immediately
                throw new RemotingTimeoutException("invokeSync call the addr[" + channelRemoteAddr + "] timeout");
            }
            RemotingCommand response = this.invokeSyncImpl(channel, request, left);
            updateChannelLastResponseTime(addr);
            return response;
        }
        // remaining code omitted...
    }
    // remaining code omitted...
}

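Because the exception is thrown by this timeout check, the log message alone gives the impression that 127.0.0.1:9876 cannot be reached. In fact the address is reachable: the time spent in getAndCreateChannel() counts against the timeout, so the budget can be exhausted before the request is ever sent.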
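The effect of that check can be reproduced in isolation. The following is a simplified simulation, not RocketMQ code: Thread.sleep stands in for a slow getAndCreateChannel(), and TimeoutBudgetDemo and invoke are hypothetical names.

```java
import java.util.concurrent.TimeoutException;

public class TimeoutBudgetDemo {
    /**
     * Mimics the budget check in invokeSync: time spent on setup is
     * subtracted from the timeout before the request would be sent.
     * Returns the remaining budget in milliseconds.
     */
    static long invoke(long setupMillis, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        long begin = System.currentTimeMillis();
        Thread.sleep(setupMillis); // stand-in for a slow channel creation
        long left = timeoutMillis - (System.currentTimeMillis() - begin);
        if (left <= 0) {
            // The peer was never contacted, yet the caller sees a timeout.
            throw new TimeoutException("timeout before the request was sent");
        }
        return left;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("fast setup, ms left: " + invoke(0, 5000));
        try {
            invoke(200, 50); // setup takes longer than the whole budget
        } catch (TimeoutException e) {
            System.out.println("slow setup: " + e.getMessage());
        }
    }
}
```

The second call fails although nothing is wrong with the remote side, which matches the misleading error message in the stack trace.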

Tracing further shows that the slow step is Netty's ReflectiveChannelFactory.newChannel() method, which takes about 10 seconds:

@Override
public T newChannel() {
    try {
        // constructor is the constructor of NioSocketChannel.class,
        // so this line instantiates a NioSocketChannel via reflection
        T t = constructor.newInstance();
        return t;
    } catch (Throwable t) {
        throw new ChannelException("Unable to create Channel from class " + constructor.getDeclaringClass(), t);
    }
}

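The following code verifies this:

import io.netty.channel.socket.nio.NioSocketChannel;
import java.lang.reflect.Constructor;

long start = System.currentTimeMillis();
Constructor<NioSocketChannel> constructor = NioSocketChannel.class.getConstructor();
constructor.newInstance(); // the same reflective instantiation as in newChannel()
System.out.printf("%d ms%n", System.currentTimeMillis() - start);

Output: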

10144 ms

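Strangely, once /etc/hosts explicitly maps the hostname to 127.0.0.1, the same code runs very quickly.

The underlying reason is still unclear to me: why would instantiating a NioSocketChannel via reflection depend on whether the hostname is mapped to 127.0.0.1?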
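One way to narrow this down is to look at what the thread is doing while the constructor is blocked, either with the JDK's jstack tool or by sampling the thread's stack from code. The sketch below shows the second approach. It is only an illustration: StackSampler and sample are hypothetical names, and a sleeping task stands in for the slow call. With Netty on the classpath, the task could instead be () -> new NioSocketChannel().

```java
public class StackSampler {
    /** Runs the task on a worker thread and returns the worker's stack after delayMillis. */
    static StackTraceElement[] sample(Runnable task, long delayMillis)
            throws InterruptedException {
        Thread worker = new Thread(task, "sampled-task");
        worker.setDaemon(true); // do not keep the JVM alive for the task
        worker.start();
        Thread.sleep(delayMillis); // give the task time to reach the slow call
        return worker.getStackTrace(); // empty if the task has already finished
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the slow call under investigation.
        Runnable slow = () -> {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException ignored) {
                // nothing to clean up in this demo
            }
        };
        for (StackTraceElement frame : sample(slow, 200)) {
            System.out.println("  at " + frame);
        }
    }
}
```

The topmost frames of the sampled stack show the call that is blocking, which should indicate whether the time goes into name resolution or somewhere else.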

