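How RabbitMQ Heartbeats Work

RabbitMQ heartbeats detect whether the connection between a client and RabbitMQ is still alive, much like TCP keepalives. This article describes when the heartbeat mechanism is set up and how it checks connection liveness.

1. Protocol flow for establishing a RabbitMQ connection

This article focuses on the connection.tune and connection.tune-ok steps of that flow.

2. The channel_max, frame_max, and heartbeat parameters

The diagram above shows how a client establishes a connection to RabbitMQ. The protocol exchange during connection setup takes place mainly on channel 0. Each AMQP connection uses a single TCP connection, and multiple channels are then opened on top of that TCP connection. The maximum number of channels one connection can carry is negotiated between the client and RabbitMQ.

The negotiation happens in the connection.tune and connection.tune-ok exchange. After RabbitMQ receives connection.start-ok and processes it, it authenticates the user. Once authentication succeeds, it assembles a connection.tune message and sends it to the client. That message carries channel_max, frame_max, and heartbeat, and these are the values configured on RabbitMQ itself. The code is as follows: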

%% rabbit_reader.erl
%% Authentication phase (check that the user exists and that the supplied password is valid)
auth_phase(Response,
           State = #v1{connection = Connection =
                                        #connection{protocol       = Protocol,
                                                    auth_mechanism = {Name, AuthMechanism},
                                                    auth_state     = AuthState},
                       sock = Sock}) ->
    %% Ask the authentication module for the user name and password
    case AuthMechanism:handle_response(Response, AuthState) of
        {refused, Username, Msg, Args} ->
            %% Authentication refused: terminate the current rabbit_reader process
            auth_fail(Username, Msg, Args, Name, State);
        {protocol_error, Msg, Args} ->
            %% Publish the authentication result to rabbit_event
            notify_auth_result(none, user_authentication_failure,
                               [{error, rabbit_misc:format(Msg, Args)}],
                               State),
            rabbit_misc:protocol_error(syntax_error, Msg, Args);
....
            %% Build the connection.tune message
            Tune = #'connection.tune'{frame_max   = get_env(frame_max),
                                      channel_max = get_env(channel_max),
                                      heartbeat   = get_env(heartbeat)},
            %% Send the connection.tune message to the client
            ok = send_on_channel0(Sock, Tune, Protocol),
            %% Store the authenticated user
            State#v1{connection_state = tuning,                                 %% set connection_state to tuning and wait for connection.tune_ok
                     connection = Connection#connection{user       = User,
                                                        auth_state = none}}
    end.

As the code that builds the connection.tune message shows, frame_max, channel_max, and heartbeat are read from RabbitMQ's application environment:

bash-4.4# rabbitmqctl environment|grep -E "channel_max|frame_max|heartbeat"
      {channel_max,2047},
      {frame_max,131072},
      {heartbeat,60},

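These values are sent to the client for negotiation. Once negotiation completes, the client sends a connection.tune-ok message to RabbitMQ that carries the negotiated frame_max, channel_max, and heartbeat values: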

# amqp/connection.py:Connection
    def _setup_listeners(self):
        self._callbacks.update({
            spec.Connection.Start: self._on_start,
            spec.Connection.OpenOk: self._on_open_ok,
            spec.Connection.Secure: self._on_secure,
            spec.Connection.Tune: self._on_tune,
            spec.Connection.Close: self._on_close,
            spec.Connection.Blocked: self._on_blocked,
            spec.Connection.Unblocked: self._on_unblocked,
            spec.Connection.CloseOk: self._on_close_ok,
        })

# amqp/connection.py:Connection
    def _on_tune(self, channel_max, frame_max, server_heartbeat, argsig='BlB'):
        client_heartbeat = self.client_heartbeat or 0
        self.channel_max = channel_max or self.channel_max
        self.frame_max = frame_max or self.frame_max
        self.server_heartbeat = server_heartbeat or 0

        # negotiate the heartbeat interval to the smaller of the
        # specified values
        if self.server_heartbeat == 0 or client_heartbeat == 0:
            self.heartbeat = max(self.server_heartbeat, client_heartbeat)
        else:
            self.heartbeat = min(self.server_heartbeat, client_heartbeat)

        # Ignore server heartbeat if client_heartbeat is disabled
        if not self.client_heartbeat:
            self.heartbeat = 0

        self.send_method(
            spec.Connection.TuneOk, argsig,
            (self.channel_max, self.frame_max, self.heartbeat),
            callback=self._on_tune_sent,
        )

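_setup_listeners registers the callbacks of the amqp module. When a client uses the amqp module and RabbitMQ sends connection.tune, the module calls _on_tune to handle it.

_on_tune compares the frame_max, channel_max, and heartbeat values received from RabbitMQ with the client's own corresponding values and negotiates the final ones.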
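To make the three negotiated fields concrete, here is a minimal sketch of how the arguments of connection.tune and connection.tune-ok can be laid out on the wire. It assumes the AMQP 0-9-1 field sizes (channel-max as a 16-bit integer, frame-max as a 32-bit integer, heartbeat as a 16-bit integer, all big-endian), which is what the argsig 'BlB' in _on_tune corresponds to. The function names are hypothetical, and the sketch covers only the argument payload, not the frame header or the class and method ids.

```python
import struct

# Assumed AMQP 0-9-1 layout of the tune / tune-ok arguments:
# channel-max (16-bit), frame-max (32-bit), heartbeat (16-bit), big-endian.
TUNE_FORMAT = ">HIH"


def pack_tune_args(channel_max, frame_max, heartbeat):
    """Encode the three tune arguments (hypothetical helper)."""
    return struct.pack(TUNE_FORMAT, channel_max, frame_max, heartbeat)


def unpack_tune_args(payload):
    """Decode the three tune arguments (hypothetical helper)."""
    return struct.unpack(TUNE_FORMAT, payload)


payload = pack_tune_args(2047, 131072, 60)
print(len(payload))               # 8
print(unpack_tune_args(payload))  # (2047, 131072, 60)
```

The values used here are the server-side settings shown in the rabbitmqctl output earlier in this section.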

On the client side, the defaults in this setup are frame_max 131072, channel_max 65535, and heartbeat 60s.

On the RabbitMQ side, the default frame_max, channel_max, and heartbeat values are:

bash-4.4# rabbitmqctl environment|grep -E "channel_max|frame_max|heartbeat"
      {channel_max,2047},
      {frame_max,131072},
      {heartbeat,60},

2.1 The frame_max and channel_max values

The _on_tune code shows how the final frame_max and channel_max are chosen: the value RabbitMQ sends in connection.tune takes precedence, and only when RabbitMQ sends 0 (no value set) does the client fall back to its own frame_max or channel_max.

In the example above, RabbitMQ did send frame_max and channel_max in connection.tune, so RabbitMQ's values are used: frame_max is 131072 and channel_max is 2047.
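The fallback rule for these two parameters is just the `server_value or client_value` expression from _on_tune. A minimal sketch, with a hypothetical function name:

```python
def pick_limit(server_value, client_value):
    """Prefer the value sent by RabbitMQ; fall back to the client's
    own value when the server sent 0 (hypothetical helper)."""
    return server_value or client_value


# Values from the example environment in this article.
print(pick_limit(131072, 131072))  # frame_max   -> 131072
print(pick_limit(2047, 65535))     # channel_max -> 2047
print(pick_limit(0, 65535))        # server sent 0 -> 65535
```

Note that this rule never lowers the server's value: a client that wants a smaller limit than the server advertises would need different logic from what _on_tune shows.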

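2.2 The heartbeat value

According to the official RabbitMQ documentation and the _on_tune code, the heartbeat value is also negotiated between the client and RabbitMQ. Negotiation only happens if the client has set a heartbeat value: if the client's heartbeat is 0, heartbeats are not used between the client and RabbitMQ. If both RabbitMQ and the client set a non-zero heartbeat, the smaller of the two is used. If exactly one side is 0, the larger (non-zero) value is chosen, and if both are 0 the result is 0. In the _on_tune code shown above, a client value of 0 then overrides the result with 0, so the "larger value" branch only takes effect when RabbitMQ advertises 0 and the client is non-zero.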
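These rules can be restated as a standalone function. This is a minimal sketch that mirrors the heartbeat branch of _on_tune as quoted in this article; the name negotiate_heartbeat is hypothetical and is not part of the amqp module's API.

```python
def negotiate_heartbeat(server_heartbeat, client_heartbeat):
    """Return the heartbeat interval (seconds) the client would put
    into connection.tune-ok, following the _on_tune logic above."""
    server = server_heartbeat or 0
    client = client_heartbeat or 0

    if server == 0 or client == 0:
        # One side is unset: take whichever value is non-zero.
        heartbeat = max(server, client)
    else:
        # Both sides set: take the smaller interval.
        heartbeat = min(server, client)

    # A client heartbeat of 0 (or None) disables heartbeats entirely.
    if not client:
        heartbeat = 0
    return heartbeat


print(negotiate_heartbeat(60, 60))  # 60
print(negotiate_heartbeat(60, 30))  # 30
print(negotiate_heartbeat(60, 0))   # 0
print(negotiate_heartbeat(0, 30))   # 30
```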

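2.3 How RabbitMQ handles the negotiated values

From the analysis so far, negotiation is performed mainly on the client: RabbitMQ sends connection.tune, the client parses the frame_max, channel_max, and heartbeat values in it, compares them with its own values, and arrives at the final parameters.

Once _on_tune has the final frame_max, channel_max, and heartbeat values, it assembles a connection.tune-ok message and sends it to RabbitMQ, which handles it as follows: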

%% Handle the connection.tune_ok message
%% frame_max: the largest frame size allowed when talking to the client. The default is 131072; raising it helps throughput, lowering it helps latency
%% channel_max: the maximum number of channels per connection
handle_method0(#'connection.tune_ok'{frame_max   = FrameMax,
                                     channel_max = ChannelMax,
                                     heartbeat   = ClientHeartbeat},
               State = #v1{connection_state = tuning,                       %% this field was set to tuning when connection.tune was sent to the client
                           connection = Connection,
                           helper_sup = SupPid,
                           sock = Sock}) ->
    %% Validate the frame_max sent by the client against the server's configured frame_max
    ok = validate_negotiated_integer_value(
           frame_max,   ?FRAME_MIN_SIZE, FrameMax),
    %% Validate the channel_max sent by the client against the server's configured channel_max
    ok = validate_negotiated_integer_value(
           channel_max, ?CHANNEL_MIN,    ChannelMax),
    %% Start the queue_collector process under the rabbit_connection_helper_sup supervisor
    {ok, Collector} = rabbit_connection_helper_sup:start_queue_collector(
                        SupPid, Connection#connection.name),
    %% Build the heartbeat frame
    Frame = rabbit_binary_generator:build_heartbeat_frame(),
    %% Function that sends the heartbeat frame
    SendFun = fun() -> catch rabbit_net:send(Sock, Frame) end,
    Parent = self(),
    %% Function that sends a heartbeat_timeout message to this process
    ReceiveFun = fun() -> Parent ! heartbeat_timeout end,
    %% Start two heartbeat processes under the rabbit_connection_helper_sup supervisor:
    %% one checks every ClientHeartbeat / 2 whether RabbitMQ has sent data to the client
    %% one checks every ClientHeartbeat whether the socket of the current rabbit_reader process has received data
    Heartbeater = rabbit_heartbeat:start(
                    SupPid, Sock, Connection#connection.name,
                    ClientHeartbeat, SendFun, ClientHeartbeat, ReceiveFun),
    State#v1{connection_state = opening,                                    %% on connection.tune_ok, set connection_state to opening
             connection = Connection#connection{
                                                frame_max   = FrameMax,
                                                channel_max = ChannelMax,
                                                timeout_sec = ClientHeartbeat},
             queue_collector = Collector,
             heartbeater = Heartbeater};

After receiving the client's connection.tune-ok message, RabbitMQ also validates the negotiated frame_max and channel_max:

%% Validate an integer value sent by the client against the value configured on the server
validate_negotiated_integer_value(Field, Min, ClientValue) ->
    %% Read the configured value for Field from the rabbit application environment
    ServerValue = get_env(Field),
    if ClientValue /= 0 andalso ClientValue < Min ->
           %% The client value is below the server's minimum: terminate the current rabbit_reader process
           fail_negotiation(Field, min, ServerValue, ClientValue);
       ServerValue /= 0 andalso (ClientValue =:= 0 orelse
                                     ClientValue > ServerValue) ->
           %% The client value is 0 (unlimited) or above the server's maximum: terminate the current rabbit_reader process
           fail_negotiation(Field, max, ServerValue, ClientValue);
       true ->
           ok
    end.

The negotiated frame_max and channel_max must lie between the minimum and maximum that RabbitMQ allows for each parameter. The minimums are defined in RabbitMQ as:

-define(FRAME_MIN_SIZE, 4096).
-define(CHANNEL_MIN, 1).

The maximums are the configured values:

bash-4.4# rabbitmqctl environment|grep -E "channel_max|frame_max|heartbeat"
      {channel_max,2047},
      {frame_max,131072},
      {heartbeat,60},

In the example environment above, frame_max and channel_max are 131072 and 2047 respectively, and the final heartbeat is the value negotiated on the client, 60s.
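The range check can be restated in Python to make the accepted and rejected cases easier to see. This is a sketch that ports the two conditions of validate_negotiated_integer_value quoted above; the exception type and function name are hypothetical, and the server value is passed in explicitly instead of being read from the RabbitMQ environment.

```python
FRAME_MIN_SIZE = 4096  # same value as ?FRAME_MIN_SIZE above
CHANNEL_MIN = 1        # same value as ?CHANNEL_MIN above


class NegotiationError(Exception):
    """Stands in for fail_negotiation/4 (hypothetical)."""


def validate_negotiated_value(field, minimum, server_value, client_value):
    # A non-zero client value below the protocol minimum is rejected.
    if client_value != 0 and client_value < minimum:
        raise NegotiationError(f"{field}={client_value} < min {minimum}")
    # If the server sets a limit, the client may neither ask for
    # "unlimited" (0) nor exceed the server's value.
    if server_value != 0 and (client_value == 0 or client_value > server_value):
        raise NegotiationError(f"{field}={client_value} > max {server_value}")


# Accepted: the values negotiated in the example environment.
validate_negotiated_value("frame_max", FRAME_MIN_SIZE, 131072, 131072)
validate_negotiated_value("channel_max", CHANNEL_MIN, 2047, 2047)

# Rejected: the client asks for more channels than the server allows.
try:
    validate_negotiated_value("channel_max", CHANNEL_MIN, 2047, 65535)
except NegotiationError as exc:
    print("rejected:", exc)  # rejected: channel_max=65535 > max 2047
```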

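3. How RabbitMQ heartbeats work

3.1 RabbitMQ side

As section 2 showed, the heartbeat value is negotiated between the client and RabbitMQ in the connection.tune and connection.tune-ok messages. In our environment the negotiated heartbeat is 60s. Once the value is known, that is, while handling connection.tune-ok, RabbitMQ starts two heartbeat processes: one checks every ClientHeartbeat / 2 whether RabbitMQ has sent data to the client, and the other checks every ClientHeartbeat whether the socket of the current rabbit_reader process has received data. The code is as follows: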

%% rabbit_reader.erl
handle_method0(#'connection.tune_ok'{frame_max   = FrameMax,
                                     channel_max = ChannelMax,
                                     heartbeat   = ClientHeartbeat},
               State = #v1{connection_state = tuning,                       %% this field was set to tuning when connection.tune was sent to the client
                           connection = Connection,
                           helper_sup = SupPid,
                           sock = Sock}) ->
......
    SendFun = fun() -> catch rabbit_net:send(Sock, Frame) end,
    Parent = self(),
    %% Function that sends a heartbeat_timeout message to this process
    ReceiveFun = fun() -> Parent ! heartbeat_timeout end,
    %% Start two heartbeat processes under the rabbit_connection_helper_sup supervisor:
    %% one checks every ClientHeartbeat / 2 whether RabbitMQ has sent data to the client
    %% one checks every ClientHeartbeat whether the socket of the current rabbit_reader process has received data
    Heartbeater = rabbit_heartbeat:start(
                    SupPid, Sock, Connection#connection.name,
                    ClientHeartbeat, SendFun, ClientHeartbeat, ReceiveFun),
    State#v1{connection_state = opening,                                    %% on connection.tune_ok, set connection_state to opening
             connection = Connection#connection{
                                                frame_max   = FrameMax,
                                                channel_max = ChannelMax,
                                                timeout_sec = ClientHeartbeat},
             queue_collector = Collector,
             heartbeater = Heartbeater};

The relevant rabbit_heartbeat code is:

start(SupPid, Sock, Identity,
      SendTimeoutSec, SendFun, ReceiveTimeoutSec, ReceiveFun) ->
    %% Start the heartbeat process that watches data sent on Sock
    {ok, Sender} =
        start_heartbeater(SendTimeoutSec, SupPid, Sock,
                          SendFun, heartbeat_sender,
                          start_heartbeat_sender, Identity),
    %% Start the heartbeat process that watches data received on Sock
    {ok, Receiver} =
        start_heartbeater(ReceiveTimeoutSec, SupPid, Sock,
                          ReceiveFun, heartbeat_receiver,
                          start_heartbeat_receiver, Identity),
    {Sender, Receiver}.

%% Actual entry point of the send-side heartbeat process
start_heartbeat_sender(Sock, TimeoutSec, SendFun, Identity) ->
    %% the 'div 2' is there so that we don't end up waiting for nearly
    %% 2 * TimeoutSec before sending a heartbeat in the boundary case
    %% where the last message was sent just after a heartbeat.
    %% send_oct: number of bytes sent on the socket
    %% The process periodically checks whether any data was sent on the TCP connection (from RabbitMQ to the client). If nothing was sent during the interval, it sends a heartbeat frame to the client, then loops for the next check
    %% After the timeout handler has sent the heartbeat, the sender keeps checking (continue)
    heartbeater({Sock, TimeoutSec * 1000 div 2, send_oct, 0,
                 fun () -> SendFun(), continue end}, Identity).
%% Actual entry point of the receive-side heartbeat process
start_heartbeat_receiver(Sock, TimeoutSec, ReceiveFun, Identity) ->
    %% we check for incoming data every interval, and time out after
    %% two checks with no change. As a result we will time out between
    %% 2 and 3 intervals after the last data has been received.
    %% recv_oct: number of bytes received on the socket
    %% The process periodically checks whether any data was received on the TCP connection. If nothing arrives for long enough, it declares a heartbeat timeout, which eventually closes the TCP connection. Note that RabbitMQ's flow control may pause heartbeat checking
    %% After the timeout handler has sent heartbeat_timeout to the rabbit_reader process, the receiver itself stops immediately (stop)
    heartbeater({Sock, TimeoutSec * 1000, recv_oct, 1,
                 fun () -> ReceiveFun(), stop end}, Identity).

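The code above shows that rabbit_heartbeat starts two processes (Erlang processes, not operating system processes): heartbeat_sender and heartbeat_receiver.

heartbeat_sender checks the connection every heartbeat / 2 and, whenever no data has been sent since the previous check, sends a heartbeat frame to the client.

heartbeat_receiver checks the connection every heartbeat interval and, after two consecutive checks with no data received, sends heartbeat_timeout to its parent process, which then closes the TCP connection between the client and RabbitMQ. RabbitMQ then logs something like the following: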

=ERROR REPORT==== 9-Sep-2019::16:22:55 ===
closing AMQP connection <0.27892.1951> (10.101.71.0:49814 -> 10.101.63.7:5672 - neutron-server:323:29f77387-90c6-42d0-bf62-44780368c7f7):
missed heartbeats from client, timeout: 60s
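The difference between the two processes comes down to the interval and the threshold passed to heartbeater (0 for the sender, 1 for the receiver). The body of heartbeater is not quoted in this article, so the following is only a simplified model based on the comments above: it assumes that each check compares the socket's byte counter with the previous sample and fires the handler once the counter has stayed unchanged for more than `threshold` consecutive checks. All names are hypothetical.

```python
def simulate_heartbeater(samples, threshold, stop_on_fire):
    """Replay a list of byte-counter samples (one per check interval)
    and return the indices of the checks at which the handler fires.

    Assumed model, not the actual rabbit_heartbeat implementation.
    """
    fired_at = []
    last_value, same_count = 0, 0
    for check_index, value in enumerate(samples):
        if value != last_value:
            # Traffic was seen since the last check: reset the counter.
            last_value, same_count = value, 0
        elif same_count < threshold:
            # No traffic, but still within the tolerated number of checks.
            same_count += 1
        else:
            fired_at.append(check_index)
            if stop_on_fire:   # receiver: report the timeout and stop
                break
            same_count = 0     # sender: send a heartbeat and continue
    return fired_at


# Receiver (threshold 1): data arrives once, then the client goes silent.
print(simulate_heartbeater([100, 100, 100, 100], 1, True))  # [2]
# Sender (threshold 0): a heartbeat is sent on every idle check.
print(simulate_heartbeater([10, 10, 20, 20], 0, False))     # [1, 3]
```

In the receiver case the timeout fires on the second idle check, which matches the "time out after two checks with no change" comment in start_heartbeat_receiver.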

3.2 Client side

# oslo_messaging/_drivers/impl_rabbit.py:Connection
    def _heartbeat_start(self):
        if self._heartbeat_supported_and_enabled():
            self._heartbeat_exit_event = eventletutils.Event()
            self._heartbeat_thread = threading.Thread(
                target=self._heartbeat_thread_job)
            self._heartbeat_thread.daemon = True
            self._heartbeat_thread.start()
        else:
            self._heartbeat_thread = None

    def _heartbeat_thread_job(self):
        """Thread that maintains inactive connections
        """
        while not self._heartbeat_exit_event.is_set():
            with self._connection_lock.for_heartbeat():

                try:
                    try:
                        self._heartbeat_check()
                        # NOTE(sileht): We need to drain event to receive
                        # heartbeat from the broker but don't hold the
                        # connection too much times. In amqpdriver a connection
                        # is used exclusively for read or for write, so we have
                        # to do this for connection used for write drain_events
                        # already do that for other connection
                        try:
                            self.connection.drain_events(timeout=0.001)
                        except socket.timeout:
                            pass
                    except kombu.exceptions.OperationalError as exc:
                        LOG.info(_LI("A recoverable connection/channel error "
                                     "occurred, trying to reconnect: %s"), exc)
                        self.ensure_connection()
                except Exception:
                    LOG.warning(_LW("Unexpected error during heartbeart "
                                    "thread processing, retrying..."))
                    LOG.debug('Exception', exc_info=True)

            self._heartbeat_exit_event.wait(
                timeout=self._heartbeat_wait_timeout)
        self._heartbeat_exit_event.clear()

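After connecting to RabbitMQ, the client by default starts a thread that exchanges heartbeats with RabbitMQ. Inside _heartbeat_thread_job, _heartbeat_check sends heartbeats from the client to RabbitMQ, and drain_events receives the heartbeats sent by RabbitMQ.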
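The structure of that thread can be reduced to a loop around an exit event. The sketch below keeps only that pattern and is not the oslo.messaging implementation: the check callable stands in for _heartbeat_check plus drain_events, and the function name, parameters, and error handling are assumptions.

```python
import threading


def start_heartbeat_thread(check, exit_event, wait_timeout):
    """Run check() repeatedly in a daemon thread until exit_event is set.

    check        -- hypothetical callable that sends and drains heartbeats
    exit_event   -- threading.Event used to stop the loop
    wait_timeout -- seconds to wait between two checks
    """
    def job():
        while not exit_event.is_set():
            try:
                check()
            except Exception:
                # A real client would log the error and reconnect here.
                pass
            # Sleep until the next check, or return early on shutdown.
            exit_event.wait(timeout=wait_timeout)

    thread = threading.Thread(target=job, daemon=True)
    thread.start()
    return thread
```

Waiting on the event instead of calling time.sleep lets the thread exit as soon as exit_event.set() is called, rather than after the current sleep finishes.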

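4. Summary

(1) The channel_max, frame_max, and heartbeat values are negotiated in the connection.tune and connection.tune-ok exchange during connection setup. RabbitMQ first assembles a connection.tune frame and sends it to the client to announce its own channel_max, frame_max, and heartbeat. The client then compares these three values with its own corresponding settings to arrive at the negotiated values, and sends them back to RabbitMQ in a connection.tune-ok frame.

(2) The negotiated heartbeat value is used as the heartbeat timeout on both the RabbitMQ side and the client side. After receiving connection.tune-ok, RabbitMQ starts two heartbeat processes: one checks every ClientHeartbeat / 2 whether RabbitMQ has sent data to the client, and the other checks every ClientHeartbeat whether the socket of the current rabbit_reader process has received data.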
