Raft Source Code Analysis (2) - Role Transitions

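時光粒子源碼 (Time Particle Source Code)

Discussion of open-source technologies such as distributed consensus and distributed storage. GitHub: https://timequark.github.io/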


First, here is the role transition diagram from the Raft paper:

 

Below is my own homemade version of the transition diagram:

 

 
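The transitions discussed in this article can also be written down as a small lookup table. This is only a sketch: the `Role` enum, the event labels and `next_role` are hypothetical names rather than raftos APIs, and the edges follow the role behaviour described in this article (in particular, a Candidate whose election timer fires goes back to Follower).

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# (current role, event) -> next role. The event names are illustrative labels.
TRANSITIONS = {
    (Role.FOLLOWER, "election_timeout"): Role.CANDIDATE,
    (Role.CANDIDATE, "majority_votes"): Role.LEADER,
    (Role.CANDIDATE, "election_timeout"): Role.FOLLOWER,
    (Role.CANDIDATE, "leader_discovered"): Role.FOLLOWER,
    (Role.CANDIDATE, "higher_term_seen"): Role.FOLLOWER,
    (Role.LEADER, "step_down_timeout"): Role.FOLLOWER,
    (Role.LEADER, "higher_term_seen"): Role.FOLLOWER,
}


def next_role(role, event):
    """Return the role after `event`; unknown pairs leave the role unchanged."""
    return TRANSITIONS.get((role, event), role)
```

For example, `next_role(Role.FOLLOWER, "election_timeout")` yields `Role.CANDIDATE`, while an event with no entry for the current role (such as `"majority_votes"` on a Follower) leaves the role as it is.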

Raft has three roles:

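  • Leader

    The Leader's responsibilities are:

    (1) Handle read/write requests

    (2) Store the Log data

    (3) Send heartbeat requests to the other nodes in the cluster to confirm that cluster communication is healthy

    (4) Send Log Entry data to Followers to complete replication

    (5) Track each Follower's replication progress

    (6) Log compaction (currently incomplete in raftos)

    (7) Snapshots (currently incomplete in raftos)

    The Leader continuously sends heartbeats to the other nodes in the cluster, and each heartbeat request carries an ID (an incrementing int). When append_entries_response messages for a request arrive from a majority of nodes, step_down_timer is reset. If no majority responds for step_down_missed_heartbeats heartbeat intervals in a row, step_down_timer fires and the Leader steps down to Follower.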

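  • Candidate

    Exists only to run an election.

    First, term is incremented by 1, voted_for is set to the node's own ID (it casts one vote for itself), and a request_vote request is broadcast. Once responses with vote_granted set to True add up to a majority, the Candidate is promoted to Leader. If it has not won a majority before the timer fires, it switches straight back to the Follower role.

    The sections below look at the parameters carried by the request_vote request in detail.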

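  • Follower

    Receives append_entries requests from the Leader and request_vote requests from Candidates. Note the following points:

    (1) In Follower.start, init_storage sets term to 0 only on the very first load, but it resets voted_for every time.

    (2) on_receive_append_entries resets election_timer only after passing the @validate_term and @validate_commit_index checks; otherwise the Follower may become a Candidate and start a new election.

    (3) on_receive_request_vote responds with vote_granted set to True only if the Follower has not voted yet and the Candidate's last_log_term and last_log_index are valid, that is, at least as up to date as its own.

    (4) on_receive_request_vote does not reset election_timer. The Follower cannot tell whether this election will produce a new Leader; it can only learn that a Leader exists through a valid on_receive_append_entries.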
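The vote-granting rule in point (3) can be sketched as a pure function. This is a minimal sketch under stated assumptions: `should_grant_vote` and its parameter names are hypothetical, and it returns False when a vote has already been cast, whereas the Follower listing later in this article sends no response at all in that case.

```python
def should_grant_vote(voted_for, my_last_term, my_last_index,
                      cand_last_term, cand_last_index):
    """Decide whether a Follower would grant its vote to a Candidate.

    The Candidate's log must be at least as up to date as the Follower's:
    a later last term wins; with equal last terms, the longer log wins.
    """
    if voted_for is not None:
        # Simplification: already voted, so no vote is granted.
        return False
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index
```

A Candidate whose last entry has an older term is rejected even if its log is longer, because the term comparison takes priority over the index comparison.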

Leader


state.py

class Leader(BaseRole):
    """Raft Leader
    Upon election: send initial empty AppendEntries RPCs (heartbeat) to each server;
    repeat during idle periods to prevent election timeouts

    — If command received from client: append entry to local log, respond after entry applied to state machine
    - If last log index ≥ next_index for a follower: send AppendEntries RPC with log entries starting at next_index
    — If successful: update next_index and match_index for follower
    — If AppendEntries fails because of log inconsistency: decrement next_index and retry
    — If there exists an N such that N > commit_index, a majority of match_index[i] ≥ N,
    and log[N].term == self term: set commit_index = N
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.heartbeat_timer = Timer(config.heartbeat_interval, self.heartbeat)
        self.step_down_timer = Timer(
            config.step_down_missed_heartbeats * config.heartbeat_interval,
            self.state.to_follower
        )

        # Incremented by 1 on every heartbeat
        self.request_id = 0
        # Keyed by request_id; on each append_entries_response it is used to
        # decide whether a majority of Followers have responded
        self.response_map = {}

    def start(self):
        self.init_log()

        # LIUHAO: Trigger leader call 'append_entries' automatically
        self.heartbeat()
        self.heartbeat_timer.start()

        self.step_down_timer.start()

    def stop(self):
        self.heartbeat_timer.stop()
        self.step_down_timer.stop()

    # init_log is called from start() rather than from __init__.
    # When a Candidate is promoted to Leader, only next_index and match_index are re-initialized; all other data is left unchanged
    def init_log(self):
        # LIUHAO
        # - Initialize next_index for each follower to the leader's last_log_index + 1. The leader then broadcasts 'append_entries' to each follower with its latest log data.
        #         If a follower does not reply 'success', its next_index is decremented.
        #         If a follower replies 'success', the leader updates its 'match_index' to the follower's 'last_log_index'.
        # - 'self.state.cluster' does not include this node; see register.py:register
        self.log.next_index = {
            follower: self.log.last_log_index + 1 for follower in self.state.cluster
        }

        # LIUHAO
        # - Initialize match_index to 0. Each server's match_index catches up to next_index - 1 once the leader has broadcast 'append_entries' and received a 'success' response
        self.log.match_index = {
            follower: 0 for follower in self.state.cluster
        }

    async def append_entries(self, destination=None):
        """AppendEntries RPC — replicate log entries / heartbeat
        Args:
            destination — destination id

        Request params:
            term — leader’s term
            leader_id — so follower can redirect clients
            prev_log_index — index of log entry immediately preceding new ones
            prev_log_term — term of prev_log_index entry
            commit_index — leader’s commit_index

            entries[] — log entries to store (empty for heartbeat)
        """

        # Send AppendEntries RPC to destination if specified or broadcast to everyone
        # (both a single-destination send and a broadcast are supported)
        destination_list = [destination] if destination else self.state.cluster
        for destination in destination_list:
            data = {
                'type': 'append_entries',

                'term': self.storage.term,
                'leader_id': self.id, # LIUHAO: It's just a leader_id. When a Follower receives 'append_entries' message, the Follower will update its Leader property.
                'commit_index': self.log.commit_index,

                'request_id': self.request_id
            }

            next_index = self.log.next_index[destination]
            prev_index = next_index - 1

            if self.log.last_log_index >= next_index:
                # The Follower is behind; only a single entry is sent per request
                data['entries'] = [self.log[next_index]]

            else:
                # Plain heartbeat, carries no entries
                data['entries'] = []

            # The Follower checks that the index and term of the preceding Log Entry
            # match the Leader's, which keeps the Follower's log consistent
            data.update({
                'prev_log_index': prev_index,
                'prev_log_term': self.log[prev_index]['term'] if self.log and prev_index else 0
            })

            asyncio.ensure_future(self.state.send(data, destination), loop=self.loop)

    @validate_commit_index
    @validate_term
    def on_receive_append_entries_response(self, data):
        sender_id = self.state.get_sender_id(data['sender'])

        # Count all unique responses per particular heartbeat interval
        # and step down via <step_down_timer> if leader doesn't get majority of responses for
        # <step_down_missed_heartbeats> heartbeats

        if data['request_id'] in self.response_map:
            self.response_map[data['request_id']].add(sender_id)

            if self.state.is_majority(len(self.response_map[data['request_id']]) + 1):
                # A majority has responded: reset step_down_timer and drop this request_id from response_map
                self.step_down_timer.reset()
                del self.response_map[data['request_id']]

        if not data['success']:
            # LIUHAO: next_index is decreased, presumably so that a lagging follower can recover its log data and catch up with the Leader
            # next_index[follower] is decremented by 1 (never below 1) for the next append_entries
            self.log.next_index[sender_id] = max(self.log.next_index[sender_id] - 1, 1)

        else:
            # LIUHAO: Track next_index and match_index for each follower inside the Leader.
            # When append_entries succeeds:
            # next_index[follower_id] becomes the Follower's last_log_index + 1,
            # match_index[follower_id] becomes the Follower's last_log_index
            self.log.next_index[sender_id] = data['last_log_index'] + 1
            self.log.match_index[sender_id] = data['last_log_index']
            # Update commit_index
            self.update_commit_index()

        # Send AppendEntries RPC to continue updating fast-forward log (data['success'] == False)
        # or in case there are new entries to sync (data['success'] == data['updated'] == True)
        if self.log.last_log_index >= self.log.next_index[sender_id]:
            # LIUHAO: Continue sending data to the follower
            asyncio.ensure_future(self.append_entries(destination=sender_id), loop=self.loop)

    def update_commit_index(self):
        commited_on_majority = 0

        # Iterate over [commit_index+1, last_log_index+1). commit_index advances to index when, according to
        # match_index, a majority (counting the Leader) holds that index and log[index]['term'] equals the current storage.term
        for index in range(self.log.commit_index + 1, self.log.last_log_index + 1):
            commited_count = len([
                1 for follower in self.log.match_index
                if self.log.match_index[follower] >= index
            ])

            # If index is matched on at least half + self for current term — commit
            # That may cause commit fails upon restart with stale logs
            is_current_term = self.log[index]['term'] == self.storage.term
            if self.state.is_majority(commited_count + 1) and is_current_term:
                commited_on_majority = index

            else:
                break

        if commited_on_majority > self.log.commit_index:
            self.log.commit_index = commited_on_majority

    # Write interface
    async def execute_command(self, command):
        """Write to log & send AppendEntries RPC"""
        self.apply_future = asyncio.Future(loop=self.loop)

        entry = self.log.write(self.storage.term, command)
        asyncio.ensure_future(self.append_entries(), loop=self.loop)
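The commit rule in update_commit_index can be tried in isolation with the following sketch. It mirrors the loop in the listing above, including the early `break`, but the function names and the plain list/dict inputs are assumptions made for illustration; in particular `is_majority` here takes the full cluster size (followers plus leader) as an explicit argument.

```python
def is_majority(count, cluster_size):
    """True if `count` nodes form a strict majority of `cluster_size` nodes."""
    return count > cluster_size // 2


def advance_commit_index(commit_index, log_terms, match_index, current_term):
    """Return the commit index after one pass of the Leader's commit rule.

    log_terms[i] is the term of the entry at 1-based log index i + 1.
    match_index maps follower id -> highest index known to be replicated there.
    """
    cluster_size = len(match_index) + 1  # followers plus the leader itself
    new_commit = commit_index
    for index in range(commit_index + 1, len(log_terms) + 1):
        # Count followers holding this index, plus the leader.
        replicated = sum(1 for m in match_index.values() if m >= index) + 1
        if is_majority(replicated, cluster_size) and log_terms[index - 1] == current_term:
            new_commit = index
        else:
            break
    return new_commit
```

With three nodes, `advance_commit_index(0, [2, 2, 2], {'b': 2, 'c': 1}, 2)` advances to 2, because index 3 is not yet on a majority. With `log_terms=[1, 1, 2]` and `current_term=2` the result stays at 0: as in the listing, the loop stops at the first entry from an older term, even when later entries are already replicated.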

Candidate


state.py

class Candidate(BaseRole):
    """Raft Candidate
    — On conversion to candidate, start election:
        — Increment self term
        — Vote for self
        — Reset election timer
        — Send RequestVote RPCs to all other servers
    — If votes received from majority of servers: become leader
    — If AppendEntries RPC received from new leader: convert to follower
    — If election timeout elapses: start new election
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # On election timeout, automatically fall back to Follower
        self.election_timer = Timer(self.election_interval, self.state.to_follower)
        self.vote_count = 0

    def start(self):
        """Increment current term, vote for herself & send vote requests"""
        # When the election starts, increment term by 1 and vote for itself
        self.storage.update({
            'term': self.storage.term + 1,
            'voted_for': self.id
        })

        self.vote_count = 1

        # Send the vote request
        self.request_vote()

        # Start the election timer
        self.election_timer.start()

    def stop(self):
        self.election_timer.stop()

    def request_vote(self):
        """RequestVote RPC — gather votes
        Arguments:
            term — candidate’s term
            candidate_id — candidate requesting vote
            last_log_index — index of candidate’s last log entry
            last_log_term — term of candidate’s last log entry
        """
        data = {
            'type': 'request_vote',

            'term': self.storage.term,
            'candidate_id': self.id,
            'last_log_index': self.log.last_log_index,
            'last_log_term': self.log.last_log_term
        }
        #
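        # Broadcast the request_vote message to every other node in the cluster, whatever its Role is (Leader, Follower or Candidate).
        # Each node decides what to do based on its own state at that moment,
        # so BaseRole provides empty implementations of the following handlers to cover every message a node might receive: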
        # - on_receive_request_vote(self, data)
        # - on_receive_request_vote_response(self, data)
        # - on_receive_append_entries(self, data)
        # - on_receive_append_entries_response(self, data)
        #
        self.state.broadcast(data)

    @validate_term
    def on_receive_request_vote_response(self, data):
        """Receives response for vote request.
        If the vote was granted then check if we got majority and may become Leader
        """

        if data.get('vote_granted'):
            self.vote_count += 1

            # With a majority of votes, the Candidate switches to Leader
            if self.state.is_majority(self.vote_count):
                self.state.to_leader()

    @validate_term
    def on_receive_append_entries(self, data):
        """If we discover a Leader with the same term — step down"""
        # LIUHAO
        # Confusing point: when 'storage.term' < data['term'], @validate_term updates 'storage.term' and switches this node to Follower.
        # The code below then switches it to Follower a second time. I suspect this is related to the 'split vote' case.
        # It does not seem to cause any problem.
        #
        # The double switch to Follower happens in the following scenario:
        #   Two or more Candidates are running for election, say A and B, with A.term > B.term.
        #   A wins and becomes Leader, then immediately sends append_entries to B. In Candidate B's
        #   on_receive_append_entries, @validate_term sets B.term := A.term and switches B to Follower;
        #   the check below then sees B.term == A.term and switches to Follower again.
        #
        # This scenario only occurs occasionally: because the Follower's election_interval is randomized,
        # and assuming the network is healthy, its probability is low.
        if self.storage.term == data['term']:
            self.state.to_follower()

    @staticmethod
    def election_interval():
        return random.uniform(*config.election_interval)

 

Follower


state.py

class Follower(BaseRole):
    """Raft Follower

    — Respond to RPCs from candidates and leaders
    — If election timeout elapses without receiving AppendEntries RPC from current leader
    or granting vote to candidate: convert to candidate
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Note that election_interval is generated randomly; see config.py for the range
        self.election_timer = Timer(self.election_interval, self.start_election)

    def start(self):
        # Initialize storage (term, voted_for)
        self.init_storage()
        self.election_timer.start()

    def stop(self):
        self.election_timer.stop()

    def init_storage(self):
        """Set current term to zero upon initialization & voted_for to None"""
        
        # Set to 0 only on first initialization; once the storage file exists, this branch is never entered again
        if not self.storage.exists('term'):
            self.storage.update({
                'term': 0,
            })

        # Clear voted_for
        self.storage.update({
            'voted_for': None
        })

    @staticmethod
    def election_interval():
        return random.uniform(*config.election_interval)

    @validate_commit_index
    @validate_term
    def on_receive_append_entries(self, data):
        # LIUHAO: Update 'leader_id' to 'leader' property of Class State!
        #         We can have a look at description in Class State. Like the following part:
        #
        #         # <Leader object> if state is leader
        #         # <state_id> if state is follower
        #         # <None> if leader is not chosen yet
        #         leader = None
        self.state.set_leader(data['leader_id'])

        # Reply False if log doesn’t contain an entry at prev_log_index whose term matches prev_log_term
        try:
            prev_log_index = data['prev_log_index']
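            # Check that the prev_log_index and prev_log_term supplied by the Leader are consistent with the local log.
            # If they are not, reply False right away.
            # Note:
            # The Raft paper mentions that a rejection can carry extra information back to the Leader (for example the
            # Follower's last_log_index), which lets the Leader locate the Follower's next_index quickly and reduces the
            # number of wasted append_entries round trips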
            if prev_log_index > self.log.last_log_index or (
                prev_log_index and self.log[prev_log_index]['term'] != data['prev_log_term']
            ):
                response = {
                    'type': 'append_entries_response',
                    'term': self.storage.term,
                    'success': False,

                    'request_id': data['request_id']
                }
                # Reply to the Leader asynchronously
                asyncio.ensure_future(self.state.send(response, data['sender']), loop=self.loop)
                return
        except IndexError:
            pass

        # If an existing entry conflicts with a new one (same index but different terms),
        # delete the existing entry and all that follow it
        # Store the entries sent by the Leader in the Log, starting at new_index
        new_index = data['prev_log_index'] + 1
        try:
            # On conflict, erase everything from new_index to the tail so the log lines up with the Leader's
            if self.log[new_index]['term'] != data['term'] or (
                self.log.last_log_index != prev_log_index
            ):
                self.log.erase_from(new_index)
        except IndexError:
            pass
            # LIUHAO: TODO
            # 'log.write' will append entries to its tail. Should we reply Leader False message???

        # It's always one entry for now
        for entry in data['entries']:
            self.log.write(entry['term'], entry['command'])

        # Update commit index if necessary
        # Note the condition: the Follower's commit_index is updated only when it is smaller than the Leader's.
        # Question:
        # What should happen when the Follower's commit_index is greater than the Leader's?
        # Thoughts:
        # That could happen if the Follower used to be the Leader, so its commit_index is newer, and it was later demoted to Follower for some reason.
        # That case is not plausible either, though, because a Leader only advances its commit_index after receiving append_entries_response from a majority of Followers.
        # So a Follower's commit_index stays below the Leader's until the Leader has replicated the entry at last_log_index,
        # at which point the two are equal (update_commit_index iterates over [commit_index+1, last_log_index+1),
        # so the largest index it can reach is last_log_index).
        if self.log.commit_index < data['commit_index']:
            self.log.commit_index = min(data['commit_index'], self.log.last_log_index)

        # Respond True since entry matching prev_log_index and prev_log_term was found
        response = {
            'type': 'append_entries_response',
            'term': self.storage.term,
            'success': True,

            'last_log_index': self.log.last_log_index, # LIUHAO: 'log.last_log_index' here already reflects any entries just appended to the Log
            'request_id': data['request_id']
        }
        asyncio.ensure_future(self.state.send(response, data['sender']), loop=self.loop)

        # Reset the election timer
        self.election_timer.reset()

    @validate_term
    def on_receive_request_vote(self, data):
        # LIUHAO: Ensure that the Follower has not yet voted for any Candidate
        if self.storage.voted_for is None and not data['type'].endswith('_response'):

            # Candidates' log has to be up-to-date

            # If the logs have last entries with different terms,
            # then the log with the later term is more up-to-date. If the logs end with the same term,
            # then whichever log is longer is more up-to-date.

            if data['last_log_term'] != self.log.last_log_term:
                up_to_date = data['last_log_term'] > self.log.last_log_term
            else:
                up_to_date = data['last_log_index'] >= self.log.last_log_index

            if up_to_date:
                self.storage.update({
                    'voted_for': data['candidate_id']
                })

            response = {
                'type': 'request_vote_response',
                'term': self.storage.term,
                'vote_granted': up_to_date
            }

            asyncio.ensure_future(self.state.send(response, data['sender']), loop=self.loop)

    def start_election(self):
        self.state.to_candidate()


def leader_required(func):

    @functools.wraps(func)
    async def wrapped(cls, *args, **kwargs):
        # Wait until the cluster has a Leader
        await cls.wait_for_election_success()
        # Raise an exception if this node is not the Leader
        if not isinstance(cls.leader, Leader):
            raise NotALeaderException(
                'Leader is {}!'.format(cls.leader or 'not chosen yet')
            )

        return await func(cls, *args, **kwargs)
    return wrapped
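The interplay between the Follower's prev_log_index/prev_log_term check and the Leader's next_index decrement can be simulated with plain lists. This is a sketch under assumptions: `accepts_append` and `find_next_index` are hypothetical helpers, each log is reduced to a list of terms, and the Leader backs off by exactly one index per rejection as in the Leader listing.

```python
def accepts_append(log_terms, prev_log_index, prev_log_term):
    """Follower-side check; log_terms[i] is the term at 1-based index i + 1."""
    if prev_log_index == 0:
        return True  # nothing precedes the new entries
    if prev_log_index > len(log_terms):
        return False  # the follower's log is too short
    return log_terms[prev_log_index - 1] == prev_log_term


def find_next_index(leader_terms, follower_terms):
    """Leader-side search: back off next_index by one until the follower accepts."""
    next_index = len(leader_terms) + 1
    while True:
        prev_index = next_index - 1
        prev_term = leader_terms[prev_index - 1] if prev_index else 0
        if accepts_append(follower_terms, prev_index, prev_term):
            return next_index
        # Rejected: decrement, but never go below 1.
        next_index = max(next_index - 1, 1)
```

With a Leader log of terms `[1, 1, 2, 3]` and a Follower log of `[1, 1, 4]`, the Follower rejects twice (log too short, then a term mismatch at index 3) and accepts once next_index reaches 3. Each rejection costs one round trip, which is the overhead the rejection hint mentioned in the Follower's comments is meant to reduce.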

 
