Redis 的集羣容錯與故障轉移

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://zdran.com/20210519.html","title":"","type":null},"content":[{"type":"text","text":"原文地址","attrs":{}}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. Redis 服務模式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis 的服務模式主要有三種,分別是單機模式、主從模式、集羣模式。在不同的模式下當機器發生故障時,Redis 應對故障的方法也是不一樣的。其中單機模式下主要是通過備份與恢復解決故障。而主從模式下則是通過哨兵在從節點中重新選舉主節點替換故障節點的工作來實現的,集羣模式下也是通過類似於主從模式的方式重新選舉新的主節點來實現的。區別則是在集羣模式下不需要哨兵來主持選舉了。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. 單機模式下的備份與恢復","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis 支持兩種備份方式,分別是 RDB 備份和 AOF 備份。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 RDB 備份","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RDB 持久化支持手動執行,也可以通過配置定期執行。RDB 備份是將某個時間點的 Redis 數據保存到一個 RDB 的文件中,這個文件保存了當前時間點 Redis 數據庫中的數據。當 Redis 服務啓動時,可以指定載入 RDB 文件,恢復 Redis。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有兩個命令可以生成 RDB 文件,分別是 SAVE 和 BGSAVE。其中 SAVE 命令將會阻塞 Redis 服務器,即執行 SAVE 命令期間,Redis 服務將拒絕所有的命令請求。只有在執行結束後,才能正常使用。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而 BGSAVE 則是通過子進程執行的,所以不會阻塞 Redis 服務。而且我們可以手動是的設置自動執行 BGSAVE,可以設置每個多少時間,或者執行多少次寫操作就執行一次 BGSAVE。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 AOF 備份","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AOF 與 RDB 備份方式的區別是,RDB 文件保存的是 Redis 服務器裏具體的數據,而 AOF 備份則保存的是 Redis 的命令。當 Redis 的 AOF 功能是打開的時候,每次 Redis 執行完一個","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"寫命令","attrs":{}},{"type":"text","text":"後,就會將命令寫到 aof_buf 緩衝區的末尾(類似於 Java 的 writer 方法)。每次當 Redis 執行時間事件結束後,都會根據配置決定將 aof_buf 緩衝區的內容刷新到磁盤中去(類似於 Java 的 flush 方法)。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.3 數據恢復","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當 Redis 服務器啓動時可以指定加載 RDB 文件 或者是 AOF 文件。Redis 服務器會優先載入 AOF 文件,如果 AOF 功能沒有啓動的話,纔會自動載入 RDB 文件。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3. 主從模式下的故障轉移","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis 的 SLAVEOF 命令可以讓一個 Redis 服務器去複製另外一個 Redis 服務器。這種模式稱爲","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"主從模式","attrs":{}},{"type":"text","text":",或者叫","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"主從複製","attrs":{}},{"type":"text","text":"。被複制的服務器稱爲主服務器。複製主服務器的服務器稱爲從服務器。主從服務器將保存相同的數據。在主服務器宕機的情況下,可以將從服務器變更爲主服務器繼續執行命令,而且數據不會丟失。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2b/2bbc9c0e7df6ab0ba48e3c8597370120.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 主從複製","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"主從複製有兩種情況,一種是第一次開始複製,被稱爲同步(sync),另外一個是命令傳播(command propagate)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"同步","attrs":{}},{"type":"text","text":"是指當執行 slaveof 命令時,從服務器需要同步主服務器的數據,從服務器需要通過 sync 命令來實現。從服務器收到 slaveof 命令後,會向主服務器發送 sync 命令,主服務器收到 sync 命令後,會執行 BGSAVE 命令。在生成文件期間收到的寫命令都會保存到緩衝區。之後會將生成後的 RDB 文件發送給從服務器,從服務器加載 RDB 文件,更新數據庫的數據。然後主服務器會將緩衝區的命令發送給從服務器。從服務器同步緩衝區的數據。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"命令傳播","attrs":{}},{"type":"text","text":"是指在執行完同步過程後,主從服務器的狀態是一致的,但是如果主服務器再次收到寫命令後,主從服務器的狀態又會出現不一致,這個時候主服務器就會把寫命令發送給從服務器,用來保證主從服務器的數據一致。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 部分重同步","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有兩種情況下需要執行同步操作,一種是初次收到 slaveof 命令需要執行的同步操作,另外一種情況是在主從服務器斷連後重新連接需要執行的同步操作。在舊版的 Redis 服務器中,斷連後會執行同步操作,但是會執行全量的同步。即重新生成新的 RDB 文件。這樣就會同步大量不需要同步的數據,更好的一種做法是同步斷連期間的數據。所以在新版(v2.8+)的 Redis 數據庫中支持了","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"部分同步的功能","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"新版的 psync 命令支持","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"全量重同步","attrs":{}},{"type":"text","text":"和","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"部分重同步","attrs":{}},{"type":"text","text":"兩種方式,其中全量重同步是用在初次複製的情況下的,而部分重同步則是在出現斷連後重新複製的情況下,當從服務器重新連接主服務器後,主服務器會將斷連期間執行的寫命令重新發送給從服務器。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"部分重同步的實現主要靠三個部分:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"主服務器的複製偏移量和從服務器的複製偏移量","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"主服務器的複製積壓緩衝區","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服務器的運行ID","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"依靠這三個部分,Redis 實現了部分重同步的功能。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.1 複製偏移量","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需要執行復制的主從服務器會分別維護一個複製偏移量,主服務器每次向從服務器傳播 N 個字節時就會在複製偏移量上加 N。而從服務每次收到N個字節後也會將自己的複製偏移量加上 N 。如果主從服務器的狀態一致,那麼複製偏移量總是相等的。如果出現斷線,那麼複製偏移量就會出現不相等的情況。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a9651c5e04a840aab87ea4a9282cfec9.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.2 複製積壓緩衝區","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"複製積壓緩衝區是主服務器維護的一個先進先出的隊列。當主服務器執行命令傳播時,不僅僅是將數據發送給從服務,而且也會將數據發送到複製積壓緩衝區,而且複製積壓緩衝區會爲每個字節保留對應的複製偏移量,","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4f/4faee0759995ae44d31a912301144246.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當從服務器重連之後,從服務器將自己的複製偏移量發送給主服務器,如果從服務器發送過來的複製偏移量在複製緩衝區內,那麼主服務器就會執行","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"部分重同步","attrs":{}},{"type":"text","text":",如果不在複製積壓緩衝區,則會執行","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"全量重同步","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.3 服務器ID","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個從服務器都會記錄主服務器的服務器ID,當斷線重連後,從服務器會將自己保存的服務器ID發送給主服務器,如果主服務器發現與當前的服務器ID不一致,則會直接執行","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"全量重同步","attrs":{}},{"type":"text","text":",如果一致則會使用複製積壓緩衝區嘗試執行","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"部分重同步","attrs":{}},{"type":"text","text":",嘗試失敗則會執行","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"全量重同步","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 哨兵","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"哨兵(Sentinel)是 Redis 高可用的解決方案:由一個或多個哨兵實例組成的哨兵系統可以監控任意多個 Redis 實例。並且在監控到主服務器出現宕機、下線等情況可以直接將當前主服務器的某個從服務器自動的升級爲主服務器,由新的主服務器繼續處理請求。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.3 leader 選舉","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當監控到主服務器發生下線情況後,監控這個主服務的多個哨兵實例會選舉出一個 leader 哨兵,由這個 leader 實例負責進行後續的故障轉移流程。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個哨兵實例都會保存一個被稱爲紀元(epoch)的屬性,每進行一次選舉,不論是否成功,都會自增一次。這個紀元參數實際上就是一個計數器,並不是什麼特別的屬性。在一個配置紀元裏,每個哨兵都有一次機會將某個哨兵設置爲局部領頭 leader,一旦選定,在當前紀元裏不可更改。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個監控到主服務器發生","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"客觀下線","attrs":{}},{"type":"text","text":"的哨兵,都會要求其他哨兵將自己設置爲局部領頭 leader。如果其他服務器還沒有設置過局部領頭 leader,那麼就會同意這次請求。否則就會拒絕。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以 三個哨兵節點爲例。A、B、C 三個哨兵節點同時監控一個主服務器,B 節點發現主服務器下線,首先會將自己的current_epoch +1 ,然後將局部領頭 leader 設置爲自己,再向 A、C 兩個節點發送選舉命令","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"SENTINEL is-master-down-by-addr","attrs":{}}],"attrs":{}},{"type":"text","text":",並攜帶 current_epoch,和自己的 runid(即服務器ID),希望A,C 節點可以選舉自己爲局部領頭 leader 。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A、C節點收到請求後,首先判斷請求攜帶的 epoch是否大於自己當前的 epoch ,如果大於自己的 epoch,說明當前節點還沒有選擇局部領頭 leader。此時就會將自己的 epoch 更新爲請求中的 epoch,並將自己的 leader_runid 設置成 B,然後返回 leader_runid = B 的結果。說明 A、C 節點選擇了 B 節點作爲局部領頭 leader。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果請求中的 epoch 不大於自己的 epoch,比如此時有一個 D 節點也發現主服務器下線,並且發送命令","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"SENTINEL is-master-down-by-addr","attrs":{}}],"attrs":{}},{"type":"text","text":"給到 A、C。因爲 A、C 當前的 epoch 已經是 1 ,則會拒絕當前請求(返回 leader_runid=B的結果)。相當於告訴 D 節點,已經選舉 B 節點爲局部領頭 leader。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"命令","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"SENTINEL is-master-down-by-addr","attrs":{}}],"attrs":{}},{"type":"text","text":"收到的響應不僅僅會攜帶 leader_runid,還會攜帶 leader_epoch。B 節點收到響應後,首先會判斷 leader_epoch 是否跟當前的 epoch一致,如果一致則判斷 leader_runid 是否跟當前的 runid 一致,如果一致則說明目標哨兵已經將當前節點設置爲局部領頭 leader。完整過程見下圖:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7e/7e2d8383fb3892e51d664f8b26296726.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果某個哨兵被半數以上的哨兵選舉爲局部領頭leader,那麼這個哨兵將成爲領頭 leader,即上圖中的 B 哨兵。如果在一段時間內沒有一個哨兵超過半數,則會再次發起重新選舉。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"後續的轉移過程將由領頭leader執行。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.4 轉移過程","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"轉移過程主要有三步:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"在從服務中挑選出一個合適的從服務器,並將它設置爲主服務器。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"讓所有的從服務器複製新的主服務器","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"將已下線的主服務器設置爲新的主服務器的從服務器","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.4.1 新服務器的挑選","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"哨兵並不是在從服務器中隨機挑選一個從服務器,而是經過一系列的過濾,然後再根據優先級,選出最高的從服務器。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,哨兵獲取到已下線的主服務器的所有從服務器的列表,然後依次過濾","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"過濾掉所有下線,或者斷線的的從服務器,保證列表裏的服務器都是在線的。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"過濾掉最近 5 秒內沒有回覆過領頭哨兵 INFO 命令的服務器,保證列表裏的服務器通信都是正常的。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"過濾掉與下線的主服務器連接斷開超過 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"down-after-milliseconds*10","attrs":{}}],"attrs":{}},{"type":"text","text":"毫秒的服務器,保證列表裏的服務器斷開時間不長,數據較新。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後根據從服務器的優先級進行排序,如果有相同的優先級,則根據複製偏移量進行排序,複製偏移量大的在前,如果複製偏移量相同,則根據 runid 進行排序,runid 小的在前。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以最後選出的新服務器就是,優先級最高,複製偏移量最大,runid 最小的從服務。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.4.2 讓從服務器複製新的主服務器","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先會向上面選擇出來的從服務器發送 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"SLAVEOF no one","attrs":{}}],"attrs":{}},{"type":"text","text":"命令,停止複製動作,然後向其他的從服務器發送 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"SLAVEOF ip port","attrs":{}}],"attrs":{}},{"type":"text","text":"命令,讓所有的從服務器複製新的主服務器。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.4.3 將下線的主服務器設置爲從服務器","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲舊的主服務器已經下線了,所以這裏的設置是保存在哨兵的實例裏的,當監聽到就的主服務器從新上線後,哨兵就會發送 SLAVEOF 命令,讓它成爲新的主服務器的從服務器。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4. 集羣模式下的故障轉移","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一個 Redis 集羣是由多個 Redis 節點構成的,節點間可以相互通信。而每個節點又可以是多個 redis 服務器構成的主從模式。比如,我們可以將 九個節點,三三的構成主從模式(一主兩從),然後將三個主節點構成集羣(三主六從)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bd/bd1902d4dea309acd6d21900856182b0.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"詳細的過程可以參考 ","attrs":{}},{"type":"link","attrs":{"href":"https://zdran.com/20201219.html","title":"","type":null},"content":[{"type":"text","text":"Redis 集羣內的數據同步","attrs":{}}]},{"type":"text","text":"。下面主要介紹下,在集羣模式下是如何進行故障轉移的。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.1 故障檢測","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集羣中的每個節點都會保存一份集羣裏所有節點的狀態表。類似於下面這種。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"
ID角色狀態
1000主節點在線
1001從節點在線
2000主節點在線
2001從節點在線
3000主節點在線
3001從節點在線
"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集羣中的每個節點都會定期的向其他節點發送 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"PING","attrs":{}},{"type":"text","text":" 消息,以此來檢測對方是否下線。如果再規定的時間內沒有收到 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"PONG","attrs":{}},{"type":"text","text":" 消息的響應,就會將該節點標記爲 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"疑似下線(PFAIL)","attrs":{}},{"type":"text","text":"。比如,1000 向 2000 發送 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"PING","attrs":{}},{"type":"text","text":" 消息,如果在一定時間內沒有收到 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"PONG","attrs":{}},{"type":"text","text":" 消息的響應,就會修改自己維護的狀態表,將 2000 標記爲 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"疑似下線(PFAIL)","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集羣中的各個節點會通過消息,來 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"交換","attrs":{}},{"type":"text","text":" 自己維護的節點狀態表。比如某個節點是在線、下線還是疑似下線。注意是 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"交換","attrs":{}},{"type":"text","text":",而不是 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"同步","attrs":{}},{"type":"text","text":"。如果某個節點收到了其他節點發來的疑似下線狀態,會創建一份","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"下線報告","attrs":{}},{"type":"text","text":"保存下來。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如,1000 收到了 2000 發來的消息,2000 標記了 3000 是 疑似下線狀態,那麼。1000 就會創建一份下線報告保存到自己的狀態表裏。這時 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1000","attrs":{}},{"type":"text","text":" 的狀態表結構如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"
ID角色狀態下線報告
1000主節點在線
1001從節點在線
2000主節點在線
2001從節點在線
3000主節點在線[2000上報3000下線]
3001從節點在線
"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果集羣裏半數以上的主節點,都將某個主節點標記爲疑似下線,那麼這個主節點將被標記爲**下線(FAIL)**狀態,將主節點標記爲下線狀態的節點將會向集羣廣播一條消息,所有收到消息的節點,都會將節點標記爲下線狀態。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如上面的例子,如果 1000 也將 3000 標記爲疑似下線狀態,並且收到了 2000 的下線報告,那麼 1000 就會將 3000 標記爲下線,並廣播 3000 下線的消息,其他節點收到消息後,也會將 3000 標記爲下線。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.2 leader 選舉","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當從節點發現自己複製的主節點進入下線狀態後,就會觸發進入選舉過程,redis 集羣將會在下線的主節點的從節點中選出一個節點作爲新的主節點。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"選舉過程與上面的哨兵選舉過程非常類似。都是基於 Raft 算法做的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,集羣有一個","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"配置紀元","attrs":{}},{"type":"text","text":" 的計數器。初始值爲0,每進行一次故障轉移就會自增+1。集羣中只有主節點有投票的權利,且每個主節點在一次故障轉移期間只有一次投票的機會。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當集羣中的從節點發現自己複製的主節點下線後,就會廣播一條消息,要求所有收到這條消息且具有投票權利的節點向自己投票。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/92/922ce3989d94c2c31cd35c6c4a663b6e.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果主節點還沒有投票,則會返回 ACK 消息,表示支持該從節點成爲主節點。如果某個從節點收到的票數超過主節點個數的一半,那麼這個從節點將成爲新的主節點。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/07/07f97135a3625386d392bf155570bd62.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.3 瓜分問題","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果集羣中除下線節點外剩餘的節點剛好能夠被從節點瓜分,在極端情況下就會出現所有從節點的票數都不會超過一半,就會造成本次選舉失敗,然後配置紀元+1,進行重新選舉。比如上圖中的例子,如果兩個從節點同時發起投票,那麼就有可能每個從節點都收到一票,導致本次選舉失敗。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"redis集羣爲例儘量避免出現這種情況做了一下優化,在發現主節點下線後,從節點並不會立即發送消息,而是延遲一定的時間再發送選舉消息,這個時間是隨機的。計算公式如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"DELAY = 500 milliseconds \n + random delay between 0 and 500 milliseconds \n + SLAVE_RANK * 1000 milliseconds\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"500ms 加上一個隨機時間,SLAVE_RANK 是複製偏移量的差值,即與主節點的數據差距越小的從節點,等待的時間就會越小,越有可能成爲新的主節點。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.2 轉移過程","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"新選舉出的主節點會首先會執行","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"SLAVEOF no one","attrs":{}}],"attrs":{}},{"type":"text","text":",停止對舊主節點的複製動作。然後會將舊主節點的槽位指派給自己。最後向集羣發送廣播消息,讓集羣中的其他節點知道新的主節點已經接管所有的槽位,舊主節點已經完成下線動作。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣就完成了整個的故障轉移過程,新的主節點將開始接受與自己負責的槽位有關的數據。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5 結束","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis 的三種工作方式對於服務的可用性是不斷提高的,最基本的單機模式採用的是RDB和AOF 的備份方式。這種模式下一旦服務宕機,Redis 服務將立即不可用,不過可以手動的通過備份文件恢復。需要人工操作。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而主從模式下,加入了哨兵機制進行監控,當主節點發生宕機後可以進行故障轉移,可以 Redis 服務依然可以使用。缺點是依然只有一個節點處理命令,無法應對高併發的大量請求。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集羣模式是可用性最高的一種工作模式,而且集羣模式下可以在集羣內部進行故障轉移動作,不需要引入哨兵。是目前常用的一種模式。","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章