從Paxos到Raft,分佈式一致性算法解析

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、CAP理論和BASE理論"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"理論是指導業界實現的綱領,也是提煉了多年研究的精華,在分佈式一致性領域,最主要的指導理論是CAP和BASE兩個。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. CAP理論"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CAP理論是Eric Brewer教授在2000年提出 的,是描述分佈式一致性的三個維度,分別是指:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(1)一致性(Consistency)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每次讀操作都能保證返回的是最新數據;在分佈式系統中,如果能針對一個數據項的更新執行成功後,所有的請求都可以讀到其最新的值,這樣的系統就被認爲具有嚴格的一致性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(2)可用性(Availablity)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"任何一個沒有發生故障的節點,會在合理的時間內返回一個正常的結果,也就是對於每一個請求總能夠在有限時間內返回結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(3)分區容忍性(Partition-torlerance)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當節點間出現網絡分區,照樣可以提供滿足一致性和可用性的服務,除非整個網絡環境都發生了故障。那麼,什麼是網絡分區呢?分區是系統中可能發生的故障之一,可能有節點暫時無法提供服務:發生了像 長時間GC、CPU死循環、鏈接池耗盡或者是網絡通信故障等問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/16\/16c423c830343d5e237403e10b63eeec.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CAP指出,"},{"type":"text","marks":[{"type":"strong"}],"text":"一個分佈式系統只能滿足三項中的兩項而不可能滿足全部三項"},{"type":"text","text":"。舉例來看:如下圖所示,假設有五個節點:n1~n5 ,因網絡分區故障被分成兩組:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[n1、n2]和[n3~n5],那麼當n1處理客戶端請求時(爲了支持\"容忍網絡分區\",即支持 P):"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"如果要保證C(一致性),那麼它需要把消息複製到所有節點,但是網絡分區導致無法成功複製到n3~n5,所以它只能返回\"處理失敗\"的結果給客戶端。(這時系統就處於不可用狀態,即喪失了A) "}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"如果要保證可用性A,那麼n1就只能把消息複製到n2,而不用複製到n3~n5(或者無視複製失敗\/超時),但n3同時也可能在處理客戶端請求,n3也爲了保證A而做了同樣的處理。那麼 [n1、n2]和[n3~n5]的狀態就不一致了,於是就喪失了C(一致性)。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"如果不容忍網絡分區,則P(分區容忍性)不能滿足。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/06\/0603f4296dde3ac03a918f0ed1c91d4e.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"既然CAP理論無法全部滿足,爲了指導工業實現,就需要降低標準,因此出現了BASE理論。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. BASE理論"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BASE理論是eBay的架構師Dan Pritchett提出的,它的思想是:“即使無法做到強一致性,但每個應用都可以根據自身的業務特點,採用適當的方式來使系統達到最終一致性” 。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BASE理論有三項指標:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"基本可用(Basically Available)"},{"type":"text","text":":是指分佈式系統在出現不可預知故障的時候,允許損失部分可用性:比如響應時間、功能降級等;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"軟狀態( Soft State)"},{"type":"text","text":":也稱爲弱狀態,是指允許系統中的數據存在中間狀態,並認爲該中間狀態的存在不會影響系統的整體可用性,即允許系統在不同節點之間進行數據同步的過程存在延時;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"最終一致性( Eventual Consistency)"},{"type":"text","text":":強調的是系統中所有的數據副本,在經過一段時間的同步後,最終能夠達成一致的狀態。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"BASE理論面向的是大型高可用、可擴展的分佈式系統。BASE通過犧牲強一致性來獲得可用性,並允許數據短時間內的不一致,但是最終達到一致狀態。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/07\/07d78e24b0e4a8f1e7d63d6fe66dd687.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來,我們一起學習經典的分佈式算法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、經典分佈式算法"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. PAXOS算法"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paxos算法是Leslie Lamport 在1990年提出的,經典且完備的分佈式一致性算法。Leslie大佬也在這個算法中展示了他不同常人的智商。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該算法的目標是:在不同的節點間,對一個key的取值達成共識(同一個值)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(1)角色"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"協議將系統中的節點分爲三種角色:Proposer(提案者)、Acceptor(決議者)、Leaner(學習者),他們的具體職責爲:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提案者:對key的值提出自己的值;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"決議者:對提案者的提議進行投票,選定一個提案,形成最終決策;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"學習者:學習決議者達成共識的決策。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/60\/602396b5472822d57cc394a094ced232.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(2)約束條件推導"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本小節將通過推導來引出Paxos算法的約束條件,故事靈感來自於分佈式理論:深入淺出Paxos算法【1】,在原文基礎上優化了可讀性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/my.oschina.net\/u\/150175\/blog\/2992187"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在介紹paxos算法協議前,我們先來看一個故事,故事名叫《小明姓什麼》。在孤兒院長大的小明馬上就要步入社會了,但是孤兒院的老師還沒有找到小明的姓氏,因此決定開會解決這個問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在會議上,由提案者P提出姓氏,然後決議者A來決定選擇哪一個。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"場景1"},{"type":"text","text":": Pa向 A1提出“趙”, Pb向A2,A3提出“錢”。現在有超過半數的決策者接受了“錢”,所以最終的決策結果就是“錢”。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏引出了一個基礎的約束條件:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P0. 當集羣中,超過半數的Acceptor接受了一個議案,那我們就可以說這個議案被選定了(Chosen)"}]}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b0\/b08b0ba2327771e7e72ed509719ffad8.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P0 是完備的條件,但是不實用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"場景2 "},{"type":"text","text":":Pa向 A1提出“趙”, Pb向A2提出“錢”, Pc向A3提出“孫”,這樣就出現了三個決議者各執一票,無法形成多數派決議的局面。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,必須允許一個決策者能夠接受多個議案,後接受的議案可以覆蓋之前的議案。這樣,Pc可以向所有決策者提起議案“孫”,形成一個多數派決議。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0f\/0fdb3c0ea2a40c28a5f8163a991fb289.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是,現在Pa Pb提案者也可以利用重新提案的能力,來重新提出自己的提案,導致形成的共識重新被推翻。會議亂成了一鍋粥。因此還需要新的約束條件來保證會議正常進行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們提煉一下問題:在不同版本的提案中,選擇一個固定的值作爲全局決議。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paxos算法就是爲解決這種不一致問題而提出的,算法目標有兩個:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"T1:一次選舉必須要選定一個議案(不能出現所有議案都被拒絕的情況);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"T2:一次選舉必須只選定一個議案(不能出現兩個議案有不同的值,卻都被選定的情況)。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先介紹一個概念 “"},{"type":"text","marks":[{"type":"strong"}],"text":"編號"},{"type":"text","text":"”:編號在算法中是唯一自增的鍵值,假設第一個提出的議案版本是n1, 接下來提出的第二個議案版本 n2 = n1+1 。有了編號之後,一條提案也變成了(Num, Value)二元組。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/16\/16fb5f9c2666c6b03eaab1266a531d34.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了達成上述目標,算法提出了一組約束條件:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P1:一個Acceptor必須接受它收到的第一個議案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P2:如果一個值爲v的議案被選定了,那麼被選定的更大編號的議案,它的值必須也是v。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"a:如果一個值爲v的議案被選定了,那麼Acceptor接受的更大編號的議案,它的值必須也是v。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"b:如果一個值爲v的議案被選定了,那麼Proposer提出的更大編號的議案,它的值必須也是v。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4a\/4a080d2d322537ae8daf6ee185e737bd.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P2 和 P2a、P2b 理解起來比較費勁。P2這個約束是爲了解決上述場景2中,形成決議後又被推翻覆蓋的問題。有了約束2之後,當形成了決議“孫”,則後續提出的議案的值必須爲“孫”。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼,如何才能滿足P2呢,P2a約束是在場景2的問題決策域提出的。爲了能夠滿足P2a,自然可以在問題的產生域也提出約束, 既P2b。如果算法能夠滿足P2b,也就是將解決“產生一致性問題”的時機提前。所以,P2b是P2a的充分條件,只要能滿足P2b,那P2a就自動滿足。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/82\/82f6a1ddfdaddd97de5129a74134005d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼,作爲提案者,如何提前知道一個目前的決議(多數派)議案呢?接下來,我們就提出了新的約束P2c:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P2c: 在所有Acceptor中,任意選取半數以上的集合,稱這個集合爲S。Proposal新的提案Pnew (Nnew , Vnew )必須符合下面兩個條件之一:  1)如果S中所有Acceptor都沒有接受過議案,那麼Pnew的編號保證唯一性和遞增,Pnew的值可以是任意值。  2)如果S中有Acceptor曾經接受過議案,找出編號最大的議案PN (N,V) 。那麼Pnew的編號必須大於N,Pnew的值必須等於V。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在P2c中,引入了多數派集合S。根據集合S中是否有被接受的議案,來推斷全局節點是否已經形成決議。根據約束P0,如果一個議案(Nchosen, Vchosen)是被選中的,那麼它必然是被多於半數節點選中的。那麼,在任意一個多數派集合S中,一定會存在至少一個節點A,它的數據中已經決議的議案是(Nchosen, Vchosen)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"反之,如果多數派集合中沒有被決議的議案,表示此時系統中還沒有一個被選定的決議。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/87\/87462d69cef2314c6439b0bcbdb9fe00.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"看這個例子,假設目前選定了(3, Va)狀態;下次發起提案時,proposer選定A3、4、5作爲集合S進行查詢;提出了編號爲4的提案,在P2c約束條件下,提案必定爲(4,Va)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,滿足P2c的算法也滿足P2b算法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是也可能存在這種情況,當proposer發查詢請求時,還沒有一個被確認的提案,所以發出了(4, Vb)提案,這樣還是會破壞P2b約束。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b3\/b3cd2330b6654dc94e2fb6961b49e4c7.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,當一個議案在提出時(即使已經在發送的半路上了),它必須能夠知道當前已經提出的議案的最大編號。這聽起來很魔幻,畢竟跑在網線上的請求很難被召回。因此,我們提出了新的約束條件;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P3:議案(n,v)在提出前,必須將自己的編號通知給半數以上的Acceptor。收到通知的Acceptor將n跟自己之前收到的通知進行比較,如果n更大,就將n確認爲最大編號。當半數以上Acceptor確認n是最大編號時,議案(n,v)才能正式提交。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P4:Acceptor收到一個新的編號n,在確認n比自己之前收到的編號大時,必須承諾不再接受比n小的議案。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了P3的約束,則能夠保證兩個不同編號的議案,不可能同時被認爲是最大編號;同時再加上P4約束,保證了一個決議者不會對自己接受的議案版本進行回退。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"至此,Paxos算法的約束條件就介紹完了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/bd\/bda24a806e32f9d6ec8d65bc6d50759d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(3)協議流程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Paxos中,一個決策過程(Round、Phase)分爲兩個階段:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"phase1(準備階段):"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Proposer向超過半數(n\/2+1)Acceptor發起prepare消息(發送編號);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"如果Acceptor 收到一個編號爲 N 的 Prepare 請求,且 N 大於該 Acceptor 已經響應過的所有 Prepare 請求的編號,那麼它就會將它已經接受過的編號最大的提案(如果有的話)作爲響應反饋給 Proposer,同時該Acceptor 承諾不再接受任何編號小於 N 的提案。否則拒絕返回。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"phase2(決議階段或投票階段):"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"如果超過半數Acceptor回覆promise,Proposer向Acceptor發送accept消息。注意此時的V:V 就是收到的響應中編號最大的提案的,如果響應中不包含任何提案,那麼V 就由 Proposer 自己決定;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Acceptor檢查accept消息是否符合規則,只要該 Acceptor 沒有對編號大於 N 的 Prepare 請求做出過響應,它就接受該提案。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0c\/0c57cc9442780d53b03a529ea5647551.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"在實際發展中,Paxos算法也演變出一系列變種"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PBFT(實用拜占庭容錯)算法:是一種共識算法,較高效地解決了拜占庭將軍問題;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multi-Paxos算法:優化了prepare階段的效率,同時允許多個Leader併發提議;以及FastPaxos、EPaxos等,這些演變是針對某些問題進行的優化,內核思想還是依託於Paxos。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Raft算法"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paxos協議從被提出,一直是分佈式一致性算法的標準協議,但是它不易理解,對於工程師來說實現起來很複雜,以至於算法提出近30年都沒有一個完全版的實現方案。業界很多實現都是Paxos-like。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Raft算法是斯坦福大學2017年提出,論文名稱《In Search of an Understandable Consensus Algorithm》,望文生義,該算法的目的是易於理解。Raft這一名字來源於\"Reliable, Replicated, Redundant, And Fault-Tolerant\"(“可靠、可複製、可冗餘、可容錯”)的首字母縮寫。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Raft使用了分治思想把算法流程分爲三個子問題:選舉(Leader election)、日誌複製(Log replication)、安全性(Safety)三個子問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Leader選舉"},{"type":"text","text":":當前leader跪了或集羣初始化的情況下,新leader被選舉出來。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"日誌複製"},{"type":"text","text":":leader必須能夠從客戶端接收請求,然後將它們複製給其他機器,強制它們與自己的數據一致。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"安全性"},{"type":"text","text":":如何保證上述選舉和日誌複製的安全,使系統滿足最終一致性。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(1)概念介紹"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Raft中,節點被分爲Leader Follower Cabdidate三種角色:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Leader"},{"type":"text","text":":處理與客戶端的交互和與follower的日誌複製等,一般只有一個Leader;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Follower"},{"type":"text","text":":被動學習Leader的日誌同步,同時也會在leader超時後轉變爲Candidate參與競選;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Candidate"},{"type":"text","text":":在競選期間參與競選;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b1\/b1a40bab271b924a959abbaac49f8182.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Term"},{"type":"text","text":":是有連續單調遞增的編號,每個term開始於選舉階段,一般由選舉階段和領導階段組成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ec\/ec2cdafb4206d759b8bafd342c107737.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"隨機超時時間"},{"type":"text","text":":Follower節點每次收到Leader的心跳請求後,會設置一個隨機的,區間位於[150ms, 300ms)的超時時間。如果超過超時時間,還沒有收到Leader的下一條請求,則認爲Leader過期\/故障了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"心跳續命"},{"type":"text","text":":Leader在當選期間,會以一定時間間隔向其他節點發送心跳請求,以維護自己的Leader地位。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(2)協議流程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a. 選舉流程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當某個follower節點在超時時間內未收到Leader的請求,它則認爲當前Leader宕機或者當前無Leader,將發起選舉, 既從一個Follower變成Candidate。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個轉變過程中會發生四件事:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"增加本地節點的Current Term值;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"將狀態切換爲Candidate;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該節點投自己一票;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"批量向其他節點發送拉票請求(RequestVote RPC)。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在這個過程中,其他Follower節點收到拉票請求後,需要判斷請求的合法性,然後爲第一個到達的合法拉票請求進行投票。投票過程對於Follower的約束有三點:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在一個任期Term中只能投一張票;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"候選人的Term值大於Current Term,且候選人已執行的Log序號不低於本地Log(Log及已執行的概念見《日誌複製》小節),則拉票請求是合法的;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"只選擇首先到達的合法拉票請求;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果一個Candidate收到了超過半數的投票,則該節點晉升爲Leader,會立刻給所有節點發消息,廣而告之,避免其餘節點觸發新的選舉;開始進行日誌同步、處理客戶端請求等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果一次選舉中,Candidate在選舉超時時間內沒有收到超過半數的投票,也沒有收到其他Leader的請求,則認爲當前Term選舉失敗,進入下一個選舉週期。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/38\/381055e20de903c3a1c1df47724a96cd.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b. 日誌複製"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在瞭解日誌同步前,需要了解“複製狀態機”這個概念。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"複製狀態機(Replicated state machines)"},{"type":"text","text":"是指:不同節點從相同的初始狀態出發,執行相同順序的輸入指令集後,會得到相同的結束狀態。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If two identical, deterministic processes begin in the same state and get the same inputs in the same order, they will produce the same output and end in the same state."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分佈式一致性算法的實現是基於複製狀態機的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Raft算法中,節點初始化後具有相同初始狀態。爲了提供相同的輸入指令集這個條件,raft將一個客戶端請求(command)封裝到一個log entry中。Leader負責將這些log entries 複製到所有的Follower節點,然後節點按照相同的順序應用commands,達到最終的一致狀態。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當Leader收到客戶端的寫請求,到將執行結果返回給客戶端的這個過程,從Leader視角來看經歷了以下步驟:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本地追加日誌信息;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"並行發出 AppendEntries RPC請求;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"等待大多數Follower的迴應。收到查過半數節點的成功提交回應,代表該日誌被複制到了大多數節點中(committed);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在狀態機上執行entry command。既將該日誌應用到狀態機,真正影響到節點狀態(applied);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"迴應Client 執行結果;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"確認Follower也執行了這條command;如果Follower崩潰、運行緩慢或者網絡丟包,Leader將無限期地重試AppendEntries RPC,直到所有Followers應用了所有日誌條目。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/29\/295990d30c020563b7af89322f1578be.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"logs由順序排列的log entry組成 ,每個log entry包含command和產生該log entry時的leader term。從上圖可以看到,五個節點的日誌並不完全一致,raft算法爲了保證高可用,並不是強一致性,而是最終一致性,leader會不斷嘗試給follower發log entries,直到所有節點的log entries都相同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在前面的流程中,leader只需要日誌被複制到大多數節點即可向客戶端返回,而一旦向客戶端返回成功消息,那麼系統就必須保證log在任何異常的情況下都不會發生回滾。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏推薦兩個Raft算法動畫演示,通過動畫能夠對Raft算法的選舉和日誌複製過程有直觀清晰的認知。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"raft step by step入門演示: "},{"type":"text","marks":[{"type":"italic"}],"text":"http:\/\/thesecretlivesofdata.com\/raft\/"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"raft 官方演示: "},{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/raft.github.io\/"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在動畫演示中,可以通過調整時間流速和時間軸來觀察節點間通信。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8e\/8e016aeb827eb44dd7fba3814a2e8d4a.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"也可以通過單擊節點或者請求包來查看節點的具體狀態和請求包的數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/05\/05ab3c372120f7904b124b06df3a8f70.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 安全性及約束"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(1)選舉安全性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即任一任期內最多一個leader被選出。在一個集羣中任何時刻只能有一個leader。系統中同時有多餘一個leader,被稱之爲腦裂(brain split),這是非常嚴重的問題,會導致數據的覆蓋丟失。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在raft中,通過兩點保證了這個屬性:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一個節點某一任期內最多隻能投一票;而節點B的term必須比A的新,A才能給B投票。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"只有獲得多數投票的節點纔會成爲leader。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"(2)日誌append only"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,leader在某一term的任一位置只會創建一個log entry,且log entry是append-only。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其次,一致性檢查。leader在AppendEntries請求中會包含最新log entry的前一個log的term和index,如果follower在對應的term index找不到日誌,那麼就會告知leader日誌不一致, 然後開始同步自己的日誌。同步時,找到日誌分叉點,然後將leader後續的日誌複製到本地。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"(3)日誌匹配特性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果兩個節點上的某個log entry的log index相同且term相同,那麼在該index之前的所有log entry應該都是相同的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Raft的日誌機制提供兩個保證,統稱爲Log Matching Property:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不同機器的日誌中如果有兩個entry有相同的偏移和term號,那麼它們存儲相同的指令。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果不同機器上的日誌中有兩個相同偏移和term號的日誌,那麼日誌中這個entry之前的所有entry保持一致。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"(4)Leader完備性 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"被選舉人必須比自己知道的更多(比較term 、log index)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果一個log entry在某個任期被提交(committed),那麼這條日誌一定會出現在所有更高term的leader的日誌裏面 。選舉人必須比自己知道的更多(比較term,log index)如果日誌中含有不同的任期,則選最新的任期的節點;如果最新任期一致,則選最長的日誌的那個節點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"選舉安全性中包含了Leader完備性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"(5)狀態機安全性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"狀態機安全性由日誌的一致來保證。在算法中,一個日誌被複制到多數節點纔算committed, 如果一個log entry在某個任期被提交(committed),那麼這條日誌一定會出現在所有更高term的leader的日誌裏面。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三、總語"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分佈式算法已經經歷了近40年的發展(1982- ),湧現出各種各樣的算法。正如Chubby的作者所說:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“這個世界上只有一種一致性算法,那就是 Paxos”"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paxos算法的貢獻和地位無可替代。理解了Paxos算法,再去理解其他算法,則會勢如破竹。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Raft算法以其易於理解和實現的特性,推動分佈式一致性算法進入了廣泛應用階段,不再是工程師心中的那顆拿得起,放不下的硃砂痣。🐶"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分佈式一致性算法的前沿理論也在飛速發展着,值得我們繼續學習和關注。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule"},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"頭圖:Unsplash"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作者:董友康"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文:https:\/\/mp.weixin.qq.com\/s\/JfJ8R8qrT-g6G7MDN9Tk6g"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文:從Paxos到Raft,分佈式一致性算法解析"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來源:雲加社區 - 微信公衆號 [ID:QcloudCommunity]"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"轉載:著作權歸作者所有。商業轉載請聯繫作者獲得授權,非商業轉載請註明出處。"}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章