分佈式鎖:效率與正確性的權衡

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提到分佈式鎖,很多人也許會脫口而出 “redis”,可見利用 redis 實現分佈式鎖已被認爲是最佳實踐。這兩天有個同事問我一個問題:“如果某個服務拿着分佈式鎖的時候,redis 實例掛了怎麼辦?重啓以後鎖丟了怎麼辦?利用主從可以嗎?加 fsync 可以嗎?”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此我決定深究這個話題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"備註:本文中,因爲信息源使用的術語不同,Correctness 與 Safety 分別翻譯成正確性和安全性,實際上二者在分佈式鎖話題的範疇中意思相同。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Efficiency & Correctness"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果想讓單機/實例上的多個線程去執行同一組任務,爲了避免任務被重複執行,使用本地環境提供的 Lock 原語即可實現;但如果想讓單機/實例上,或多機/實例上的多個進程去搶同一組任務,就需要分佈式鎖。總體來說,對分佈式鎖的要求可以從兩個角度來考慮:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"效率 (Efficiency):爲了避免一個任務被執行多次,每個執行方在任務啓動時先搶鎖,在絕大多數情況下能避免重複工作。即便在極其偶然的情況下,分佈式鎖服務故障導致同一時刻有兩個執行方搶到鎖,使得某個任務被執行兩次,總體看來依然無傷大雅。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"正確性 (Correctness):多個任務執行方僅能有一方成功獲取鎖,進而執行任務,否則系統的狀態會被破壞。比如任務執行兩次可能破壞文件結構、丟失數據、產生不一致數據或其它不可逆的問題。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以效率和正確性爲橫軸和縱軸,得到一個直角座標系,那麼任何一個 (分佈式) 
鎖解決方案就可以認爲是這個座標系中的一個點:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8e/8e45f895ecfbff9d1ebded57db6a38c6.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Solutions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在進入分佈式鎖解決方案之前,必須要明確:"},{"type":"text","marks":[{"type":"strong"}],"text":"分佈式鎖只是某個特定業務需求解決方案的一部分"},{"type":"text","text":",業務功能的真正實現是"},{"type":"text","marks":[{"type":"strong"}],"text":"業務服務"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"strong"}],"text":"分佈式鎖"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"strong"}],"text":"存儲服務"},{"type":"text","text":"以及"},{"type":"text","marks":[{"type":"strong"}],"text":"其它有關各方"},{"type":"text","text":"共同努力的結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本節分別討論"},{"type":"text","marks":[{"type":"strong"}],"text":"側重效率"},{"type":"text","text":"和"},{"type":"text","marks":[{"type":"strong"}],"text":"側重正確性"},{"type":"text","text":"的兩類解決方案,並給出相應的實現思路。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"For Efficiency"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果任務的執行具備冪等性,或即使少量任務重複執行對整體功能沒有影響,開發者就可以選擇側重效率的解決方案。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Design 
Requirements"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"側重效率解決方案的設計要求可以概括如下:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"High Efficiency"},{"type":"text","text":":搶鎖、釋放鎖操作高效 (這裏暫不設定具體的 QPS)"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Weak Safety"},{"type":"text","text":":在絕大多數時刻,只要持有時間不超過 TTL,只能有一個 client 能取到鎖"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Liveness"},{"type":"text","text":":Deadlock free:如果 client 搶鎖後崩潰或者出現網絡分區,其它 client 不能永遠等待下去;No Fault tolerance/Availability:不需要容錯機制,不需要高可用保證"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Implementation: Single Redis Instance"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你面對的業務場景側重效率,那麼基於單實例 redis 的解決方案就是你的菜。redis 是內存數據庫,請求的執行效率基本能滿足要求。下面介紹的就是 redis 官方提供的解決方案縮略版,你也可以"},{"type":"link","attrs":{"href":"https://redis.io/topics/distlock","title":null},"content":[{"type":"text","text":"閱讀原文"}]},{"type":"text","text":"。"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bc/bc50e65b61f1e02e0a82608880cabd39.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"resource_name 是分佈式鎖的 id,它的唯一性可以保證只有一個 client 可以 SET 成功;NX 表示僅在 resource_name 在 redis 中不存在時才執行,用於保證 Weak Safety;PX 30000 表示 30 
秒超時,用來保證 Deadlock free;my_random_value 是每個取鎖請求的 id,釋放鎖的邏輯如下:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7c/7c04fced43d79d9edbf7abb4e63ca3ed.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"釋放鎖時,當且僅當入參與之前加鎖時用的 my_random_value 相等時才刪除相應的鍵值對。my_random_value 的唯一性可以保證釋放鎖 (unlock) 操作的安全性,鎖不會被其它 client 誤釋放或惡意釋放。一種誤釋放的場景如下圖所示:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/34/345c91a9586d3e0750066b4247ffb190.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"client 1 獲取鎖成功。因爲某種原因,如網絡延遲、GC 導致 client 1 執行任務時間超過 TTL"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"client 2 獲取鎖成功"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"client 1 完成任務,執行釋放鎖操作,此時如果沒有 my_random_value,client 2 獲取的鎖將被誤釋放"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"client 3 獲取鎖成功"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此時,client 2 和 3 都認爲自己取鎖成功,違背了 Weak Safety。"}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Fault 
Tolerance & Availability"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果只有單實例,掛了以後鎖服務就立即不可用。如果實例故障後立即重啓,而鎖數據沒有持久化,重啓後鎖對應的鍵已經丟失,鎖可能在原持有者仍在執行任務時被其它 client 搶走,鎖的安全性無法保證。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"For Correctness"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果任務的執行不具備冪等性,且系統可能因爲任務重複執行陷入麻煩,開發者就必須考慮側重正確性的解決方案。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Design Requirements"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"側重正確性解決方案的設計要求概括如下:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Strong Safety: 在任意時刻,只要持有時間不超過 TTL,只能有一個 client 能取到鎖"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Liveness:Deadlock free:如果 client 搶鎖後崩潰或者出現網絡分區,其它 client 不能永遠等待下去;Strong Fault Tolerance & Availability:高容錯、高可用"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Implementation: Consensus Service"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在雲原生環境下,一個系統要做到高容錯、高可用,就必須能夠橫向擴容,橫向擴容的結果就是任何在單機/實例下證明有效的算法都要能夠合理地遷移到多實例中。以單實例版 redis 分佈式鎖爲例,要遷移到多實例上,拿掉 Single Point of Failure (SPOF) 就意味着失去 Single Source of Truth (SSOT),詳細討論見 Redlock & Debate 一節。在多實例環境下要實現 Strong Safety,就必須引入經過理論與實踐檢驗的共識算法,如 Raft、Viewstamped Replication、Zab 以及 Paxos,這些算法在 asynchronous model with unreliable failure detectors (翻譯成人話就是"},{"type":"text","marks":[{"type":"strong"}],"text":"算法的正確性不依賴系統中任何事件發生時間"},{"type":"text","text":") 的故障模型下,依然能在一定故障 (少於半數節點故障) 
存在的情況下保證系統對一些信息達成共識。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本節就不詳細展開對 Consensus Service 的討論,感興趣的可以閱讀 DDIA 的 Consistency & Consensus 一節,或者翻閱相關論文。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"The Whole Story"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"理想狀況下,開發者利用側重正確性解決方案的最終目的是:"},{"type":"text","marks":[{"type":"strong"}],"text":"在任何情況下每個任務僅被執行一次"},{"type":"text","text":"。僅靠側重正確性的分佈式鎖方案還不夠。考慮這樣一個場景:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e7/e72c5423b479cbf6596b6eaf01ab2c26.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"client 1 獲取鎖成功,但不巧遇到了 stop-the-world GC 暫停,導致鎖過期"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"client 2 獲取鎖成功,執行任務,並成功寫入結果到 storage 中"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"client 1 GC 結束後,執行任務,並成功寫入結果到 storage 中"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同樣的任務重複執行,client 2 寫入的數據被 client 1 寫入的數據覆蓋,出現更新丟失 (lost updates)。上述場景中,lock service 
完全按照要求運行,但結局感人肺腑。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但如果 lock service 能和 storage 配合起來,就能解決更新丟失問題:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/49/49754fcca2fb73a67315d5763e13bcaf.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"client 1 獲取鎖成功,同時獲得單調自增 token = 33,遇到 stop-the-world GC 暫停,導致鎖過期"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"client 2 獲取鎖成功,同時獲得單調自增 token = 34,執行任務,並成功寫入 storage"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"client 1 GC 結束後,執行任務,寫入 storage 時,後者發現 token 比最新寫入的 token 小,拒絕執行"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從上面這個例子可以看出,擁有一個側重正確性的分佈式鎖解決方案僅僅是一個完整的任務執行去重方案的"},{"type":"text","marks":[{"type":"strong"}],"text":"必要條件"},{"type":"text","text":"。這與消息系統中的 exactly-once delivery guarantee 十分類似,exactly-once 語義需要參與各方正確協作才能實現。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Redlock & Debate"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單實例 redis 架構上的明顯缺陷就是存在單點故障 (SPOF),即如果節點掛了鎖也就丟了,節點重啓後就會有別的 client 搶鎖成功,從而出現兩個 client 
同時擁有鎖,且鎖都未過期的情況。我們可以做什麼來改變這點?"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Master/Slave"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"既然存在單點故障,那就增加節點吧!利用 master/slave 的部署形式,master 掛了就將 slave 升級爲 master。但 master 到 slave 的數據同步是異步執行的,這意味着 race condition 可能出現:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"client A 從 master 上獲取鎖"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"master 在將 kv 同步到 slave 之前故障"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"slave 成爲 master"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"client B 從新的 master 上獲取鎖,此時 A、B 都擁有鎖"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果 master 已經將 kv 同步到 slave,則 fail-over 能夠有效避免 SPOF 遇到的問題,因此綜合考慮 master/slave 能夠在某種程度上提高鎖服務的容錯能力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"也許我們可以使用 "},{"type":"link","attrs":{"href":"https://redis.io/commands/wait","title":null},"content":[{"type":"text","text":"WAIT"}]},{"type":"text","text":" 命令,WAIT 的語義是阻塞當前 client,直到指定數量的 replicas 返回 ack,或者超時,而並非實現共識。master 與 slave 之間的數據同步可能剛好遇到長時間的網絡故障導致 WAIT 超時,因此使用 WAIT 
同樣可以提高鎖服務的容錯能力,但額外的等待時間會降低鎖服務的效率,可能得不償失。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Redlock"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redlock 是 redis 的核心工程師 Salvatore Sanfilippo (antirez) 提出的基於多個 redis master 的分佈式鎖方案。假設整個系統中存在有 N (奇數,以 5 爲例) 個相互獨立的 master 節點,它們之間沒有直接通信行爲,一個 client 想要成功獲取分佈式鎖,需要執行以下步驟:(詳情請閱讀 "},{"type":"link","attrs":{"href":"https://redis.io/topics/distlock","title":null},"content":[{"type":"text","text":"原文"}]},{"type":"text","text":")"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"計算當前時間,精確到毫秒"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"順序地向 N 個節點發送取鎖請求 (單實例解決方案中的 Lock),N 個請求使用相同的 key 和 random value。每個請求的超時時間應該遠遠小於鎖的過期時間,如 5~50 milliseconds 之於 10 seconds。如果遇到某個節點宕機或者請求超時後,client 會立即跳過該節點向下一個節點發送請求。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"當且僅當 client 獲得超過半數節點的 ack 後,同時取鎖消耗的時間(利用步驟 1 的時間與當前時間對比)小於鎖的過期時間,就認爲取鎖成功。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"如果取鎖成功,那麼鎖的"},{"type":"text","marks":[{"type":"strong"}],"text":"實際超時時間 (vt)"},{"type":"text","text":"爲原始超時時間與取鎖消耗時間的差值,即經過 vt 後,其它的 client 就有可能取鎖成功。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"如果 client 未獲得超過半數節點的 
ack,則它會嘗試向每個節點發送釋放鎖的請求。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲什麼要按順序取鎖,而不併發取鎖?"},{"type":"text","marks":[{"type":"strong"}],"text":"併發取鎖更容易觸發腦裂 (split brain) "},{"type":"text","text":"。"}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Asynchronous"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個算法是否依賴於時鐘同步?答案是"},{"type":"text","marks":[{"type":"strong"}],"text":"一定程度上依賴"},{"type":"text","text":"。首先 client 上的時鐘漂移可能導致 TTL 計算不準確,其次有一種極端場景:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"client 1 從 A、B、C 節點上分別取鎖成功,D、E 訪問超時"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"C 節點上的時鐘發生漂移 (向前跳變),導致鎖提前過期"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"client 2 從 C、D、E 節點上分別取鎖成功,A、B 訪問超時"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"client 1 和 client 2 都認爲自己取鎖成功,違背了 safety 要求"}]}]}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Retry"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"取鎖的過程可能出現 split brain,導致誰都搶不到鎖,這時候如果所有 clients 都立即重試,split brain 很可能再次出現。解決方案也比較簡單,使用隨機退避 (random backoff) 的策略即可。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Debate"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redlock 問世後,引起了 Martin Kleppmann 的 
"},{"type":"link","attrs":{"href":"http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html","title":null},"content":[{"type":"text","text":"異議"}]},{"type":"text","text":",而後 Salvatore Sanfilippo 又寫了一篇 "},{"type":"link","attrs":{"href":"http://antirez.com/news/101","title":null},"content":[{"type":"text","text":"文章"}]},{"type":"text","text":" 反駁,可謂是理論與實踐之間的一次碰撞,詳情可點擊鏈接查看原文,這裏給出我的總結:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"回到本文開頭中的座標系,Redlock 實際上是用效率換取正確性的一種分佈式鎖方案,它與其它方案之間的關係大致如下圖所示:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/21/21f4877cd651f9039d795113485d83fe.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"它不能保證分佈式鎖的 safety 性質。在實踐中,我們通常要麼就側重效率,做好冪等,放棄要求鎖服務絕對正確;要麼就要求鎖服務能保證絕對正確,犧牲效率,很難說服自己去使用一種折衷的解決方案。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Back to the Question"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"回到本文開篇的問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"問:“如果某個 client 拿着鎖執行任務時,redis 掛了怎麼辦?”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"答:“看業務要求,大部分場景下掛了就隨它去吧,通過主從、集羣部署、甚至打開 fsync 、使用 WAIT 命令、實現 
Redlock,都可以有限地增加鎖服務的容錯能力和可用性,但因爲沒有共識算法的支持,分佈式鎖的絕對正確仍然無法保證。如果不能犧牲正確性,就採用基於共識算法的分佈式鎖服務,但"},{"type":"text","marks":[{"type":"strong"}],"text":"請務必關注具體業務場景的完整解決方案"},{"type":"text","text":"。”"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"References"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html","title":null},"content":[{"type":"text","text":"Martin Kleppmann: How to do distributed locking"}]}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://redis.io/topics/distlock","title":null},"content":[{"type":"text","text":"redis: Distributed locks with Redis"}]}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http://antirez.com/news/101","title":null},"content":[{"type":"text","text":"Is Redlock safe?"}]}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http://courses.csail.mit.edu/6.852/08/papers/CT96-JACM.pdf","title":null},"content":[{"type":"text","text":"Unreliable Failure Detectors for Reliable Distributed Systems"}]}]}]}]}]}