An Introduction to the Architecture of ZippyDB, Facebook's Strongly Consistent Key-Value Store

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Facebook's engineering team recently published a blog post "},{"type":"link","attrs":{"href":"https:\/\/engineering.fb.com\/2021\/08\/06\/core-data\/zippydb\/","title":null,"type":null},"content":[{"type":"text","text":"describing how it built its general-purpose key-value store"}]},{"type":"text","text":", ZippyDB. ZippyDB is Facebook's largest key-value store and has been in production for more than six years. It offers applications flexibility along several dimensions, including tunable durability, consistency, availability, and low-latency guarantees. Its use cases include metadata for distributed file systems, event counting for internal and external purposes, and product data used by various application features."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sarang Masti, a software engineer at Facebook, explained the motivation behind ZippyDB:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZippyDB uses "},{"type":"link","attrs":{"href":"http:\/\/rocksdb.org\/","title":null,"type":null},"content":[{"type":"text","text":"RocksDB"}]},{"type":"text","text":" as the underlying storage engine. Before ZippyDB, teams across Facebook used RocksDB directly to manage their data. As a result, each team duplicated effort solving similar challenges, such as consistency, fault tolerance, failure recovery, replication, and capacity management. To address the needs of these teams, we built ZippyDB to provide a highly durable and consistent key-value data store. Moving all of the data, and the challenges of managing it, onto ZippyDB greatly sped up product development."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A ZippyDB deployment (called a “tier”) consists of compute and storage resources spread across multiple regions worldwide. Each deployment hosts multiple use cases in a multi-tenant fashion. ZippyDB splits the data belonging to a use case into shards. Depending on the configuration, it replicates each shard across multiple regions for fault tolerance, using either "},{"type":"link","attrs":{"href":"https:\/\/en.wikipedia.org\/wiki\/Paxos_%28computer_science%29","title":null,"type":null},"content":[{"type":"text","text":"Paxos"}]},{"type":"text","text":" or asynchronous replication."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/imgopt.infoq.com\/fit-in\/1200x2400\/filters:quality(80)\/filters:no_upscale()\/news\/2021\/09\/facebook-zippydb\/en\/resources\/1ZippyDb-Architecture-1631795724578.jpg","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Image source: "},{"type":"link","attrs":{"href":"https:\/\/engineering.fb.com\/2021\/08\/06\/core-data\/zippydb\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/engineering.fb.com\/2021\/08\/06\/core-data\/zippydb\/"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A subset of the replicas of each shard is part of a "},{"type":"link","attrs":{"href":"https:\/\/en.wikipedia.org\/wiki\/Quorum_%28distributed_computing%29","title":null,"type":null},"content":[{"type":"text","text":"quorum"}]},{"type":"text","text":" group, in which data is replicated synchronously to provide high durability and availability when failures occur. Any other replicas, if configured as followers, are replicated asynchronously. Followers let applications keep replicas in multiple regions to support low-latency reads with relaxed consistency, while keeping the quorum small for lower write latency. This flexibility in configuring replica roles within a shard lets applications balance durability, write performance, and read performance according to their needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZippyDB gives applications configurable consistency and durability levels, which can be specified as options on the read and write APIs. For writes, ZippyDB by default persists the data to the Paxos log on a majority of replicas and writes the data to RocksDB on the primary. As a result, reads on the primary always see the latest write. It also supports a lower-latency fast-acknowledge mode, in which a write is acknowledged as soon as it is queued on the primary for replication."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For reads, ZippyDB supports eventually consistent, read-your-writes (in this mode the system guarantees that once an item has been updated, any read issued by the same client returns the updated data; see this "},{"type":"link","attrs":{"href":"https:\/\/arpitbhayani.me\/blogs\/read-your-write-consistency","title":null,"type":null},"content":[{"type":"text","text":"article"}]},{"type":"text","text":" for details. Translator's note), and strong read modes. “For read-your-writes, the client caches the latest sequence number returned by the server for its writes and uses that version in subsequent read queries.” ZippyDB implements strong reads by routing them to the primary, which avoids a round trip to the quorum. “In some outlier cases, where the primary has not yet heard about the lease renewal, a strong read on the primary turns into a quorum check and read.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/imgopt.infoq.com\/fit-in\/1200x2400\/filters:quality(80)\/filters:no_upscale()\/news\/2021\/09\/facebook-zippydb\/en\/resources\/1ZippyDB-Transactions-1631795724578.jpg","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Image source: "},{"type":"link","attrs":{"href":"https:\/\/engineering.fb.com\/2021\/08\/06\/core-data\/zippydb\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/engineering.fb.com\/2021\/08\/06\/core-data\/zippydb\/"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZippyDB supports transactions and conditional writes, which makes it suitable for use cases that need atomic read-modify-write operations on a set of keys. Masti described how ZippyDB implements them:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All transactions are serializable by default on a shard, and we do not support lower isolation levels. This simplifies the server-side implementation and makes it easier to reason, on the client side, about the correctness of concurrently executing transactions. Transactions use "},{"type":"link","attrs":{"href":"https:\/\/dl.acm.org\/doi\/10.1145\/568271.223787","title":null,"type":null},"content":[{"type":"text","text":"optimistic concurrency control"}]},{"type":"text","text":" to detect and resolve conflicts, which works as shown in the figure above."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A shard in ZippyDB, usually called a physical shard or p-shard, is the unit of data management on the server side. Applications partition their key space into μshards (micro-shards). Each p-shard typically hosts tens of thousands of μshards. According to Masti, “this additional layer of abstraction allows ZippyDB to reshard the data transparently without any changes on the client.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZippyDB further optimizes the mapping between p-shards and μshards by using "},{"type":"link","attrs":{"href":"https:\/\/engineering.fb.com\/2018\/10\/08\/core-data\/akkio\/","title":null,"type":null},"content":[{"type":"text","text":"Akkio"}]},{"type":"text","text":". Akkio places μshards in the geographic regions where the information is typically accessed. In this way, Akkio helps reduce data set duplication and offers a more efficient solution for low-latency access than placing the data in every region."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.infoq.com\/news\/2021\/09\/facebook-zippydb\/","title":null,"type":null},"content":[{"type":"text","text":"ZippyDB: The Architecture of Facebook’s Strongly Consistent Key-Value Store"}]}]}]}
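The read-your-writes mode described in the article (the client caches the sequence number returned for its latest write and uses it on later reads) can be illustrated with a minimal sketch. This is not ZippyDB's API: `Replica`, `Client`, `applied_seq`, and `last_write_seq` are hypothetical names, and the fallback to the primary when a follower lags is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Replica:
    """Hypothetical replica: a key-value map plus the last applied sequence number."""
    data: dict = field(default_factory=dict)
    applied_seq: int = 0

    def apply(self, seq: int, key: str, value: str) -> None:
        # Apply one replicated write and remember how far this replica has caught up.
        self.data[key] = value
        self.applied_seq = seq


class Client:
    """Hypothetical client that remembers the sequence number of its latest write."""

    def __init__(self, primary: Replica, follower: Replica) -> None:
        self.primary = primary
        self.follower = follower
        self.last_write_seq = 0

    def write(self, key: str, value: str) -> int:
        # The primary assigns the next sequence number; the client caches it.
        seq = self.primary.applied_seq + 1
        self.primary.apply(seq, key, value)
        self.last_write_seq = seq
        return seq

    def read(self, key: str) -> Tuple[Optional[str], str]:
        # Serve from the follower only if it is at or past our last write.
        # Falling back to the primary is an assumption of this sketch.
        if self.follower.applied_seq >= self.last_write_seq:
            return self.follower.data.get(key), "follower"
        return self.primary.data.get(key), "primary"


if __name__ == "__main__":
    primary, follower = Replica(), Replica()
    client = Client(primary, follower)
    client.write("k", "v1")
    print(client.read("k"))      # follower has not caught up yet
    follower.apply(1, "k", "v1")  # asynchronous replication arrives
    print(client.read("k"))      # follower can now serve the read
```

The point of the cached sequence number is that a lagging follower can never return data older than the client's own write, while other clients may still read from it under relaxed consistency.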