How Schemaless Evolved into a Distributed SQL Database

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2016, we published blog posts ("},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/schemaless-part-one-mysql-datastore\/?fileGuid=0LDCToc14JAt9WIP","title":"","type":null},"content":[{"type":"text","text":"part one"}]},{"type":"text","text":", "},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/schemaless-part-two-architecture\/?fileGuid=0LDCToc14JAt9WIP","title":"","type":null},"content":[{"type":"text","text":"part two"}]},{"type":"text","text":") about Schemaless, Uber Engineering's scalable datastore. In those two posts, we described the design of Schemaless and explained why we built it. In this article, we cover how Schemaless evolved into Docstore, a general-purpose transactional database."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Docstore is a general-purpose multi-model database that provides a strict serializability consistency model at the partition level and scales horizontally to serve high-volume workloads. Features such as transactions, materialized views, associations, and change data capture, combined with modeling flexibility and rich query support, significantly improve developer productivity and shorten the time to market for new applications at Uber."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Docstore is currently in production and serves business-critical use cases."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Motivation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/schemaless-part-one-mysql-datastore\/?fileGuid=0LDCToc14JAt9WIP","title":"","type":null},"content":[{"type":"text","text":"Schemaless"}]},{"type":"text","text":" was originally designed as an append-only datastore. The smallest entity, called a cell, is immutable. Removing mutability reduced the system's complexity and made it less error-prone. Over time, however, we realized that the restrictive API and limited modeling capabilities made it hard for users to adopt it as a general-purpose database."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The shortcomings of Schemaless led to the introduction of Cassandra, which did offer a great deal of flexibility and ease of use. But Cassandra had drawbacks of its own. Uber's data footprint is large, so scalability and efficiency have to go hand in hand. At Uber's scale, we found that Cassandra was not operationally mature enough and could not deliver the efficiency we wanted. The eventual consistency offered by Cassandra also hindered developer productivity, because developers had to design around the lack of strong consistency, which made application architectures more complex."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With first-hand experience developing and operating both Schemaless and Cassandra, we concluded that evolving Schemaless into a general-purpose transactional database was the best option. Schemaless has historically been a highly reliable system, but we now needed to focus on usability while achieving similar or better reliability."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Design considerations"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We did not want to build a NoSQL system. Instead, we wanted the best of both worlds: the schema flexibility of the document model and the schema enforcement of the traditional relational model."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To enforce schema on the data, we designed tables in Docstore. Applications that use data generally assume some kind of structure. This means they rely either on schema-on-read, where the application interprets the data when reading it, or on schema-on-write, where the schema is explicit and the database ensures that data conforms to it. By default, Docstore takes the latter approach, schema-on-write."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition to this schema enforcement, Docstore offers schema flexibility, and schemas can evolve. Docstore allows records with different schemas to coexist, and schema updates do not require a full table rebuild. Sparsity and support for complex nested data types are first-class features of Docstore."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Feature set"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Docstore has the following features built in. It is integrated with the Uber software ecosystem and can be provisioned with the click of a button."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/0f\/41\/0f84yy778f2226fdb812d5e1c8e56041.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 1: Docstore features"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Docstore has a layered architecture, and a Docstore deployment is called an instance. Each instance is divided into a query engine layer, a storage engine layer, and a control plane."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/ff\/49\/ff8290e85222386e29efb3f3c2400d49.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 2: Docstore layered architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The query layer is stateless and is responsible for routing requests to the storage layer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The storage engine, which is responsible for storing the data, is organized as a set of partitions, and data is distributed across these partitions. The control plane is responsible for assigning shards to Docstore partitions and adaptively adjusting shard placement in response to failure events."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Docstore has the concept of tables. Tables look similar to relational database tables in that they are structured as rows, columns, and values. There are no restrictions on how tables are modeled in Docstore, and Docstore can store nested records as rows by using user-defined types. This is useful, for example, when the data has a document-like structure and the entire hierarchy is loaded at once. Docstore also supports associations, which allow one-to-many and many-to-many relationships to be expressed. We call this the “flexible document model” because it supports both relational and hierarchical data models. We will cover data modeling in Docstore in the second part of this blog series."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Each table can have one or more materialized views. A materialized view is a view that allows data to be partitioned differently from the main table by using different columns. Adding a materialized view partitioned by a non-primary-key column makes it efficient to query data by that column and enables different query access patterns."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Every table must have a primary key, which can consist of one or more columns. The primary key identifies a row in the table and enforces a uniqueness constraint. Internally, both the primary key and partition key columns are stored as byte arrays, with the values obtained by applying an order-preserving encoding to the key column values. Docstore stores rows in sorted order of primary key values. Combined with composite partition keys, this enables sophisticated query patterns, including fetching all rows for a given partition key or using the remainder of the primary key to narrow down the rows relevant to a particular query."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/25\/09\/25d61850715170be2cefab6dab594d09.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 3: Docstore table layout"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Naturally, the next step in our design was to implement the sharding logic. Tables are split into multiple shards, transparently to the application. Each shard represents a set of rows from the table, on the order of a few hundred gigabytes, and is assigned in its entirety to one partition. A partition can contain one or more shards."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The main design consideration is to let applications control data locality through their choice of keys. This is why we introduced a partition key in addition to the primary key. An application can define the partition key explicitly in the schema; otherwise, Docstore uses the primary key to shard the data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Typically, each Docstore instance has multiple partitions. To avoid a single point of failure, a partition is a group of 3 to 5 nodes, each of which is a unit of physical isolation deployed in an independent zone. Each partition is replicated across multiple geographic locations to provide resilience against data center failures."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/42\/42ec347cce22bf26cb5d7b97cc1976bf.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 4: Docstore data partitioning"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Replicated state machine"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To ensure consistency, each partition runs the Raft consensus protocol, with one leader and multiple followers."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/9a\/04\/9a6d5768781b662c119769b2c86c1904.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 5: Docstore replicated state machine"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All writes are initiated by the leader. The consensus protocol keeps the replicated log consistent across the nodes in a partition. This ensures that all nodes in the partition contain the same writes in the same order, which guarantees serializability. The state machine running on each node proceeds to commit a write only once consensus has been reached. This provides a very useful property: if a write to a key is committed successfully, all subsequent reads of that key return the data written by that write or by some later write."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Consistency model"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Docstore provides a strict serializability consistency model at the partition level. This gives users a clear understanding that transactions execute sequentially. Transactions are ordered such that a transaction “A” that starts and commits before a transaction “B” always happens before “B”. This ensures that a read always returns the result of the most recent write. In terms of the CAP theorem, Docstore favors consistency over availability, making it a CP system."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Transactions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Docstore uses MySQL as the underlying database engine. In the replicated state machine, the unit of replication is a MySQL transaction. All operations are executed within the context of a MySQL transaction to guarantee ACID semantics. These transactions are then replicated across nodes using the Raft consensus protocol."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/69\/79\/6938a4bb13feee583e879acc3d56e179.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 6: Sequence of operations in a transaction"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We rely on MySQL for concurrency control. It is important to note that MySQL relies on row locks for concurrency control of write operations (inserts, updates, and deletes). In this way, MySQL effectively serializes concurrent updates to the same row, and by the time the control flow reaches the point where the client issues a commit, all the locks have already been acquired."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The flowchart in Figure 7 shows transactions that are interleaved in time. The interleaving is represented by boxes at different positions along the timeline; that is, different boxes correspond to “events” at different times."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/20\/ee\/2052ff364c4cc20e2d8fe6980ecc0fee.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 7: Interleaved inserts"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the Raft-based replicated state machine, MySQL transactions can be exposed to clients in a highly available manner: all replicas coordinate to apply transactions, which allows automatic failover between replicas while the ACID properties of transactions are preserved even when a failover occurs. Note that because we rely on exposing MySQL transactions to clients, we inherit all the benefits and constraints of MySQL transactions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/44\/28\/44917331ae3fb850ebb6eebae95b3728.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 8: Docstore transaction flow"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this article, we described the origins of Docstore and the motivation behind it. We also took a deep dive into the architecture and explained how transactions are handled in Docstore. In the next part of this series, we will focus on data modeling and schema management. We will describe how Docstore supports hierarchical and relational models and which kinds of applications should choose each data model. In the third and final part of the series, we will take a deep dive into materialized views in Docstore, covering the motivation, the materialized view refresh framework, and how we plan to leverage materialized views even when they are not explicitly mentioned in a query."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the authors:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ovais Tariq is a Senior Manager on the Core Storage team at Uber. He leads the Operational Storage Platform group, which focuses on delivering a world-class platform that powers all of Uber's business-critical functions and lines of business. The platform serves tens of millions of QPS with availability above 99.99% and stores tens of petabytes of operational data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Deba Chatterjee is a Senior Product Manager on the Uber Infrastructure team. Before joining Uber, Deba held various product management roles at database startups and at Oracle. Before moving into product management, Deba was responsible for managing the performance of large data warehouses. Deba holds a master's degree in Technology Management from the University of Pennsylvania."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Himank Chaudhary is the tech lead of Docstore at Uber. His primary focus is building distributed databases that scale with Uber's hyper-growth. Before joining Uber, he built a metadata store on the mail backend team at Yahoo. Himank holds a master's degree in Computer Science from the State University of New York, with a specialization in distributed systems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"strong"}],"text":"Original link:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"https:\/\/eng.uber.com\/schemaless-sql-database\/"}]}]}
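The Architecture section says that primary key and partition key columns are stored as byte arrays produced by an order-preserving encoding, and that rows are kept in primary key order. The article does not describe Docstore's actual encoding, so the Python sketch below only illustrates the general idea. The function names (`encode_int64`, `encode_str`, `encode_key`), the byte layout, and the sample rows are all assumptions made for this example.

```python
import struct

# Assumed byte layout for this sketch only; not Docstore's real format.
STRING_TERMINATOR = b"\x00\x01"  # ends a string; sorts below any continuation
ESCAPED_NUL = b"\x00\xff"        # stands in for a literal 0x00 inside a string


def encode_int64(value: int) -> bytes:
    """Encode a signed 64-bit integer so that byte order matches numeric order."""
    # Adding 2**63 maps [-2**63, 2**63) onto [0, 2**64); big-endian packing then
    # makes unsigned byte-wise comparison agree with numeric comparison.
    return struct.pack(">Q", value + (1 << 63))


def encode_str(value: str) -> bytes:
    """Encode a string so that byte order matches code point order."""
    raw = value.encode("utf-8")
    # Escape embedded NUL bytes so the terminator stays unambiguous.
    return raw.replace(b"\x00", ESCAPED_NUL) + STRING_TERMINATOR


def encode_key(columns: tuple) -> bytes:
    """Concatenate per-column encodings; assumes a fixed type per key column."""
    parts = []
    for column in columns:
        if isinstance(column, bool):
            raise TypeError("bool columns are not handled in this sketch")
        if isinstance(column, int):
            parts.append(encode_int64(column))
        elif isinstance(column, str):
            parts.append(encode_str(column))
        else:
            raise TypeError(f"unsupported column type: {type(column).__name__}")
    return b"".join(parts)


# Hypothetical rows keyed by (partition key, remainder of the primary key).
rows = [("rider-2", 5), ("rider-1", 10), ("rider-1", -3), ("rider-10", 0)]
by_encoded_key = sorted(rows, key=encode_key)

# Rows for one partition key share an encoded prefix, so they sit next to
# each other in sorted order and can be fetched with a single range scan.
prefix = encode_str("rider-1")
rider_1_rows = [row for row in by_encoded_key if encode_key(row).startswith(prefix)]
```

Because the encoded bytes sort the same way as the original column values, a store that keeps rows ordered by raw key bytes can answer both "all rows for this partition key" and narrower range queries on the rest of the primary key without decoding the keys.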