Vertical Partitioning of GitHub's Relational Databases in Practice

More than ten years ago, like most web applications of that era, GitHub was a Ruby on Rails site, and most of its data lived in a MySQL database.

Over the years this architecture went through many iterations to keep up with GitHub's growth and its evolving resiliency requirements. For example, we moved the data of some features into their own MySQL databases. We added read replicas to spread read load across more machines. We also adopted ProxySQL to reduce the number of connections opened against the primary MySQL instances.

Even so, GitHub still had a single main database cluster, which we called mysql1. It held most of the data behind GitHub's core features, such as user profiles, repositories, issues, and pull requests.

As GitHub grew, this architecture inevitably ran into serious challenges. We struggled to keep our database systems adequately sized and repeatedly moved to newer, more powerful machines. Any incident affecting mysql1 affected every feature that stored its data in that cluster.

In 2019 we started an initiative to meet our growth and availability needs. Its goal was to improve our tooling and our ability to partition relational databases. As you might imagine, this was a complex and demanding effort that required adopting and building a wide range of tools.

As a result, in 2021 the load on our database hosts dropped by 50%. This greatly reduced database-related incidents and improved the reliability of GitHub.com.

## Virtual partitioning

The first concept we introduced was virtual partitioning of the database schema. Before any tables could be physically partitioned, we had to make sure they could be separated at the application layer. This had to happen without getting in the way of teams building new features or changing existing ones.

To do this, we grouped database tables by domain and used SQL linters to enforce the boundaries between domains. Only then could we partition the data safely, without queries or transactions that span partitions.

### Schema domains

Schema domains are the tool we use to implement virtual partitioning. A schema domain is a set of tables that are frequently used together in queries (for example, joins and subqueries) and in transactions. For example, the `gists` schema domain contains the tables behind gists, gist_comments, and starred_gists. Because these tables are related, they belong together, and together they form one schema domain.

Schema domains draw clear boundaries between features and expose dependencies between them that were previously implicit. In our Rails application, this information is stored in the `db/schema-domains.yml` configuration file:

```yaml
gists:
  - gist_comments
  - gists
  - starred_gists
repositories:
  - issues
  - pull_requests
  - repositories
users:
  - avatars
  - gpg_keys
  - public_keys
  - users
```

### SQL Linter

We built two linters on top of schema domains to keep the virtual boundaries between domains clean. They identify queries and transactions that span more than one schema domain. A violating query can be explicitly exempted by adding an annotation to it. Once a domain has no remaining violations, it is virtually partitioned, and its physical tables can be moved to another database cluster.

### Query Linter

The Query Linter verifies that a single query only references tables belonging to the same schema domain. If it detects a query that includes tables from different domains, it raises an exception. The exception message helps the developer fix the problem.

The linter is only enabled in development and test environments, so developers see violating queries while they work. In CI, it also ensures that no new violations are introduced.

The linter recognizes a special `/* cross-schema-domain-query-exempted */` comment. Annotating a SQL query with this comment marks it as an exemption and suppresses the exception.

We also added a new method to ActiveRecord to make adding this annotation easier:

```ruby
Repository.joins(:owner).annotate("cross-schema-domain-query-exempted")
# => SELECT * FROM `repositories` INNER JOIN `users` ON `users`.`id` = `repositories`.`owner_id` /* cross-schema-domain-query-exempted */
```

Annotating all violating queries gave us a list of the queries that needed to change. These are the approaches we commonly used to resolve them.

Sometimes we only needed to split a join into separate queries. For example, we used ActiveRecord's `preload` method instead of `includes`.

A more challenging case was `has_many :through` associations that required joining tables from different schema domains. For this we built a generic solution: a new `disable_joins` option for `has_many`. It tells ActiveRecord not to perform the underlying join and to run several queries instead, passing primary key values from one query to the next.

Another common solution is to join the data in the application layer rather than in the database. For example, we replaced an INNER JOIN with two separate queries and combined the results in Ruby (for example, `A.pluck(:b_id) & B.where(id: ...)`).

In some cases this even brought large performance improvements. Depending on the data structure and cardinality, MySQL's query planner sometimes produces poor execution plans, whereas joining in the application layer gives more stable performance.

We treated these changes like most of our stability and performance work and ran them as experiments with the Scientist library. Comparing the old and new implementations side by side let us objectively evaluate each change.

### Transaction Linter

Besides queries, transactions were another concern. Existing application code was written with a particular database schema in mind. A MySQL transaction guarantees consistency across tables within the same database. If some of the tables touched by a transaction's queries are moved to another database, that consistency can no longer be guaranteed.

To find the transactions that needed review, we introduced the Transaction Linter. Like the Query Linter, it checks that all tables touched by a transaction belong to the same schema domain.

This linter runs in production and is heavily sampled to keep its performance impact minimal. Its results are collected and analyzed to show where cross-domain transactions occur. We then decide whether to update the code or change our data model.

Some code paths needed strong transactional consistency. There, we extracted the data into new tables that belong to the same schema domain. This keeps those tables in the same database cluster, so they retain their transactional consistency guarantees. This mostly happened with "polymorphic" tables that hold data from several schema domains. For example, the `reactions` table stores data from several features, such as issues, pull requests, and discussions.

## Moving data without downtime

Once a schema domain has been virtually partitioned, its tables can be physically moved. We used two different approaches to move the data: Vitess and write-cutover.

### Vitess

Vitess is a scaling layer on top of MySQL that addresses sharding needs. We use its vertical sharding feature to move sets of tables together to another cluster without downtime.

We deploy Vitess's VTGate on our Kubernetes clusters. Applications connect to these VTGate endpoints instead of connecting to MySQL directly. VTGate implements the MySQL protocol, so to the application it is indistinguishable from MySQL.

VTGate processes interact with the MySQL instances through VTTablet, another Vitess component. Vitess's table-moving feature is implemented with VReplication, the component that replicates data between database clusters.

### Write-cutover

In early 2020, our adoption of Vitess was still at an early stage, so we also built a second approach for moving large tables. Having more than one option reduced the risk of depending on a single solution and helped keep GitHub.com available.

We use regular MySQL replication to move the data to another cluster. The new cluster is first added to the old cluster's replication tree. A script then quickly applies a series of changes to perform the cutover.

![MySQL clusters before the write-cutover](https://static001.infoq.cn/resource/image/d0/c2/d0ed640c3f58e670d2ecc9afd6ff27c2.jpg)

*MySQL clusters before the write-cutover*

Before running the script, we prepare the application and the replication topology so that the target cluster, cluster_b, is a child of the existing cluster, cluster_a. We use ProxySQL to multiplex client connections to the MySQL primaries. The ProxySQL on cluster_b routes traffic to cluster_a's primary. With ProxySQL in place, we can change database traffic routing quickly, with minimal impact on the clients (our Rails application).

With this setup, we can move the application's database connections over to cluster_b without disruption. All read traffic goes to hosts that replicate from cluster_a's primary, and all write traffic still goes to cluster_a's primary.

We then run the cutover script:

1. Enable read-only mode on cluster_a's primary. From this point, no writes to cluster_a or cluster_b are allowed, and any web request that tries to write to the database fails with a 500 error.
2. Read the last executed MySQL GTID from cluster_a's primary.
3. Poll cluster_b's primary until it has executed that GTID.
4. Stop replication from cluster_a to cluster_b.
5. Update cluster_b's ProxySQL configuration to route traffic to cluster_b's primary.
6. Disable read-only mode on the primaries of cluster_a and cluster_b.

That's it!

With careful preparation and tuning, these six steps took only tens of milliseconds, even for our busiest tables. We ran cutovers at the least busy time of day, so failed writes caused only a very small number of user-facing errors. This outcome exceeded our expectations.

### Findings

We used write-cutover to split up mysql1, our original main database cluster. In one go, we moved 130 of our busiest tables, which power GitHub's core features: repositories, issues, and pull requests. Write-cutover reduced migration risk by giving us more than one independent tool. We also did not use Vitess to move tables everywhere, because of our deployment topology and the need to support read-your-writes. Even so, we expect Vitess to become our primary tool for data migrations in the future.

## Results

As mentioned in the introduction, mysql1, our main database cluster, held most of the data for GitHub's core features, such as users, repositories, issues, and pull requests. Since 2019 we have gradually built up the ability to scale this relational database, with the following results:

- In 2019, mysql1 handled an average of 950,000 queries per second: 900,000 on replicas and 50,000 on the primary.
- Today, in 2021, the same tables are spread across several clusters, and their traffic has kept growing over these two years, faster each year. Together, the servers in these clusters handle an average of 1,200,000 queries per second: 1,125,000 on replicas and 75,000 on primaries. At the same time, the average load on each host has been cut in half.

This greatly reduced database-related incidents and improved the reliability of GitHub.com.

## More partitioning strategies

Besides vertical partitioning, we also use horizontal partitioning (sharding), which lets us split individual tables across multiple clusters to support sustainable growth. We will share more details about the related tooling, linters, and Rails improvements in a future post.

## Conclusion

Over more than a decade, GitHub has learned how to scale its databases to keep up with growing demand. We usually choose "boring" technologies that have proven to work at our scale, because reliability is our top priority. At the same time, we used industry-proven tools that required only simple changes to our code. These tools have paved the way for the future growth of our databases.

Original article: https://github.blog/2021-09-27-partitioning-githubs-relational-databases-scale/
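The Schema domains and Query Linter sections describe the mechanism but not its code. The following is a minimal plain-Ruby sketch, not GitHub's implementation. It assumes the linter is handed the list of tables a query references, and the helper names (`table_domains`, `lint_query!`, `CrossSchemaDomainQueryError`) are hypothetical. It loads the `db/schema-domains.yml` excerpt and inverts it into a table-to-domain map. It then rejects queries that span domains unless they carry the `cross-schema-domain-query-exempted` annotation.

```ruby
require "yaml"

# Hypothetical query linter driven by a schema-domains file. The YAML below
# copies the db/schema-domains.yml excerpt shown in the article.
SCHEMA_DOMAINS_YML = <<~CONFIG
  gists:
    - gist_comments
    - gists
    - starred_gists
  repositories:
    - issues
    - pull_requests
    - repositories
  users:
    - avatars
    - gpg_keys
    - public_keys
    - users
CONFIG

EXEMPTION_COMMENT = "/* cross-schema-domain-query-exempted */"

class CrossSchemaDomainQueryError < StandardError; end

# Invert { domain => [tables] } into { table => domain }.
def table_domains(yaml)
  YAML.safe_load(yaml).each_with_object({}) do |(domain, tables), map|
    tables.each { |table| map[table] = domain }
  end
end

TABLE_DOMAINS = table_domains(SCHEMA_DOMAINS_YML).freeze

# `tables` are the tables referenced by `sql`. Extracting them from the SQL
# text (for example with a real SQL parser) is assumed and not shown here.
# Tables missing from the config raise KeyError, so every table needs a domain.
def lint_query!(sql, tables)
  return :exempted if sql.include?(EXEMPTION_COMMENT)

  domains = tables.map { |table| TABLE_DOMAINS.fetch(table) }.uniq.sort
  if domains.size > 1
    raise CrossSchemaDomainQueryError,
          "query uses tables from schema domains #{domains.join(', ')}; " \
          "split it up, or annotate it with #{EXEMPTION_COMMENT}"
  end
  :ok
end
```

For the `Repository.joins(:owner)` example, `lint_query!(sql, %w[repositories users])` raises on the plain SQL. Once `annotate` has appended the comment, the same call returns `:exempted`. Raising rather than logging fits a linter that only runs in development, test, and CI, where a loud failure is the point.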
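The Query Linter section describes replacing an INNER JOIN with separate queries that are joined in the application. Here is a plain-Ruby sketch of that idea, using in-memory `Struct` rows in place of ActiveRecord models. `RepositoryRow`, `UserRow`, and `repositories_with_owners` are hypothetical names. It runs one lookup per schema domain and matches rows by primary key in Ruby. The article describes `disable_joins` for `has_many :through` the same way: several queries, with key values passed from one to the next.

```ruby
# Hypothetical in-memory rows standing in for two tables that live in
# different schema domains (and eventually different clusters).
RepositoryRow = Struct.new(:id, :name, :owner_id)
UserRow = Struct.new(:id, :login)

# Instead of one query such as
#   SELECT ... FROM repositories INNER JOIN users ON users.id = repositories.owner_id
# issue one lookup per schema domain and join the rows in Ruby.
def repositories_with_owners(repositories, users)
  # Lookup 1 (repositories domain), comparable to Repository.pluck(:owner_id).
  owner_ids = repositories.map(&:owner_id).uniq
  # Lookup 2 (users domain), comparable to User.where(id: owner_ids),
  # indexed by primary key so matching each repository is a hash lookup.
  users_by_id = users.select { |user| owner_ids.include?(user.id) }.to_h { |user| [user.id, user] }
  # Keep INNER JOIN semantics: repositories without a matching owner are dropped.
  repositories.filter_map do |repo|
    owner = users_by_id[repo.owner_id]
    [repo.name, owner.login] if owner
  end
end

repos = [
  RepositoryRow.new(1, "linguist", 10),
  RepositoryRow.new(2, "scientist", 20),
  RepositoryRow.new(3, "orphaned", 99)
]
users = [UserRow.new(10, "github"), UserRow.new(20, "hubot"), UserRow.new(30, "octocat")]
p repositories_with_owners(repos, users)
# => [["linguist", "github"], ["scientist", "hubot"]]
```

Indexing the second result set by id keeps the matching step linear in the number of repositories. The explicit `if owner` check preserves INNER JOIN semantics.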
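The Scientist library mentioned in the Query Linter section compares an old code path (the control) with a new one (the candidate). The sketch below reproduces only that idea with the standard library. It is not the scientist gem's API, and `experiment`, `timed`, and `Observation` are hypothetical names. The caller always gets the control's result, so a wrong or failing candidate cannot affect users.

```ruby
# Hypothetical stand-in for the experiment pattern; not the scientist gem's API.
Observation = Struct.new(:name, :matched, :control_seconds, :candidate_seconds)

# Runs the block and returns [block_value, elapsed_seconds].
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  value = yield
  [value, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Runs both code paths, records an Observation, and returns the control value.
def experiment(name, observations, control:, candidate:)
  control_value, control_seconds = timed { control.call }
  candidate_value, candidate_seconds = timed do
    candidate.call
  rescue StandardError => e
    e # a failing candidate counts as a mismatch and is never raised to the caller
  end
  observations << Observation.new(name, control_value == candidate_value,
                                  control_seconds, candidate_seconds)
  control_value
end

observations = []
experiment("repo-owner-join", observations,
           control: -> { [10, 20] },          # e.g. the old JOIN-based query
           candidate: -> { [20, 10].sort })   # e.g. the new two-query version
```

The real library does considerably more, for example publishing results to a metrics system. This sketch just appends observations to an array.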
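The Transaction Linter section says the check runs in production with heavy sampling and that its results are collected for analysis. A minimal sketch of that shape follows. `TransactionLinter`, the `touch` callback, and the `reactions` domain in the usage lines are assumptions. A real version would hook into ActiveRecord's transaction and query instrumentation instead of asking callers to report tables. It counts violations instead of raising, since failing production requests would defeat the purpose.

```ruby
require "set"

# Hypothetical sampled transaction linter. For a random fraction of
# transactions (sample_rate) it records which tables were touched and counts
# transactions whose tables span more than one schema domain. It never raises.
class TransactionLinter
  attr_reader :violations

  def initialize(table_domains, sample_rate:, rng: Random.new)
    @table_domains = table_domains
    @sample_rate = sample_rate
    @rng = rng
    @violations = Hash.new(0) # sorted [domain, ...] => number of sampled transactions
  end

  # Wraps a unit of work. The block receives a callable that must be invoked
  # with every table the transaction touches; in a real application this
  # would come from query instrumentation, which is assumed here.
  def transaction
    sampled = @rng.rand < @sample_rate
    touched = Set.new
    result = yield ->(table) { touched << table if sampled }
    record(touched) if sampled
    result
  end

  private

  def record(tables)
    domains = tables.map { |table| @table_domains.fetch(table, "unknown") }.uniq.sort
    @violations[domains] += 1 if domains.size > 1
  end
end

# Usage with a hypothetical table-to-domain map, sampling 1% of transactions.
linter = TransactionLinter.new(
  { "issues" => "repositories", "reactions" => "reactions" }, sample_rate: 0.01
)
linter.transaction { |touch| touch.("issues"); touch.("reactions") }
```

Grouping violations by the set of domains involved matches the stated goal: finding where cross-domain transactions happen, so each site can be fixed in code or in the data model.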
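The six-step cutover script is described only in prose. The following sketch maps each step onto MySQL and ProxySQL statements, assuming connection objects that expose `query(sql)` and return the first column of the first row. The method names, `WRITER_HOSTGROUP`, the host name argument, the timeout, and the abort-on-timeout behaviour are all assumptions, not GitHub's actual script. Step 3 polls with MySQL's `GTID_SUBSET` function.

```ruby
# Hypothetical sketch of the write-cutover steps. `source` is cluster_a's
# primary, `target` is cluster_b's primary, and `proxysql` is cluster_b's
# ProxySQL admin interface. WRITER_HOSTGROUP is a placeholder and assumes the
# writer hostgroup contains a single server.
WRITER_HOSTGROUP = 0

class CutoverTimeout < StandardError; end

def wait_for_gtid_set(conn, gtid_set, timeout:, poll_interval:)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  # GTID_SUBSET(a, b) returns 1 once every transaction in `a` is also in `b`.
  until conn.query("SELECT GTID_SUBSET('#{gtid_set}', @@GLOBAL.gtid_executed)").to_i == 1
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise CutoverTimeout, "target has not executed #{gtid_set}"
    end
    sleep poll_interval
  end
end

def write_cutover(source:, target:, proxysql:, target_host:, timeout: 5.0, poll_interval: 0.001)
  # 1. Make cluster_a's primary read-only; writes through either cluster now fail.
  source.query("SET GLOBAL read_only = ON")
  # 2. Read the GTID set last executed on cluster_a's primary.
  gtid_set = source.query("SELECT @@GLOBAL.gtid_executed")
  # 3. Poll cluster_b's primary until it has executed that GTID set.
  begin
    wait_for_gtid_set(target, gtid_set, timeout: timeout, poll_interval: poll_interval)
  rescue CutoverTimeout
    # Assumption (not from the article): abort and restore writes on cluster_a.
    source.query("SET GLOBAL read_only = OFF")
    raise
  end
  # 4. Detach cluster_b from cluster_a (STOP REPLICA needs MySQL 8.0.22+;
  #    older versions use STOP SLAVE).
  target.query("STOP REPLICA")
  # 5. Point cluster_b's ProxySQL writer hostgroup at cluster_b's primary.
  proxysql.query("UPDATE mysql_servers SET hostname = '#{target_host}' " \
                 "WHERE hostgroup_id = #{WRITER_HOSTGROUP}")
  proxysql.query("LOAD MYSQL SERVERS TO RUNTIME")
  # 6. Re-enable writes on both primaries.
  source.query("SET GLOBAL read_only = OFF")
  target.query("SET GLOBAL read_only = OFF")
  :done
end
```

A short polling interval fits the article's report that the whole sequence took tens of milliseconds. MySQL's `WAIT_FOR_EXECUTED_GTID_SET` function is an alternative that blocks on the server instead of polling. Note that `read_only` does not block accounts with the SUPER privilege, while `super_read_only` does.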
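A quick check of the figures in the Results section, using only the numbers the article states. Total traffic grew by about 26%, with primaries up 50% and replicas up 25%. The halved average per-host load therefore presumably comes from spreading the tables across more clusters and hosts, not from reduced traffic. The article gives no host counts.

```latex
\begin{aligned}
\text{2019 (mysql1):}\quad & 900{,}000 + 50{,}000 = 950{,}000\ \text{queries/s} \\
\text{2021 (all clusters):}\quad & 1{,}125{,}000 + 75{,}000 = 1{,}200{,}000\ \text{queries/s} \\
\text{Growth:}\quad & \tfrac{1{,}200{,}000}{950{,}000} \approx 1.26, \qquad
  \tfrac{1{,}125{,}000}{900{,}000} = 1.25, \qquad
  \tfrac{75{,}000}{50{,}000} = 1.5
\end{aligned}
```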