A 30,000x Improvement: PostgreSQL Primary-Standby Optimization in Practice

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、背景介紹"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"騰訊雲數據庫PostgreSQL作爲支撐着騰訊內部大量的業務,這些業務不僅僅包含有正式線上運行的,也包括內部測試開發所使用的數據庫。不同業務有着不同的述求,不同的使用方法會帶來不同的數據庫問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作爲一個數據庫平臺,需要支持各種不同的業務場景,本文重點講述在大量drop的業務場景下所遇到的問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當前業務場景因爲其安全要求特別高,對數據的更新特別慎重,不能隨意更新。所以業務架構設計將需要修改的主庫數據通過數據轉換拉取到可編輯的分支庫中。只有在審覈後才合入到主庫當中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3f\/3f8affa0df5d29be36c3e4ee2bf678a4.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"剛剛我們講到了爲了保證核心空間數據安全性,不能被任意修改,在業務系統中設計了可編輯分支庫和主庫的一套邏輯。具體實現是,不同類型的數據分散存放於不同數據庫實例當中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"當終端採集到的數據需要對主實例數據修改時,不會直接修改主庫數據,會從指定的分支庫中進行變更。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"變更完成後,通過校驗和審覈後,將變更數據同步至主庫實例當中。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"完成數據的merge之後,當前分支庫就有可能不需要了,需要刪除。但是分支實例是可以複用的,所以分支實例保留。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過上述3個步驟最大程度的保證了數據的安全,然後落實到PostgreSQL數據層,意味着就需要分支庫就會不斷的新增表,並且完成更新會丟棄掉這些表。所以數據庫中有着大量的create\/drop表,這就引入了今天要講到的重點—PG內核關於主從同步的痛點。"},{"type":"text","marks":[{"type":"strong"}],"text":"PostgreSQL主從複製在大量處理此類的drop操作的時候會導致日誌堆積,應用變慢"},{"type":"text","text":"的問題。不僅僅是在高可用場景下,拉一個從庫作爲只讀實例也同樣會出現此類情況,一旦遇到此類場景就會出現以下幾種嚴重的後果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據應用慢,主從切換RTO受到嚴重影響,一旦處於業務高峯期,每一秒受到的損失都難以承受。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"只讀實例數據更新緩慢,導致主實例與只讀實例數據不一致,嚴重的還會導致業務出現BUG,導致數據錯亂等問題。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"inden
t":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"若主從同步級別爲remote_apply,還會導致主庫hang住,導致主庫的drop同時也變慢,且DDL會持有排他鎖,會導致實例的一系列故障等。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、原理分析"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"關於PostgreSQL的主從複製處理邏輯,大家知道PG備機通過物理複製實現主從同步功能。日誌同步到備機之後,備機會解析wal日誌,來與主庫保持數據一致,而PG備機在恢復一條drop table語句時要做的操作有哪些呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"恢復系統表,例如pg_class,pg_attrbute,pg_type等,相當於移除表的元信息;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"close表對應的文件;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"遍歷buffer中的頁面,如果緩存的是該表的頁面,則標記爲invalid,後面其他進程可以使用該頁面,這裏就調用的前文提到的 DropRelFileNodesAllBuffers;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"發異步失效消息給其他backend,通知該表已刪除;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"刪除表對應的外存文件。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":6,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/71\/712e90fa13ddf3eb02c0b6753cb74478.webp","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單看上面的流程圖中感覺挺簡單,但是PG內核在第三步invalid buffer的時候,有一個罪魁禍首就是DropRelFileNodesAllBuffers這個函數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲這裏PG的實現是需要從頭到尾遍歷整個shard_buffer,查看buffer是否緩存有將要刪除的表的數據,將其標記爲失效。而PG中頁面大小默認爲8K,以shard_buffer大小 16GB 爲例,則一共有 16GB\/8K = 200W個page,每刪除一個表這裏需要循環200萬+次,如果表上面有索引,每個索引也要循環 
From the workload's perspective, the cycle of bulk data imports followed by rapid table drops runs concurrently on the primary, so its cost is barely felt there. But because PG recovery on the standby is a single process, WAL accumulates and the standby's data falls behind, as shown below (in the chart, 吉 stands for G):

[Image: https://static001.geekbang.org/infoq/95/95ce7a97323dd4f52f328a872791cbab.png]

III. The Fix

The patches and plans published upstream showed no intention of touching this behavior, so we had to take up the knife ourselves. How to solve it, then?

Going back to the flow above, step 3, invalidating the buffers, is not inherently a serial step; it has no real dependency on the others. So our optimization was to pull the invalidate-buffers step out of the overall sequence and run it in a separate child process. The recovery process then consumes WAL much faster, which resolves the log-accumulation problem.

Solving the accumulation this way, however, brings a few new problems:

- If the buffer cleanup has not finished while the final unlink of the file has already completed, a checkpoint running at that moment will try to flush pages of the table that have not yet been marked invalid, and will fail with an open-file error.
- If the buffer cleanup has not finished after the file deletion has completed, and a file is then created with the same name as the one just deleted, the new file's in-memory pages can be asynchronously marked invalid by the still-running cleanup.

How did we solve these? When recovery replays a drop table, it writes the table's identity into a shared hash table, and removes the table from the hash table once the buffer invalidation finishes. If an open-file failure occurs during this window, we simply check whether the relation is present in the hash table. Likewise, when creating a new file we also look it up in this structure: if a file with the same name is still having its buffers invalidated, we just wait for it to finish.
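A minimal sketch of that coordination, assuming the design described above, follows; it is our reconstruction for illustration, not the actual patch. All names (pending_drops, register_pending_drop, and so on) are invented, and a real in-kernel implementation would use PostgreSQL's shared-memory hash (an HTAB) guarded by a lock rather than a pthread mutex.

```c
/* Sketch of the in-flight-invalidation registry:
 *  - recovery registers a relation before handing its buffer scan to
 *    the child process and unlinking its file;
 *  - the child process unregisters it when the scan completes;
 *  - an open-file error is ignorable iff the relation is registered;
 *  - creating a same-named file waits until it is unregistered.
 */
#include <pthread.h>
#include <stdbool.h>

#define MAX_PENDING 1024
typedef struct { unsigned spc, db, rel; bool used; } PendingDrop;

static PendingDrop pending_drops[MAX_PENDING];  /* stand-in for a shared HTAB */
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

static bool contains(unsigned spc, unsigned db, unsigned rel)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (pending_drops[i].used && pending_drops[i].spc == spc &&
            pending_drops[i].db == db && pending_drops[i].rel == rel)
            return true;
    return false;
}

void register_pending_drop(unsigned spc, unsigned db, unsigned rel)
{   /* recovery: before async invalidation + unlink */
    pthread_mutex_lock(&mu);
    for (int i = 0; i < MAX_PENDING; i++)
        if (!pending_drops[i].used) {
            pending_drops[i] = (PendingDrop){ spc, db, rel, true };
            break;
        }
    pthread_mutex_unlock(&mu);
}

void unregister_pending_drop(unsigned spc, unsigned db, unsigned rel)
{   /* child process: after the buffer scan for this relation */
    pthread_mutex_lock(&mu);
    for (int i = 0; i < MAX_PENDING; i++)
        if (pending_drops[i].used && pending_drops[i].spc == spc &&
            pending_drops[i].db == db && pending_drops[i].rel == rel)
            pending_drops[i].used = false;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mu);
}

bool open_failure_is_expected(unsigned spc, unsigned db, unsigned rel)
{   /* checkpointer: an open failure for a registered relation is not fatal */
    pthread_mutex_lock(&mu);
    bool found = contains(spc, db, rel);
    pthread_mutex_unlock(&mu);
    return found;
}

void wait_until_not_pending(unsigned spc, unsigned db, unsigned rel)
{   /* file creation: block while a same-named drop is in flight */
    pthread_mutex_lock(&mu);
    while (contains(spc, db, rel))
        pthread_cond_wait(&cv, &mu);
    pthread_mutex_unlock(&mu);
}
```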
How likely is a same-name collision? PG stores a table's file name as a uint32 integer, allocated "globally, stored locally": all databases in an instance share one counter for generating file numbers, while the generated files live in each database's own directory. At allocation time, if the current database already has a file with that name, the next number is tried until there is no conflict, and the counter wraps around and starts over when exhausted. So a single database can in theory hold up to the uint32 limit, roughly 4 billion files, and tables, indexes, materialized views, TOAST tables and so on are all numbered from this one counter. File-name reuse is therefore possible, but not very likely.

After the optimization the effect is obvious: in the same scenario, the primary-standby replication gap dropped from a previous peak of 400+ GB to a dozen or so MB, a primary-standby synchronization improvement of more than 30,000x.

[Image: https://static001.geekbang.org/infoq/17/17b973933913613b4c0f5dc0a79b0942.png]

IV. Conclusion

The database is the foundation of every business, and even the smallest change to it can have an enormous impact, so every step of this backend optimization was taken with great care. The work was also a serious test of PostgreSQL's own performance and capabilities; we worked through the difficulties one by one and arrived at a solution that fits the business scenario well. The open-source version still behaves the old way; we will continue to refine this feature and plan to contribute it to the community.

---

Header image: Unsplash
Author: 唐陽
Original: https://mp.weixin.qq.com/s/Us0HE0KmO5rxhj8Le70DJA
Original title: 三萬倍提升,起飛的PostgreSQL主從優化實踐
Source: 雲加社區 WeChat public account [ID: QcloudCommunity]
Reprint: copyright belongs to the author. For commercial reprints please contact the author for authorization; for non-commercial reprints please credit the source.