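A production PostgreSQL instance took a very long time to start after it had been stopped. These notes record the mechanism behind that.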
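- If no backup_label file exists (no backup was in progress), then in the normal case the server first reads pg_control at the start of recovery and then reads the checkpoint record. It then performs the REDO operation by scanning forward from the log location indicated in the checkpoint record. Because the entire content of a data page is saved in the log on the first modification of that page after a checkpoint (assuming full_page_writes is not disabled), all pages changed since the checkpoint are restored to a consistent state.
- If a backup was in progress when PostgreSQL went down, a backup_label file is present in the data directory. In that case the server reads the checkpoint location from backup_label, not from pg_control, and replays the WAL recorded during the backup. The file is described as follows (see src/backend/access/transam/xlog.c):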
/*
* read_backup_label: check to see if a backup_label file is present
*
* If we see a backup_label during recovery, we assume that we are recovering
* from a backup dump file, and we therefore roll forward from the checkpoint
* identified by the label file, NOT what pg_control says. This avoids the
* problem that pg_control might have been archived one or more checkpoints
* later than the start of the dump, and so if we rely on it as the start
* point, we will fail to restore a consistent database state.
The contents of the backup_label file look like this:
START WAL LOCATION: 472D/82000028 (file 000000060000472D00000082)
CHECKPOINT LOCATION: 472D/82150EB8
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2020-05-23 07:23:18 HKT
LABEL: 2020-05-23 07:23:17 with pg_rman
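The relationship between the fields above can be checked with a short script. This is a minimal sketch, not PostgreSQL's own parser: the helper names (parse_backup_label, parse_lsn, wal_file_name) are hypothetical, and the 16 MB segment size is an assumption that matches the default build. It splits the label into fields and shows that the checkpoint location falls in the same WAL segment file that the START WAL LOCATION line names.

```python
import re

# Assumption: default 16 MB WAL segments; clusters built with another size differ.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_backup_label(text):
    """Split 'KEY: value' lines of a backup_label file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_lsn(lsn):
    """Convert an LSN such as '472D/82150EB8' into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline, lsn, seg_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment file that contains the given LSN."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


SAMPLE = """\
START WAL LOCATION: 472D/82000028 (file 000000060000472D00000082)
CHECKPOINT LOCATION: 472D/82150EB8
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2020-05-23 07:23:18 HKT
LABEL: 2020-05-23 07:23:17 with pg_rman
"""

fields = parse_backup_label(SAMPLE)
start_lsn, start_file = re.match(
    r"(\S+) \(file (\w+)\)", fields["START WAL LOCATION"]
).groups()
timeline = int(start_file[:8], 16)  # first 8 hex digits of the file name

# The checkpoint record lives in the segment where the backup started.
print(wal_file_name(timeline, fields["CHECKPOINT LOCATION"]))
# -> 000000060000472D00000082
```

Recovery needs that segment and every later one up to the end of the backup, which is why a missing file stops startup.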
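In this case, if pg_xlog (or pg_wal) does not contain the WAL written since pg_start_backup(), startup fails with the error below. Confirm whether you are actually restoring from a backup; if not, remove the backup_label file.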
LOG: could not open file "pg_xlog/000000020000000000000084" (log file 0, segment 132): No such file or directory
LOG: invalid checkpoint record
PANIC: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file "/POSTGRES/data/PG820/backup_label".
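Before deciding whether to remove backup_label, it can help to confirm that the segment it names is really missing. The following is a rough sketch of such a pre-start check. The function name check_backup_label and its status strings are hypothetical, and it only inspects the first segment named in the label, not the later ones recovery also needs. It deliberately deletes nothing.

```python
import re
from pathlib import Path

# Matches e.g. "START WAL LOCATION: 472D/82000028 (file 000000060000472D00000082)"
START_RE = re.compile(
    r"^START WAL LOCATION: \S+ \(file ([0-9A-F]{24})\)", re.MULTILINE
)


def check_backup_label(data_dir):
    """Report whether the WAL segment named in backup_label is on disk.

    Returns a (status, segment) tuple; status is one of
    'no_backup_label', 'unparsable', 'segment_present', 'segment_missing'.
    """
    data_dir = Path(data_dir)
    label = data_dir / "backup_label"
    if not label.is_file():
        return ("no_backup_label", None)

    match = START_RE.search(label.read_text())
    if match is None:
        return ("unparsable", None)

    segment = match.group(1)
    # The WAL directory is pg_xlog on older releases and pg_wal on newer ones.
    for wal_dir in ("pg_wal", "pg_xlog"):
        if (data_dir / wal_dir / segment).is_file():
            return ("segment_present", segment)
    return ("segment_missing", segment)
```

A 'segment_missing' result corresponds to the PANIC above: either the archived WAL has to be made available, or, if this is not a restore, backup_label has to go.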
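In production, if a backup is in progress and a large amount of archived WAL has accumulated since it started, startup takes a long time, because recovery replays everything from the checkpoint recorded in backup_label.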
Reference: https://www.postgresql.org/message-id/D960CB61B694CF459DCFB4B0128514C293CEB7@exadv11.host.magwien.gv.at