A hands-on record of recovering a MySQL slave database

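Situation:
Today I logged in to the host of a MySQL slave node and found a large number of mysqld-relay-bin files under /var/lib/mysql, the oldest of them created as far back as 2018. I remembered that the slave deletes these files once it has applied the operations logged by the master. (That is indeed the default, relay_log_purge=ON, but a relay log is only removed after the SQL thread has finished executing it, so the files pile up when the SQL thread is stopped.)
Checking the slave status showed the following error: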

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: *.*.*.*
                  Master_User: dbsync
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000095
          Read_Master_Log_Pos: 869242147
               Relay_Log_File: mysqld-relay-bin.000146
                Relay_Log_Pos: 871280529
        Relay_Master_Log_File: mysql-bin.000075
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: cdb,cdb_admin
          Replicate_Ignore_DB: mysql
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 871280384
              Relay_Log_Space: 19994786573
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
1 row in set (0.00 sec)

ERROR: 
No query specified
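
The fields that matter in the output above are Slave_IO_Running, Slave_SQL_Running, Last_Errno and the executed position (Relay_Master_Log_File plus Exec_Master_Log_Pos). A minimal sketch of checking them from a script, assuming the `\G` output is piped in as text; `check_slave_status` is a hypothetical helper name, not a MySQL tool:

```shell
# Read `SHOW SLAVE STATUS\G` text on stdin and summarise replication health.
# check_slave_status is a hypothetical helper, not part of MySQL.
check_slave_status() {
    awk -F': ' '
        # \G output right-aligns field names, so strip the leading padding.
        { key = $1; sub(/^[ \t]+/, "", key) }
        key == "Slave_IO_Running"      { io = $2 }
        key == "Slave_SQL_Running"     { sql = $2 }
        key == "Last_Errno"            { errno = $2 }
        key == "Relay_Master_Log_File" { file = $2 }
        key == "Exec_Master_Log_Pos"   { pos = $2 }
        END {
            printf "io=%s sql=%s errno=%s executed=%s:%s\n", io, sql, errno, file, pos
            # Exit status 0 only when both replication threads are running.
            exit !(io == "Yes" && sql == "Yes")
        }
    '
}
```

Piping `mysql -u root -p -e 'SHOW SLAVE STATUS\G'` into `check_slave_status` would, for the status shown above, print `io=Yes sql=No errno=1594 executed=mysql-bin.000075:871280384` and return a non-zero exit status.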

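Cause:
On the master node I had deleted the binary log files named mysql-bin.00007*, which included mysql-bin.000075. From then on the slave could not find that file, so it could no longer replicate.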
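
Whether the master still holds the file the slave needs can be checked against the output of `SHOW BINARY LOGS`. A small sketch, assuming the list is fetched with the mysql client in batch mode (`-N -B`), which prints one tab-separated `Log_name  File_size` pair per line; `binlog_available` is a hypothetical helper name:

```shell
# binlog_available FILE
# Reads `SHOW BINARY LOGS` rows (Log_name<TAB>File_size) on stdin and succeeds
# only if FILE is still listed. Hypothetical helper, not part of MySQL.
binlog_available() {
    awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}
```

On the master, `mysql -u root -p -N -B -e 'SHOW BINARY LOGS' | binlog_available mysql-bin.000075` returns a non-zero exit status once that file has been removed.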

Solution:
1. Point the slave at a new replication position. (Did not work)

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000095', MASTER_LOG_POS=869242147; -- an existing position in mysql-bin.000095 on the master node
START SLAVE;

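Running show slave status on the slave node still reported an error. I did not copy the exact message; I only remember that the errno was 1236, that Slave_IO_Running was not running while Slave_SQL_Running was, and that the description said roughly that some table in some database had a problem.
After several attempts with different positions (the position in the error, and the position the master had just written in mysql-bin.000095), the error was still there.
In fact the table data was already inconsistent. Taking the table named in the error as an example, the slave held about 1,200 rows while the master had more than 1,900. Because the logs that recorded those operations are gone (I deleted them), there is no recent log position at which master and slave agree, unless the missing data is filled in by hand.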
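
The row-count comparison mentioned above can be done for every table at once. A rough sketch, assuming the counts have already been collected on each node into a file of tab-separated `table  row_count` lines (how they are collected is left out here); `diff_counts` and the file layout are hypothetical:

```shell
# diff_counts MASTER_FILE SLAVE_FILE
# Each file holds "table<TAB>row_count" lines. Prints one line per table whose
# count differs, as: table<TAB>master_count<TAB>slave_count.
# Assumes MASTER_FILE is not empty. Hypothetical helper.
diff_counts() {
    awk -F'\t' '
        NR == FNR { master[$1] = $2; next }   # first file: remember master counts
        {
            seen[$1] = 1
            if (!($1 in master))        print $1 "\tmissing\t" $2
            else if (master[$1] != $2)  print $1 "\t" master[$1] "\t" $2
        }
        END {
            # Tables that exist on the master but not on the slave.
            for (t in master) if (!(t in seen)) print t "\t" master[t] "\tmissing"
        }
    ' "$1" "$2"
}
```

An empty result means the counts agree, which is a necessary but not a sufficient sign that the data is identical.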

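2. Rebuild the slave database.
    The data differed too much, and I suspected that more than one table was affected, so the clean option was to rebuild the slave from scratch.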

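1) Compare the database configuration on the master and slave nodes and make sure they match. (I do not know why dual-master mode was configured; I only have one instance, and it runs on the master node.)

2) On both the master and slave nodes, check the traffic with show processlist and make sure no application traffic reaches the slave database that is about to be rebuilt.

3) Stop the slave threads on the master node. (I have not started them again since; whether that causes problems remains to be seen.)

4) Record the binary log position on the master node, then back up the databases: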

mysql> show master status;
+------------------+-----------+---------------+------------------+
| File             | Position  | Binlog_Do_DB  | Binlog_Ignore_DB |
+------------------+-----------+---------------+------------------+
| mysql-bin.000095 | 871760173 | cdb,cdb_admin | mysql            |
+------------------+-----------+---------------+------------------+
1 row in set (0.01 sec)
mysqldump -u root -p --databases cdb cdb_admin > bak.master.sql

5) To be safe, back up the databases on the slave node as well:
mysqldump -u root -p --databases cdb cdb_admin > bak.slave.sql

6) Start the rebuild: copy the master backup file to the slave node and import it:
mysql -u root -p < bak.master.sql

7) On the slave node, set the position from which to read the master's log again:

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000095', MASTER_LOG_POS=871760173; -- the master log position recorded in step 4
START SLAVE;

8) On the slave node, run show slave status. Slave_IO_Running and Slave_SQL_Running are now both running, and each time the status is refreshed the Read_Master_Log_Pos value has increased: replication has resumed.
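
One caveat about step 4: the position is read with show master status and the dump is taken by a separate command, so if the master accepts writes in between, the recorded position may not match the contents of the dump. A possible alternative, not what was done here, is to let mysqldump record the coordinates itself with `--master-data=2`, which writes a commented-out `CHANGE MASTER TO` line near the top of the dump. The sketch below pulls that line back out; `dump_coordinates` is a hypothetical helper name.

```shell
# dump_coordinates DUMP_FILE
# Prints the CHANGE MASTER TO statement that `mysqldump --master-data=2`
# stored as an SQL comment ("-- CHANGE MASTER TO ...") in DUMP_FILE.
# Hypothetical helper, not part of MySQL.
dump_coordinates() {
    # Print the first matching line without its "-- " prefix, then stop reading.
    sed -n '/^-- CHANGE MASTER TO /{s/^-- //p;q;}' "$1"
}
```

The dump itself would be taken with something like `mysqldump -u root -p --single-transaction --master-data=2 --databases cdb cdb_admin > bak.master.sql`. Note that `--single-transaction` only gives a consistent snapshot for InnoDB tables, and the storage engine is an assumption here. `dump_coordinates bak.master.sql` then prints the statement to run on the slave in step 7.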

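Summary:
When cleaning up files, pay attention to where the master and slave nodes are reading and writing in the mysql-bin files. Before deleting anything, confirm that both the master side and the slave side have already read past those logs; do not delete blindly. Otherwise the slave can no longer replicate, and even if you force a master log position on the slave or skip the errors, data loss on the slave cannot be ruled out.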
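
A safer way to clean up than deleting files by hand is `PURGE BINARY LOGS TO 'file'`, which removes only the logs older than the named file and keeps the binary log index consistent. The sketch below picks the purge target conservatively: it takes the Relay_Master_Log_File reported by each slave (the master log the SQL thread is still executing) and keeps everything from the oldest of those onward. `purge_statement` is a hypothetical helper name, and it assumes all files share one base name with zero-padded numbers.

```shell
# purge_statement
# Reads one Relay_Master_Log_File value per line on stdin (one per slave) and
# prints a PURGE statement that keeps the oldest file any slave still needs.
# Hypothetical helper; it only prints the statement and does not run it.
purge_statement() {
    # Zero-padded sequence numbers sort correctly as plain strings.
    oldest=$(LC_ALL=C sort | head -n 1)
    [ -n "$oldest" ] || return 1      # no slave information: refuse to suggest a purge
    printf "PURGE BINARY LOGS TO '%s';\n" "$oldest"
}
```

With the status from this incident, the slave reported mysql-bin.000075, so the statement would be `PURGE BINARY LOGS TO 'mysql-bin.000075';`, which leaves mysql-bin.000075 and every later file in place.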
