1. Problem description
In a MySQL master-slave replication setup, the slave reports the error "relay log read failure" and replication stops.
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.0.0.93
Master_User: slaveuser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: dd-bin.002542
Read_Master_Log_Pos: 752973519
Relay_Log_File: dd-relay.002949
Relay_Log_Pos: 950583160
Relay_Master_Log_File: dd-bin.002540
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB: mysql,test,information_schema
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1594
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 950583017
Relay_Log_Space: 2900478067
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1594
Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 93
1 row in set (0.00 sec)
=============================================
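2. Cause analysis
The error says the slave could not parse an event entry in the relay log. According to the message, the possible causes are a corrupted binary log on the master, a corrupted relay log on the slave, a network problem, or a bug in MySQL.
In practice it is usually a network failure or heavy load on the slave that leaves the relay log corrupted. Find the point up to which the slave has already applied events and reconfigure replication from there; new relay logs are created and replication returns to normal.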
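The symptom described above (I/O thread still running, SQL thread stopped, error 1594) can be recognised from the status text alone. The following is a minimal sketch, not a definitive check: `status_field` and `diagnose_slave` are hypothetical helper names, and the sample text only reproduces three fields from the output shown earlier.

```shell
#!/bin/sh
# Print the value of one field from `SHOW SLAVE STATUS\G` text read on stdin.
status_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # \G output may be indented
            prefix = key ":"
            if (!found && index(line, prefix) == 1) {
                value = substr(line, length(prefix) + 1)
                gsub(/^[ \t]+|[ \t\r]+$/, "", value)
                print value
                found = 1
            }
        }'
}

# Classify the replication state from status text on stdin (hypothetical helper).
diagnose_slave() {
    status=$(cat)
    io=$(printf '%s\n' "$status" | status_field Slave_IO_Running)
    sql=$(printf '%s\n' "$status" | status_field Slave_SQL_Running)
    errno=$(printf '%s\n' "$status" | status_field Last_SQL_Errno)
    if [ "$sql" = "No" ] && [ "$errno" = "1594" ]; then
        echo "relay-log-read-failure (IO thread running: $io)"
    elif [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "replicating"
    else
        echo "other (IO=$io SQL=$sql errno=$errno)"
    fi
}

# Three fields copied from the status output in section 1.
sample_status='Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Errno: 1594'

printf '%s\n' "$sample_status" | diagnose_slave
```

Against a live slave the input would come from something like `mysql -e 'SHOW SLAVE STATUS\G' | diagnose_slave`, assuming the client is already configured with credentials.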
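3. Fix
From the output of "show slave status\G", note the following values:
Relay_Master_Log_File: dd-bin.002540 // master binlog file containing the last event executed by the slave's SQL thread
Exec_Master_Log_Pos: 950583017 // position in that binlog up to which the slave has executed
Stop the slave, then reconfigure replication starting from that binlog file and executed position. CHANGE MASTER TO with a new log file and position discards the existing relay logs and starts new ones, which resolves the problem.
(There is no need to specify host, user, password and so on; the values already configured are kept.)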
mysql> stop slave;
mysql> change master to master_log_file='dd-bin.002540', master_log_pos=950583017;
mysql> start slave;
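Copying the file name and position by hand is error-prone, so the statements can be assembled from the status text instead. This is only a sketch under stated assumptions: `status_field` and `build_resync_sql` are hypothetical helpers, and the script prints the SQL for review rather than executing it.

```shell
#!/bin/sh
# Print the value of one field from `SHOW SLAVE STATUS\G` text read on stdin.
status_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # \G output may be indented
            prefix = key ":"
            if (!found && index(line, prefix) == 1) {
                value = substr(line, length(prefix) + 1)
                gsub(/^[ \t]+|[ \t\r]+$/, "", value)
                print value
                found = 1
            }
        }'
}

# Read status text on stdin and print the three resync statements.
build_resync_sql() {
    status=$(cat)
    file=$(printf '%s\n' "$status" | status_field Relay_Master_Log_File)
    pos=$(printf '%s\n' "$status" | status_field Exec_Master_Log_Pos)
    # Refuse to emit SQL if either value is missing or looks malformed.
    case "$file" in
        ''|*[!A-Za-z0-9._-]*) echo "bad Relay_Master_Log_File" >&2; return 1 ;;
    esac
    case "$pos" in
        ''|*[!0-9]*) echo "bad Exec_Master_Log_Pos" >&2; return 1 ;;
    esac
    printf 'STOP SLAVE;\n'
    printf "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" "$file" "$pos"
    printf 'START SLAVE;\n'
}

# The two fields from the failing slave in section 1.
sample_status='        Relay_Master_Log_File: dd-bin.002540
          Exec_Master_Log_Pos: 950583017'

printf '%s\n' "$sample_status" | build_resync_sql
```

Printing instead of executing is a deliberate choice here: the generated statements can be compared against the status output before they are pasted into the mysql client.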
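4. Verifying the result
Check the status again: the error is gone and the slave has started catching up on the master's log.
mysql> show slave status\G
Exec_Master_Log_Pos: 225927489 // the position executed on the slave has changed
Seconds_Behind_Master: 58527 // how far the slave lags behind the master, in seconds
A few seconds later, check again. The slave is closer to being in sync with the master.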
mysql>show slave status\G
Exec_Master_Log_Pos: 307469867
Seconds_Behind_Master: 29570
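The two checks above can be compared numerically. The sketch below assumes hypothetical helpers `status_field` and `slave_lag`; it treats NULL (the value shown while the SQL thread was stopped) as "not replicating" and uses the two lag values from the outputs above.

```shell
#!/bin/sh
# Print the value of one field from `SHOW SLAVE STATUS\G` text read on stdin.
status_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # \G output may be indented
            prefix = key ":"
            if (!found && index(line, prefix) == 1) {
                value = substr(line, length(prefix) + 1)
                gsub(/^[ \t]+|[ \t\r]+$/, "", value)
                print value
                found = 1
            }
        }'
}

# Print Seconds_Behind_Master, or fail if the slave is not replicating.
slave_lag() {
    lag=$(status_field Seconds_Behind_Master)
    case "$lag" in
        ''|NULL)  echo "not replicating" >&2; return 1 ;;
        *[!0-9]*) echo "unexpected value: $lag" >&2; return 2 ;;
        *)        echo "$lag" ;;
    esac
}

# Lag values from the two status checks in this section.
before=$(printf 'Seconds_Behind_Master: 58527\n' | slave_lag)
after=$(printf 'Seconds_Behind_Master: 29570\n' | slave_lag)
echo "lag dropped by $((before - after)) seconds"
```

A shrinking value between two samples indicates the slave is catching up; a NULL means the SQL thread is stopped again and the status should be inspected for a new error.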
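5. Verifying the last replicated position from the relay log (optional, for confirmation only)
The last applied position can also be found in the relay log named by Relay_Log_File: dd-relay.002949.
Use mysqlbinlog to dump relay log dd-relay.002949 and look at its last records: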
# mysqlbinlog dd-relay.002949 > /tmp/relay_log.sql
# tail /tmp/relay_log.sql
# at 950582947
#140914 3:32:30 server id 93 end_log_pos 950582990 Query thread_id=1256813 exec_time=0 error_code=0
SET TIMESTAMP=1410636750/*!*/;
insert into blog_month_post_count (id, `count`) values (34509691, 0) on duplicate key update `count`=values(`count`)
/*!*/;
# at 950583133
#140914 3:32:30 server id 93 end_log_pos 950583017 Xid = 14033635514
COMMIT/*!*/;
# at 950583160
#140914 3:32:30 server id 93 end_log_pos 950583092 Query thread_id=1256815 exec_time=0 error_code=0
SET TIMESTAMP=1410636750/*!*/;
BEGIN
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
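As the dump shows, the last transaction committed in the relay log ends at position 950583017 (the end_log_pos of the final Xid event), which matches Exec_Master_Log_Pos: 950583017.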
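The same lookup can be done without reading the dump by eye. This is a rough sketch: `last_commit_pos` is a hypothetical helper, and it assumes commits appear as Xid events (as they do in the dump above for a transactional table); other commit forms are not handled.

```shell
#!/bin/sh
# Print the end_log_pos of the last Xid (commit) event in mysqlbinlog text on stdin.
last_commit_pos() {
    awk '
        # Event headers look like:
        #   #140914 3:32:30 server id 93 end_log_pos N Xid = ...
        /^#/ && / Xid = / {
            for (i = 1; i < NF; i++)
                if ($i == "end_log_pos") pos = $(i + 1)
        }
        END { if (pos != "") print pos; else exit 1 }
    '
}

# Tail of the relay log dump shown above (shortened).
sample_log='# at 950583133
#140914 3:32:30 server id 93 end_log_pos 950583017 Xid = 14033635514
COMMIT/*!*/;
# at 950583160
#140914 3:32:30 server id 93 end_log_pos 950583092 Query thread_id=1256815 exec_time=0 error_code=0
SET TIMESTAMP=1410636750/*!*/;
BEGIN
/*!*/;'

printf '%s\n' "$sample_log" | last_commit_pos   # prints 950583017
```

On the slave this would be fed from the dump file, for example `last_commit_pos < /tmp/relay_log.sql`, and the result compared with Exec_Master_Log_Pos.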