mysql高可用集羣——MHA架構

原文鏈接:https://www.cnblogs.com/duanxz/p/5063269.html

目錄
1.下載
2.搭建mha
 2.1 系統配置
 2.2 架構
 2.3 添加ssh公鑰信任
 2.4 安裝mha節點
 2.5 manager配置文件
 2.6 檢查
 2.7 啓動manager進程
 2.8 碰到的問題

3.測試切換
 3.1 正常切換測試
 3.2 回切測試
 3.3 雪崩測試
 3.4 主從不一致切換測試


下載
  mha鏈接地址:http://pan.baidu.com/s/1pJkDGX9#dir/path=%2Fmysql%2FHA%2Fmha
      或者:https://code.google.com/p/mysql-master-ha/
     添加一個yum源:wget http://dl.fedoraproject.org/pub/ ... ease-5-4.noarch.rpm
          rpm -ivh epel-release-5-4.noarch.rpm
     Mha分manager節點和node節點。可以下載源碼包,或者rpm包,隨個人喜好,注意,版本與系統內核要匹配。

搭建mha
1.系統配置
  配置如下:

系統 內核 Mysql版本 內存
CentOS release 5.8 Linux 2.6.18-308.el5xen Mysql 5.5.35 2G





2.架構
  服務器列表:

IP 機器名 角色
192.168.2.7 haproxy001 manager
192.168.1.241 Hd-dm-test01 Node
192.168.1.242 Hd-dm-test02 Node
192.168.1.243 Hd-dm-test03 node


  架構圖:
   <ignore_js_op> 

3.添加ssh公鑰信任

 (1)node節點之間配置公鑰信任,Manager節點向node節點添加公鑰信任

 (2)創建公鑰密鑰,並互傳公鑰

[iyunv@haproxy001 ~]$ ssh-keygen -t rsa

  結果如下:

[iyunv@haproxy001 ~]$ ls -l ~/.ssh/
-rw------- 1 leiche leiche 1675 07-04 10:18 id_rsa
-rw-r--r-- 1 leiche leiche  399 07-04 10:18 id_rsa.pub
-rw-r--r-- 1 leiche leiche 1182 07-04 10:27 known_hosts

  配置manager公鑰信任:

[iyunv@haproxy001 ~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[iyunv@haproxy001 ~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[iyunv@haproxy001 ~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]

  配置node節點間的公鑰信任:

[iyunv@hd-dm-test01~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[iyunv@hd-dm-test01~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]

  Ssh-copy-id加節點信任是追加的方式,在~/.ssh/目錄下生成authorized_keys文件,可以打開查看:

[iyunv@hd-dm-test03 .ssh]# sudo cat authorized_keys 
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA35i+lWMDeWJYvamobI3OdmSFXLMt4ngsI6aiJCkv2ZMaeoKEzdeK2GWP9P/F7dU2j1DCgKh8yzF9o4E713k5KW92RmeyglDt590R0PPNvGDFAauQ5wEkufylYLbaJ0fwsJPjeM4ZBrT7RqdYGn3lGQFW32Cih9LcV2A+8NkQbrh/kG2wcTiaejFQFgSKn87nsokhg5L/zY6qkeuV0dPfqfpeSI8uhTI+VUC83/5odeunXrPE30o6fZZDpp2oszt2TEElldZ6pS9mL3ZhGuP/o/IPe+w5/cK9J4C815y2mr9Agr/UUQoj1K4WWRn5uZ5y/pyvAAlbPteNVVlx4djoIw== root@haproxy001
...

 (3)檢查

[iyunv@haproxy001 ~]# ssh 172.16.1.243 "/sbin/ifconfig |grep 'inet addr' |head -1"
          inet addr:172.16.1.243  Bcast:172.16.1.255  Mask:255.255.255.0

4.安裝mha節點

 在manager服務器2.7上安裝manager節點:

rpm -ivh mha4mysql-manager-0.53-0.noarch.rpm

 在manager和node服務器安裝node節點:

sudo rpm -ivh mha4mysql-node-0.53-0.noarch.rpm

 依賴包:

sudo yum -y install perl-DBD-MySQL.x86_64
sudo yum -y install perl-Log-Dispatch
sudo yum -y install perl-Config-Tiny
sudo yum -y install perl-Parallel-ForkManager

5.配置文件

 在manager服務器上創建目錄

mkdir /etc/masterha
mkdir -p /masterha/app1

 生成配置文件/etc/masterha/app1.cnf


[server default]
#manager dir
manager_workdir=/masterha/app1  
manager_log=/masterha/app1/manager.log  
remote_workdir=/masterha/app1

#mysql manager user
user=root
password=123456 

#node server user
ssh_user=root

#replication_user
repl_user=repluser
repl_password=hoodong

#checking master every second 
ping_interval=1

#promote script
#shutdown_script="" 
#master_ip_failover_script="/usr/local/bin/master_ip_failover" 
#master_ip_online_change_script="" 
#report_script=""

[server1]  
hostname=172.16.1.241  
master_binlog_dir="/data/mysqllog/3306" 
ssh_port=22
candidate_master=1 

[server2]  
hostname=172.16.1.242
master_binlog_dir="/data/mysqllog/3306" 
ssh_port=22
candidate_master=1

[server3]
hostname=172.16.1.243
master_binlog_dir="/data/mysqllog/3306"
ssh_port=22
candidate_master=1



6.檢查:

 檢查ssh


[iyunv@haproxy001 app1]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sat Jul  5 12:57:24 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul  5 12:57:24 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul  5 12:57:24 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sat Jul  5 12:57:24 2014 - [info] Starting SSH connection tests..
Sat Jul  5 12:57:26 2014 - [debug] 
Sat Jul  5 12:57:24 2014 - [debug]  Connecting via SSH from [email protected](172.16.1.241:22) to [email protected](172.16.1.242:22)..
Sat Jul  5 12:57:24 2014 - [debug]   ok.
Sat Jul  5 12:57:24 2014 - [debug]  Connecting via SSH from [email protected](172.16.1.241:22) to [email protected](172.16.1.243:22)..
Sat Jul  5 12:57:26 2014 - [debug]   ok.
Sat Jul  5 12:57:26 2014 - [debug] 
Sat Jul  5 12:57:24 2014 - [debug]  Connecting via SSH from [email protected](172.16.1.242:22) to [email protected](172.16.1.241:22)..
Sat Jul  5 12:57:25 2014 - [debug]   ok.
Sat Jul  5 12:57:25 2014 - [debug]  Connecting via SSH from [email protected](172.16.1.242:22) to [email protected](172.16.1.243:22)..
Sat Jul  5 12:57:26 2014 - [debug]   ok.
Sat Jul  5 12:57:27 2014 - [debug] 
Sat Jul  5 12:57:25 2014 - [debug]  Connecting via SSH from [email protected](172.16.1.243:22) to [email protected](172.16.1.241:22)..
Sat Jul  5 12:57:26 2014 - [debug]   ok.
Sat Jul  5 12:57:26 2014 - [debug]  Connecting via SSH from [email protected](172.16.1.243:22) to [email protected](172.16.1.242:22)..
Sat Jul  5 12:57:27 2014 - [debug]   ok.
Sat Jul  5 12:57:27 2014 - [info] All SSH connection tests passed successfully.



 檢查複製


[iyunv@haproxy001 app1]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sat Jul  5 12:57:05 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul  5 12:57:05 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul  5 12:57:05 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sat Jul  5 12:57:05 2014 - [info] MHA::MasterMonitor version 0.53.
Sat Jul  5 12:57:06 2014 - [info] Dead Servers:
Sat Jul  5 12:57:06 2014 - [info] Alive Servers:
Sat Jul  5 12:57:06 2014 - [info]   172.16.1.241(172.16.1.241:3306)
Sat Jul  5 12:57:06 2014 - [info]   172.16.1.242(172.16.1.242:3306)
Sat Jul  5 12:57:06 2014 - [info]   172.16.1.243(172.16.1.243:3306)
Sat Jul  5 12:57:06 2014 - [info] Alive Slaves:
Sat Jul  5 12:57:06 2014 - [info]   172.16.1.242(172.16.1.242:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 12:57:06 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 12:57:06 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 12:57:06 2014 - [info]   172.16.1.243(172.16.1.243:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 12:57:06 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 12:57:06 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 12:57:06 2014 - [info] Current Alive Master: 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 12:57:06 2014 - [info] Checking slave configurations..
Sat Jul  5 12:57:06 2014 - [info] Checking replication filtering settings..
Sat Jul  5 12:57:06 2014 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Jul  5 12:57:06 2014 - [info]  Replication filtering check ok.
Sat Jul  5 12:57:06 2014 - [info] Starting SSH connection tests..
Sat Jul  5 12:57:09 2014 - [info] All SSH connection tests passed successfully.
Sat Jul  5 12:57:09 2014 - [info] Checking MHA Node version..
Sat Jul  5 12:57:09 2014 - [info]  Version check ok.
Sat Jul  5 12:57:09 2014 - [info] Checking SSH publickey authentication settings on the current master..
Sat Jul  5 12:57:10 2014 - [info] HealthCheck: SSH to 172.16.1.241 is reachable.
Sat Jul  5 12:57:10 2014 - [info] Master MHA Node version is 0.53.
Sat Jul  5 12:57:10 2014 - [info] Checking recovery script configurations on the current master..
Sat Jul  5 12:57:10 2014 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql3306.000008 
Sat Jul  5 12:57:10 2014 - [info]   Connecting to [email protected](172.16.1.241).. 
  Creating /masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysqllog/3306, up to mysql3306.000008
Sat Jul  5 12:57:11 2014 - [info] Master setting check done.
Sat Jul  5 12:57:11 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Jul  5 12:57:11 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info  --relay_dir=/data/mysqldata/3306/  --slave_pass=xxx
Sat Jul  5 12:57:11 2014 - [info]   Connecting to [email protected](172.16.1.242:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysqldata/3306/relay-log.info ... ok.
    Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000005
    Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000005
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Jul  5 12:57:11 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.243 --slave_ip=172.16.1.243 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info  --relay_dir=/data/mysqldata/3306/  --slave_pass=xxx
Sat Jul  5 12:57:11 2014 - [info]   Connecting to [email protected](172.16.1.243:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysqldata/3306/relay-log.info ... ok.
    Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000016
    Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000016
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Jul  5 12:57:12 2014 - [info] Slaves settings check done.
Sat Jul  5 12:57:12 2014 - [info] 
172.16.1.241 (current master)
+--172.16.1.242
+--172.16.1.243

Sat Jul  5 12:57:12 2014 - [info] Checking replication health on 172.16.1.242..
Sat Jul  5 12:57:12 2014 - [info]  ok.
Sat Jul  5 12:57:12 2014 - [info] Checking replication health on 172.16.1.243..
Sat Jul  5 12:57:12 2014 - [info]  ok.
Sat Jul  5 12:57:12 2014 - [warning] master_ip_failover_script is not defined.
Sat Jul  5 12:57:12 2014 - [warning] shutdown_script is not defined.
Sat Jul  5 12:57:12 2014 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.



7.啓動manager節點進程

 啓動manager節點進程,日誌被計入/data/masterha/manager.log

masterha_manager --conf=/etc/masterha/app1.cnf &

 查看日誌


Sat Jul  5 17:41:15 2014 - [info] MHA::MasterMonitor version 0.53.
Sat Jul  5 17:41:15 2014 - [info] Dead Servers:
Sat Jul  5 17:41:15 2014 - [info] Alive Servers:
Sat Jul  5 17:41:15 2014 - [info]   172.16.1.241(172.16.1.241:3306)
Sat Jul  5 17:41:15 2014 - [info]   172.16.1.242(172.16.1.242:3306)
Sat Jul  5 17:41:15 2014 - [info]   172.16.1.243(172.16.1.243:3306)
Sat Jul  5 17:41:15 2014 - [info] Alive Slaves:
Sat Jul  5 17:41:15 2014 - [info]   172.16.1.242(172.16.1.242:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 17:41:15 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 17:41:15 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 17:41:15 2014 - [info]   172.16.1.243(172.16.1.243:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 17:41:15 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 17:41:15 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 17:41:15 2014 - [info] Current Alive Master: 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 17:41:15 2014 - [info] Checking slave configurations..
Sat Jul  5 17:41:15 2014 - [info] Checking replication filtering settings..
Sat Jul  5 17:41:15 2014 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Jul  5 17:41:15 2014 - [info]  Replication filtering check ok.
Sat Jul  5 17:41:15 2014 - [info] Starting SSH connection tests..
Sat Jul  5 17:41:18 2014 - [info] All SSH connection tests passed successfully.
Sat Jul  5 17:41:18 2014 - [info] Checking MHA Node version..
Sat Jul  5 17:41:19 2014 - [info]  Version check ok.
Sat Jul  5 17:41:19 2014 - [info] Checking SSH publickey authentication settings on the current master..
Sat Jul  5 17:41:20 2014 - [info] HealthCheck: SSH to 172.16.1.241 is reachable.
Sat Jul  5 17:41:20 2014 - [info] Master MHA Node version is 0.53.
Sat Jul  5 17:41:20 2014 - [info] Checking recovery script configurations on the current master..
Sat Jul  5 17:41:20 2014 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql3306.000008 
Sat Jul  5 17:41:20 2014 - [info]   Connecting to [email protected](172.16.1.241).. 
  Creating /masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysqllog/3306, up to mysql3306.000008
Sat Jul  5 17:41:20 2014 - [info] Master setting check done.
Sat Jul  5 17:41:20 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Jul  5 17:41:20 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info  --relay_dir=/data/mysqldata/3306/  --slave_pass=xxx
Sat Jul  5 17:41:20 2014 - [info]   Connecting to [email protected](172.16.1.242:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysqldata/3306/relay-log.info ... ok.
    Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000005
    Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000005
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Jul  5 17:41:21 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.243 --slave_ip=172.16.1.243 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info  --relay_dir=/data/mysqldata/3306/  --slave_pass=xxx
Sat Jul  5 17:41:21 2014 - [info]   Connecting to [email protected](172.16.1.243:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysqldata/3306/relay-log.info ... ok.
    Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000016
    Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000016
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Jul  5 17:41:21 2014 - [info] Slaves settings check done.
Sat Jul  5 17:41:21 2014 - [info] 
172.16.1.241 (current master)
+--172.16.1.242
+--172.16.1.243

Sat Jul  5 17:41:21 2014 - [warning] master_ip_failover_script is not defined.
Sat Jul  5 17:41:21 2014 - [warning] shutdown_script is not defined.
Sat Jul  5 17:41:21 2014 - [info] Set master ping interval 1 seconds.
Sat Jul  5 17:41:21 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sat Jul  5 17:41:21 2014 - [info] Starting ping health check on 172.16.1.241(172.16.1.241:3306)..
Sat Jul  5 17:41:21 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..



 查看狀態

[iyunv@haproxy001 app1]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:21576) is running(0:PING_OK), master:172.16.1.241

8.碰到的問題:

問題1:執行ssh檢查時

Address 172.16.1.241 maps to localhost, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!

解決:在manager服務器/etc/hosts加上其他服務器的解析

問題2:檢查複製時


Fri Jul  4 12:29:25 2014 - [info] Starting SSH connection tests..
cat: /masterha/app1/172.16.1.241_22_ssh_check.log: No such file or directory
Fri Jul  4 12:29:25 2014 - [error][/usr/lib/perl5/vendor_perl/5.8.8/Parallel/ForkManager.pm, ln354] 
cat: /masterha/app1/172.16.1.242_22_ssh_check.log: No such file or directory
Fri Jul  4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/5.8.8/Parallel/ForkManager.pm, ln354] 
cat: /masterha/app1/172.16.1.243_22_ssh_check.log: No such file or directory
Fri Jul  4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/5.8.8/Parallel/ForkManager.pm, ln354] 
Fri Jul  4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. SSH Configuration Check Failed!
at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 339
Fri Jul  4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Fri Jul  4 12:29:26 2014 - [info] Got exit code 1 (Not master dead).



解決:剛開始用一個sudo權限用戶執行,因此報錯,換成root用戶,就沒問題了。

問題3:檢查複製時


Sat Jul  5 10:38:33 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info  --relay_dir=/data/mysqldata/3306/  --slave_pass=xxx
Sat Jul  5 10:38:33 2014 - [info]   Connecting to [email protected](172.16.1.242:22).. 
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
mysqlbinlog is 3.2 (included in MySQL Client 5.0 or lower), but MySQL server version is 5.5.35-log. mysqlbinlog can not parse row based events. Terminating script for safety reasons.
at /usr/bin/apply_diff_relay_logs line 463
Sat Jul  5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln371] Slaves settings check failed!
Sat Jul  5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln248] Slave configuration failed.
Sat Jul  5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations.  at /usr/bin/masterha_check_repl line 48
Sat Jul  5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Sat Jul  5 10:38:33 2014 - [info] Got exit code 1 (Not master dead).



解決:/usr/bin/mysqlbinlog版本過低。


[iyunv@haproxy001 ~]# apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info  --relay_dir=/data/mysqldata/3306/  --slave_pass=hoodong
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
mysqlbinlog is 3.2 (included in MySQL Client 5.0 or lower), but MySQL server version is 5.5.35-log. mysqlbinlog can not parse row based events. Terminating script for safety reasons.
at /usr/bin/apply_diff_relay_logs line 463
[iyunv@haproxy001 ~]# /usr/local/mysql/bin/mysqlbinlog -V
/usr/local/mysql/bin/mysqlbinlog Ver 3.3 for Linux at x86_64
[iyunv@haproxy001 ~]# which mysqlbinlog
/usr/bin/mysqlbinlog
[iyunv@haproxy001 ~]# /usr/bin/mysqlbinlog -V
/usr/bin/mysqlbinlog Ver 3.2 for redhat-linux-gnu at x86_64
[iyunv@haproxy001 ~]# sudo rm -f /usr/bin/mysqlbinlog 
[iyunv@haproxy001 ~]# sudo ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog





測試切換

未配置以下參數,進行切換測試:



#promote script
#shutdown_script="" 
#master_ip_failover_script="/usr/local/bin/master_ip_failover" 
#master_ip_online_change_script="" 
#report_script=""



3.1.正常切換測試

 測試內容:手動停止寫庫241mysql服務,觀察主從狀況,觀察manager進程狀況。

 3.1.1停止主寫241mysql端口

[iyunv@hd-dm-test01 ~]# sudo /etc/init.d/mysqld3306 stop
Stopping MySQL:                                            [  OK  ]

 3.1.2觀察manager日誌,記錄切換時間


[iyunv@haproxy001 ~]# sudo tail -f /masterha/app1/manager.log 
172.16.1.241 (current master)
+--172.16.1.242
+--172.16.1.243

Sat Jul  5 17:41:21 2014 - [warning] master_ip_failover_script is not defined.
Sat Jul  5 17:41:21 2014 - [warning] shutdown_script is not defined.
Sat Jul  5 17:41:21 2014 - [info] Set master ping interval 1 seconds.
Sat Jul  5 17:41:21 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sat Jul  5 17:41:21 2014 - [info] Starting ping health check on 172.16.1.241(172.16.1.241:3306)..
Sat Jul  5 17:41:21 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Sat Jul  5 18:40:21 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Sat Jul  5 18:40:21 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.53 --binlog_prefix=mysql3306
Sat Jul  5 18:40:21 2014 - [info] HealthCheck: SSH to 172.16.1.241 is reachable.
Sat Jul  5 18:40:22 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Jul  5 18:40:22 2014 - [warning] Connection failed 1 time(s)..
Sat Jul  5 18:40:23 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Jul  5 18:40:23 2014 - [warning] Connection failed 2 time(s)..
Sat Jul  5 18:40:24 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Jul  5 18:40:24 2014 - [warning] Connection failed 3 time(s)..
Sat Jul  5 18:40:24 2014 - [warning] Master is not reachable from health checker!
Sat Jul  5 18:40:24 2014 - [warning] Master 172.16.1.241(172.16.1.241:3306) is not reachable!
Sat Jul  5 18:40:24 2014 - [warning] SSH is reachable.
Sat Jul  5 18:40:24 2014 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Sat Jul  5 18:40:24 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul  5 18:40:24 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul  5 18:40:24 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sat Jul  5 18:40:24 2014 - [info] Dead Servers:
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info] Alive Servers:
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.242(172.16.1.242:3306)
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.243(172.16.1.243:3306)
Sat Jul  5 18:40:24 2014 - [info] Alive Slaves:
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.242(172.16.1.242:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:24 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.243(172.16.1.243:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:24 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:24 2014 - [info] Checking slave configurations..
Sat Jul  5 18:40:24 2014 - [info] Checking replication filtering settings..
Sat Jul  5 18:40:24 2014 - [info]  Replication filtering check ok.
Sat Jul  5 18:40:24 2014 - [info] Master is down!
Sat Jul  5 18:40:24 2014 - [info] Terminating monitoring script.
Sat Jul  5 18:40:24 2014 - [info] Got exit code 20 (Master dead).
Sat Jul  5 18:40:24 2014 - [info] MHA::MasterFailover version 0.53.
Sat Jul  5 18:40:24 2014 - [info] Starting master failover.
Sat Jul  5 18:40:24 2014 - [info] 
Sat Jul  5 18:40:24 2014 - [info] * Phase 1: Configuration Check Phase..
Sat Jul  5 18:40:24 2014 - [info] 
Sat Jul  5 18:40:24 2014 - [info] Dead Servers:
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info] Checking master reachability via mysql(double check)..
Sat Jul  5 18:40:24 2014 - [info]  ok.
Sat Jul  5 18:40:24 2014 - [info] Alive Servers:
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.242(172.16.1.242:3306)
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.243(172.16.1.243:3306)
Sat Jul  5 18:40:24 2014 - [info] Alive Slaves:
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.242(172.16.1.242:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:24 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.243(172.16.1.243:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:24 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:24 2014 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Jul  5 18:40:24 2014 - [info] 
Sat Jul  5 18:40:24 2014 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Jul  5 18:40:24 2014 - [info] 
Sat Jul  5 18:40:24 2014 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Jul  5 18:40:24 2014 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Sat Jul  5 18:40:24 2014 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Jul  5 18:40:24 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Jul  5 18:40:24 2014 - [info] 
Sat Jul  5 18:40:24 2014 - [info] * Phase 3: Master Recovery Phase..
Sat Jul  5 18:40:24 2014 - [info] 
Sat Jul  5 18:40:24 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Jul  5 18:40:24 2014 - [info] 
Sat Jul  5 18:40:24 2014 - [info] The latest binary log file/position on all slaves is mysql3306.000008:157085
Sat Jul  5 18:40:24 2014 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.242(172.16.1.242:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:24 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.243(172.16.1.243:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:24 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:24 2014 - [info] The oldest binary log file/position on all slaves is mysql3306.000008:157085
Sat Jul  5 18:40:24 2014 - [info] Oldest slaves:
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.242(172.16.1.242:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:24 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:24 2014 - [info]   172.16.1.243(172.16.1.243:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:24 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:24 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:24 2014 - [info] 
Sat Jul  5 18:40:24 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sat Jul  5 18:40:24 2014 - [info] 
Sat Jul  5 18:40:24 2014 - [info] Fetching dead master's binary logs..
Sat Jul  5 18:40:24 2014 - [info] Executing command on the dead master 172.16.1.241(172.16.1.241:3306): save_binary_logs --command=save --start_file=mysql3306.000008  --start_pos=157085 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
  Creating /masterha/app1 if not exists..    ok.
Concat binary/relay logs from mysql3306.000008 pos 157085 to mysql3306.000008 EOF into /masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /data/mysqllog/3306/mysql3306.000008 position 157085 to tail(157104).. ok.
Concat succeeded.
Sat Jul  5 18:40:26 2014 - [info] scp from [email protected]:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog to local:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog succeeded.
Sat Jul  5 18:40:26 2014 - [info] HealthCheck: SSH to 172.16.1.242 is reachable.
Sat Jul  5 18:40:27 2014 - [info] HealthCheck: SSH to 172.16.1.243 is reachable.
Sat Jul  5 18:40:27 2014 - [info] 
Sat Jul  5 18:40:27 2014 - [info] * Phase 3.3: Determining New Master Phase..
Sat Jul  5 18:40:27 2014 - [info] 
Sat Jul  5 18:40:27 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sat Jul  5 18:40:27 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sat Jul  5 18:40:27 2014 - [info] Searching new master from slaves..
Sat Jul  5 18:40:27 2014 - [info]  Candidate masters from the configuration file:
Sat Jul  5 18:40:27 2014 - [info]   172.16.1.242(172.16.1.242:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:27 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:27 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:27 2014 - [info]   172.16.1.243(172.16.1.243:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:40:27 2014 - [info]     Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:40:27 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:40:27 2014 - [info]  Non-candidate masters:
Sat Jul  5 18:40:27 2014 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat Jul  5 18:40:27 2014 - [info] New master is 172.16.1.242(172.16.1.242:3306)
Sat Jul  5 18:40:27 2014 - [info] Starting master failover..
Sat Jul  5 18:40:27 2014 - [info] 
From:
172.16.1.241 (current master)
+--172.16.1.242
+--172.16.1.243

To:
172.16.1.242 (new master)
+--172.16.1.243
Sat Jul  5 18:40:27 2014 - [info] 
Sat Jul  5 18:40:27 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Sat Jul  5 18:40:27 2014 - [info] 
Sat Jul  5 18:40:27 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Sat Jul  5 18:40:27 2014 - [info] Sending binlog..
Sat Jul  5 18:40:28 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog to[email protected]:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog succeeded.
Sat Jul  5 18:40:28 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Sat Jul  5 18:40:28 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sat Jul  5 18:40:28 2014 - [info] Starting recovery on 172.16.1.242(172.16.1.242:3306)..
Sat Jul  5 18:40:28 2014 - [info]  Generating diffs succeeded.
Sat Jul  5 18:40:28 2014 - [info] Waiting until all relay logs are applied.
Sat Jul  5 18:40:28 2014 - [info]  done.
Sat Jul  5 18:40:28 2014 - [info] Getting slave status..
Sat Jul  5 18:40:28 2014 - [info] This slave(172.16.1.242)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql3306.000008:157085). No need to recover from Exec_Master_Log_Pos.
Sat Jul  5 18:40:28 2014 - [info] Connecting to the target slave host 172.16.1.242, running recover script..
Sat Jul  5 18:40:28 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242  --slave_port=3306 --apply_files=/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog --workdir=/masterha/app1 --target_version=5.5.35-log --timestamp=20140705184024 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Sat Jul  5 18:40:28 2014 - [info] 
Applying differential binary/relay log files /masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog on 172.16.1.242:3306. This may take long time...
Applying log files succeeded.
Sat Jul  5 18:40:28 2014 - [info]  All relay logs were successfully applied.
Sat Jul  5 18:40:28 2014 - [info] Getting new master's binlog name and position..
Sat Jul  5 18:40:28 2014 - [info]  mysql3306.000006:107
Sat Jul  5 18:40:28 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.242', MASTER_PORT=3306, MASTER_LOG_FILE='mysql3306.000006', MASTER_LOG_POS=107, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Sat Jul  5 18:40:28 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Sat Jul  5 18:40:28 2014 - [info] Setting read_only=0 on 172.16.1.242(172.16.1.242:3306)..
Sat Jul  5 18:40:28 2014 - [info]  ok.
Sat Jul  5 18:40:28 2014 - [info] ** Finished master recovery successfully.
Sat Jul  5 18:40:28 2014 - [info] * Phase 3: Master Recovery Phase completed.
Sat Jul  5 18:40:28 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info] * Phase 4: Slaves Recovery Phase..
Sat Jul  5 18:40:28 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sat Jul  5 18:40:28 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info] -- Slave diff file generation on host 172.16.1.243(172.16.1.243:3306) started, pid: 25375. Check tmp log /masterha/app1/172.16.1.243_3306_20140705184024.log if it takes time..
Sat Jul  5 18:40:28 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info] Log messages from 172.16.1.243 ...
Sat Jul  5 18:40:28 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Sat Jul  5 18:40:28 2014 - [info] End of log messages from 172.16.1.243.
Sat Jul  5 18:40:28 2014 - [info] -- 172.16.1.243(172.16.1.243:3306) has the latest relay log events.
Sat Jul  5 18:40:28 2014 - [info] Generating relay diff files from the latest slave succeeded.
Sat Jul  5 18:40:28 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sat Jul  5 18:40:28 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) started, pid: 25377. Check tmp log /masterha/app1/172.16.1.243_3306_20140705184024.log if it takes time..
Sat Jul  5 18:40:30 2014 - [info] 
Sat Jul  5 18:40:30 2014 - [info] Log messages from 172.16.1.243 ...
Sat Jul  5 18:40:30 2014 - [info] 
Sat Jul  5 18:40:28 2014 - [info] Sending binlog..
Sat Jul  5 18:40:29 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog to[email protected]:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog succeeded.
Sat Jul  5 18:40:29 2014 - [info] Starting recovery on 172.16.1.243(172.16.1.243:3306)..
Sat Jul  5 18:40:29 2014 - [info]  Generating diffs succeeded.
Sat Jul  5 18:40:29 2014 - [info] Waiting until all relay logs are applied.
Sat Jul  5 18:40:29 2014 - [info]  done.
Sat Jul  5 18:40:29 2014 - [info] Getting slave status..
Sat Jul  5 18:40:29 2014 - [info] This slave(172.16.1.243)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql3306.000008:157085). No need to recover from Exec_Master_Log_Pos.
Sat Jul  5 18:40:29 2014 - [info] Connecting to the target slave host 172.16.1.243, running recover script..
Sat Jul  5 18:40:29 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=172.16.1.243 --slave_ip=172.16.1.243  --slave_port=3306 --apply_files=/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog --workdir=/masterha/app1 --target_version=5.5.35-log --timestamp=20140705184024 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Sat Jul  5 18:40:30 2014 - [info] 
Applying differential binary/relay log files /masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog on 172.16.1.243:3306. This may take long time...
Applying log files succeeded.
Sat Jul  5 18:40:30 2014 - [info]  All relay logs were successfully applied.
Sat Jul  5 18:40:30 2014 - [info]  Resetting slave 172.16.1.243(172.16.1.243:3306) and starting replication from the new master 172.16.1.242(172.16.1.242:3306)..
Sat Jul  5 18:40:30 2014 - [info]  Executed CHANGE MASTER.
Sat Jul  5 18:40:30 2014 - [info]  Slave started.
Sat Jul  5 18:40:30 2014 - [info] End of log messages from 172.16.1.243.
Sat Jul  5 18:40:30 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) succeeded.
Sat Jul  5 18:40:30 2014 - [info] All new slave servers recovered successfully.
Sat Jul  5 18:40:30 2014 - [info] 
Sat Jul  5 18:40:30 2014 - [info] * Phase 5: New master cleanup phease..
Sat Jul  5 18:40:30 2014 - [info] 
Sat Jul  5 18:40:30 2014 - [info] Resetting slave info on the new master..
Sat Jul  5 18:40:30 2014 - [info]  172.16.1.242: Resetting slave info succeeded.
Sat Jul  5 18:40:30 2014 - [info] Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
Sat Jul  5 18:40:30 2014 - [info] 

----- Failover Report -----

app1: MySQL Master failover 172.16.1.241 to 172.16.1.242 succeeded

Master 172.16.1.241 is down!

Check MHA Manager logs at haproxy001:/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 172.16.1.242(172.16.1.242:3306) has all relay logs for recovery.
Selected 172.16.1.242 as a new master.
172.16.1.242: OK: Applying all logs succeeded.
172.16.1.243: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.1.243: OK: Applying all logs succeeded. Slave started, replicating from 172.16.1.242.
172.16.1.242: Resetting slave info succeeded.
Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.



 耗時10s。242提升爲主,查看242主從狀態


mysql> show slave status\G;
Empty set (0.00 sec)

ERROR: 
No query specified

mysql> show master status\G;
*************************** 1. row ***************************
            File: mysql3306.000009
        Position: 107
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

ERROR: 
No query specified



 242主從信息被清空。查看243主從狀態。


mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.1.242
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql3306.000006
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql3306.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: mysql
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 409
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 12423306
1 row in set (0.00 sec)

ERROR: 
No query specified



 3.1.3查看manager狀態

masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

 啓動manager進程,發現啓不來

[iyunv@haproxy001 app1]# masterha_manager --conf=/etc/masterha/app1.cnf 
Sat Jul  5 18:51:34 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul  5 18:51:34 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul  5 18:51:34 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
[iyunv@haproxy001 app1]# masterha_check_status --conf=/etc/masterha/app1.cnf

 日誌如下:


Sat Jul  5 18:51:34 2014 - [info] MHA::MasterMonitor version 0.53.
Sat Jul  5 18:51:34 2014 - [info] Dead Servers:
Sat Jul  5 18:51:34 2014 - [info]   172.16.1.241(172.16.1.241:3306)
Sat Jul  5 18:51:34 2014 - [info] Alive Servers:
Sat Jul  5 18:51:34 2014 - [info]   172.16.1.242(172.16.1.242:3306)
Sat Jul  5 18:51:34 2014 - [info]   172.16.1.243(172.16.1.243:3306)
Sat Jul  5 18:51:34 2014 - [info] Alive Slaves:
Sat Jul  5 18:51:34 2014 - [info]   172.16.1.243(172.16.1.243:3306)  Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul  5 18:51:34 2014 - [info]     Replicating from 172.16.1.242(172.16.1.242:3306)
Sat Jul  5 18:51:34 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  5 18:51:34 2014 - [info] Current Alive Master: 172.16.1.242(172.16.1.242:3306)
Sat Jul  5 18:51:34 2014 - [info] Checking slave configurations..
Sat Jul  5 18:51:34 2014 - [info] Checking replication filtering settings..
Sat Jul  5 18:51:34 2014 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Jul  5 18:51:34 2014 - [info]  Replication filtering check ok.
Sat Jul  5 18:51:34 2014 - [info] Starting SSH connection tests..
Sat Jul  5 18:51:35 2014 - [info] All SSH connection tests passed successfully.
Sat Jul  5 18:51:35 2014 - [info] Checking MHA Node version..
Sat Jul  5 18:51:36 2014 - [info]  Version check ok.
Sat Jul  5 18:51:36 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln361]  Server 172.16.1.241(172.16.1.241:3306) is dead, but must be alive! Check server settings.
Sat Jul  5 18:51:36 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations.  at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 361
Sat Jul  5 18:51:36 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Sat Jul  5 18:51:36 2014 - [info] Got exit code 1 (Not master dead).



 測試結果:(1)無寫負載壓力下,切換正常,耗時10s。(2)manager進程停止,無法控制接下來的主從問題。(3)被提升爲寫的實例,原主從狀態被清空。

 預測:因爲主從信息被清空,回切肯定失敗。

3.2 回切測試1;

 測試內容:(1)保留241宕,242主,243從的狀態,未配置master_ip_failover_script。(2)啓動241的mysql服務,觀察主從。(3)啓動manager服務,觀察主從。(4)觀察日誌。

 3.3.1 啓動241的mysql服務。此時manager服務還是起不來,觀察主從,觀察manager日誌:

  242的show slave status:


mysql> show slave status\G;
Empty set (0.00 sec)

ERROR: 
No query specified

mysql> show master status\G;
*************************** 1. row ***************************
            File: mysql3306.000009
        Position: 107
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

ERROR: 
No query specified



  243的show slave status:


mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.1.242
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql3306.000007
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysql-relay-bin.000004
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql3306.000007
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: mysql
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 1013
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 12423306
1 row in set (0.00 sec)

ERROR: 
No query specified



 3.3.2 啓動manager。觀察主從,觀察manager日誌。

  242,243主從照舊,manager還是起不來。


Sat Jul  5 19:27:16 2014 - [info] MHA::MasterMonitor version 0.53.
Sat Jul  5 19:27:17 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/ServerManager.pm, ln274] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations.
Sat Jul  5 19:27:17 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations.  at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 298
Sat Jul  5 19:27:17 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Sat Jul  5 19:27:17 2014 - [info] Got exit code 1 (Not master dead).



 測試結論:未配置master_ip_failover_script等參數狀態下,回切失敗。

3.3 雪崩測試

 少一臺服務器,在243再加一個實例3307。

 測試內容:(1)寫庫241停止訪問後,242提升爲主庫;(2)再停止242mysql服務;(3)觀察是否會產生新的主庫。

 結論:沒戲,manager起不來。

3.4 數據缺失測試

 測試內容:判斷是否按照server1,server2,server3的順序提升新主庫。

 (1)還原241爲主,242,243爲從的狀態;(2)通過事務提交,讓242數據同步比243慢,宕掉241寫庫,觀查哪個實例被提升爲寫庫。(3)通過鎖,讓243比242數據同步慢,宕掉241寫庫,觀察哪個從庫會被提升爲寫庫。

 3.4.1 恢復主從,啓動manager。

 3.4.2 新建表

CREATE TABLE `t_mha` (
  `id` int(8) NOT NULL AUTO_INCREMENT,
  `name` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

    鎖242從庫表t_mha爲只讀。

lock table t_mha read;

    寫數據。

mysql> insert into t_mha(name) select 'abf';
mysql> insert into t_mha(name) select 'abg';
mysql> insert into t_mha(name) select 'abe';

    宕掉主寫241,觀察錯誤日誌,發現一直在等待242的relay log執行,並未發生切換。


Mon Jul  7 17:08:38 2014 - [info] 
Mon Jul  7 17:08:38 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Mon Jul  7 17:08:38 2014 - [info] 
Mon Jul  7 17:08:38 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jul  7 17:08:38 2014 - [info] 
Mon Jul  7 17:08:38 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Mon Jul  7 17:08:38 2014 - [info] 
Mon Jul  7 17:08:38 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon Jul  7 17:08:38 2014 - [info] Starting recovery on 172.16.1.242(172.16.1.242:3306)..
Mon Jul  7 17:08:38 2014 - [info]  This server has all relay logs. Waiting all logs to be applied.. 



   解鎖242表,觀察錯誤日誌,觀察主從。


Mon Jul  7 17:10:20 2014 - [info]   done.
Mon Jul  7 17:10:20 2014 - [info]  All relay logs were successfully applied.
Mon Jul  7 17:10:20 2014 - [info] Getting new master's binlog name and position..
Mon Jul  7 17:10:20 2014 - [info]  mysql3306.000008:620
Mon Jul  7 17:10:20 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.242', MASTER_PORT=3306, MASTER_LOG_FILE='mysql3306.000008', MASTER_LOG_POS=620, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Mon Jul  7 17:10:20 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Mon Jul  7 17:10:20 2014 - [info] Setting read_only=0 on 172.16.1.242(172.16.1.242:3306)..
Mon Jul  7 17:10:20 2014 - [info]  ok.
Mon Jul  7 17:10:20 2014 - [info] ** Finished master recovery successfully.
Mon Jul  7 17:10:20 2014 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] * Phase 4: Slaves Recovery Phase..
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] -- Slave diff file generation on host 172.16.1.243(172.16.1.243:3306) started, pid: 9489. Check tmp log /masterha/app1/172.16.1.243_3306_20140707170837.log if it takes time..
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] Log messages from 172.16.1.243 ...
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jul  7 17:10:20 2014 - [info] End of log messages from 172.16.1.243.
Mon Jul  7 17:10:20 2014 - [info] -- 172.16.1.243(172.16.1.243:3306) has the latest relay log events.
Mon Jul  7 17:10:20 2014 - [info] Generating relay diff files from the latest slave succeeded.
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) started, pid: 9491. Check tmp log /masterha/app1/172.16.1.243_3306_20140707170837.log if it takes time..
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] Log messages from 172.16.1.243 ...
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] Starting recovery on 172.16.1.243(172.16.1.243:3306)..
Mon Jul  7 17:10:20 2014 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Mon Jul  7 17:10:20 2014 - [info]   done.
Mon Jul  7 17:10:20 2014 - [info]  All relay logs were successfully applied.
Mon Jul  7 17:10:20 2014 - [info]  Resetting slave 172.16.1.243(172.16.1.243:3306) and starting replication from the new master 172.16.1.242(172.16.1.242:3306)..
Mon Jul  7 17:10:20 2014 - [info]  Executed CHANGE MASTER.
Mon Jul  7 17:10:20 2014 - [info]  Slave started.
Mon Jul  7 17:10:20 2014 - [info] End of log messages from 172.16.1.243.
Mon Jul  7 17:10:20 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) succeeded.
Mon Jul  7 17:10:20 2014 - [info] All new slave servers recovered successfully.
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] * Phase 5: New master cleanup phease..
Mon Jul  7 17:10:20 2014 - [info] 
Mon Jul  7 17:10:20 2014 - [info] Resetting slave info on the new master..
Mon Jul  7 17:10:20 2014 - [info]  172.16.1.242: Resetting slave info succeeded.
Mon Jul  7 17:10:20 2014 - [info] Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
Mon Jul  7 17:10:20 2014 - [info] 

----- Failover Report -----

app1: MySQL Master failover 172.16.1.241 to 172.16.1.242 succeeded

Master 172.16.1.241 is down!

Check MHA Manager logs at haproxy001:/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 172.16.1.242(172.16.1.242:3306) has all relay logs for recovery.
Selected 172.16.1.242 as a new master.
172.16.1.242: OK: Applying all logs succeeded.
172.16.1.243: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.1.243: OK: Applying all logs succeeded. Slave started, replicating from 172.16.1.242.
172.16.1.242: Resetting slave info succeeded.
Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.



 3.4.3  這次鎖243的t_mha表。

  日誌如下:


f it takes time..
Mon Jul  7 17:35:39 2014 - [info] 
Mon Jul  7 17:35:39 2014 - [info] Log messages from 172.16.1.243 ...
Mon Jul  7 17:35:39 2014 - [info] 
Mon Jul  7 17:35:38 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jul  7 17:35:39 2014 - [info] End of log messages from 172.16.1.243.
Mon Jul  7 17:35:39 2014 - [info] -- 172.16.1.243(172.16.1.243:3306) has the latest relay log events.
Mon Jul  7 17:35:39 2014 - [info] Generating relay diff files from the latest slave succeeded.
Mon Jul  7 17:35:39 2014 - [info] 
Mon Jul  7 17:35:39 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Jul  7 17:35:39 2014 - [info] 
Mon Jul  7 17:35:39 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) started, pid: 9886. Check tmp log /masterha/app1/172.16.1.243_3306_20140707173538.log if it takes time..

Mon Jul  7 17:37:10 2014 - [info] 
Mon Jul  7 17:37:10 2014 - [info] Log messages from 172.16.1.243 ...
Mon Jul  7 17:37:10 2014 - [info] 
Mon Jul  7 17:35:39 2014 - [info] Starting recovery on 172.16.1.243(172.16.1.243:3306)..
Mon Jul  7 17:35:39 2014 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Mon Jul  7 17:37:10 2014 - [info]   done.
Mon Jul  7 17:37:10 2014 - [info]  All relay logs were successfully applied.
Mon Jul  7 17:37:10 2014 - [info]  Resetting slave 172.16.1.243(172.16.1.243:3306) and starting replication from the new master 172.16.1.242(172.16.1.242:3306)..
Mon Jul  7 17:37:10 2014 - [info]  Executed CHANGE MASTER.
Mon Jul  7 17:37:10 2014 - [info]  Slave started.
Mon Jul  7 17:37:10 2014 - [info] End of log messages from 172.16.1.243.
Mon Jul  7 17:37:10 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) succeeded.
Mon Jul  7 17:37:10 2014 - [info] All new slave servers recovered successfully.
Mon Jul  7 17:37:10 2014 - [info] 
Mon Jul  7 17:37:10 2014 - [info] * Phase 5: New master cleanup phease..
Mon Jul  7 17:37:10 2014 - [info] 
Mon Jul  7 17:37:10 2014 - [info] Resetting slave info on the new master..
Mon Jul  7 17:37:10 2014 - [info]  172.16.1.242: Resetting slave info succeeded.
Mon Jul  7 17:37:10 2014 - [info] Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
Mon Jul  7 17:37:10 2014 - [info] 

----- Failover Report -----

app1: MySQL Master failover 172.16.1.241 to 172.16.1.242 succeeded

Master 172.16.1.241 is down!

Check MHA Manager logs at haproxy001:/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 172.16.1.242(172.16.1.242:3306) has all relay logs for recovery.
Selected 172.16.1.242 as a new master.
172.16.1.242: OK: Applying all logs succeeded.
172.16.1.243: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.1.243: OK: Applying all logs succeeded. Slave started, replicating from 172.16.1.242.
172.16.1.242: Resetting slave info succeeded.
Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.




 結論:(1)因爲sql線程引起的數據延遲問題,mha會等待relay-log全部寫入盤中後,才進行切換。(2)由於IO線程引起的數據不一致,目前還未測試。

小結:(1)無負載切換越需10s。

   (2)默認按照server模塊順序進行切換。

   (3)無腳本無法回切。

   (4)主寫down後,manager進程會停止。

   (5)無法處理雪崩情況。

   (6)需要等待relay-log全部追上之後,纔會切換主從。

根據這些缺陷,來確認,相關腳本需要實現什麼樣的功能?



下面配置切換腳本參數,再測試一遍上面案例,綁定vip。

#promote script
#shutdown_script="" 
#master_ip_failover_script="/usr/local/bin/master_ip_failover" 
#master_ip_online_change_script="" 
#report_script=""

配置做如下調整:

(1) 配置vip

(2) 配置promote script,master_ip_failover_script,master_ip_online_change_script

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章