1. MHA Features
2. How MHA Works and the Failover Process
3. Replication Topologies Supported by MHA
4. Building an MHA High-Availability Environment
4.1 Lab Environment
4.2 Outline of the Steps
4.3 Script Notes
4.4 MHA Deployment
4.5 Configuring the VIP
5. Common MHA Commands
6. Caveats
7. Problems Encountered During Deployment
1. MHA Features
1. Automatic master monitoring and failover
MHA monitors the master of a replication setup and, as soon as it detects a master failure, performs failover automatically. Even if some slaves have not yet received the latest relay log events, MHA identifies the differential relay logs on the most up-to-date slave and applies them to the other slaves, so all slaves end up consistent. MHA usually completes failover within seconds: it detects a master failure in 9-12 seconds, shuts the failed master down within 7-10 seconds to avoid split brain, applies the differential relay logs to the new master in a few seconds, and finishes the whole process in 10-30 seconds. You can also set priorities to designate a particular slave as the master candidate. Because MHA repairs consistency among the slaves, any slave can become the new master without the consistency problems that would otherwise break replication.
2. Interactive master failover
You can use only MHA's failover capability, without master monitoring: when the master fails, invoke MHA manually to perform the failover.
3. Non-interactive master failover
MHA does not monitor the master, but still performs failover automatically. This suits setups that already use other software, such as heartbeat, to detect master failure and take over the virtual IP address; MHA then carries out the failover and promotes a slave to master.
4. Online master switchover
In many situations you need to migrate the current master to another server: the master's hardware is failing, the RAID controller needs a rebuild, you want to move the master onto faster hardware, and so on. Maintaining the master degrades performance and causes downtime during which, at minimum, no data can be written; worse, blocking or killing the currently running sessions can leave the two masters inconsistent. MHA provides a fast switchover with graceful write blocking: the switch takes only 0.5-2 seconds, during which writes are blocked. In most cases a 0.5-2 second write block is acceptable, so switching masters does not require a scheduled maintenance window (no more pulling an all-nighter to switch the master).
2. How MHA Works and the Failover Process
MHA's automatic failover process is described in detail at:
http://www.mysqlsystems.com/2012/03/figure-out-process-of-autofailover-on-mha.html
https://code.google.com/p/mysql-master-ha/wiki/Sequences_of_MHA
3. Replication Topologies Supported by MHA
https://code.google.com/p/mysql-master-ha/wiki/UseCases
4. Building an MHA High-Availability Environment
4.1 Lab Environment
Node1: 192.168.10.216 (master)
Node2: 192.168.10.217 (slave; candidate master on failover)
Node3: 192.168.10.218 (slave; also the MHA manager node)
VIP: 192.168.10.219
MySQL: Percona-Server-5.6.16-rel64.2-569
All nodes run CentOS 6.5 x64.
4.2 Outline of the Steps
Configure the EPEL yum repository on all three nodes and install the dependency packages
Set up master-slave replication
Use ssh-keygen so the three machines can log in to each other without passwords
Install mha4mysql-node-0.56 on all three nodes, and mha4mysql-manager-0.56 on node3
Manage the MHA configuration files on node3
Verify SSH trust with masterha_check_ssh and replication with masterha_check_repl
Start the MHA manager and watch its log file
Kill MySQL on the master (Node1) and verify that failover happens automatically
Configure a VIP so that after a switchover the promoted slave takes over the master service transparently to clients
4.3 Script Notes
The MHA node package provides three scripts, which depend on Perl modules:
save_binary_logs: saves and copies the failed master's binary logs
apply_diff_relay_logs: identifies differential relay log events and applies them to all slave nodes
purge_relay_logs: purges relay log files
4.4 MHA Deployment
A. Configure the EPEL yum repository on all three nodes and install the dependency packages
rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
yum -y install perl-DBD-MySQL ncftp
B. Set up master-slave replication
On node1:
mysql> grant replication slave on *.* to 'rep'@'192.168.10.%' identified by 'geekwolf';
mysql> grant all on *.* to 'root'@'192.168.10.%' identified by 'geekwolf';
mysql> show master status;
Copy node1's data directory to node2 and node3, then on node2 and node3:
mysql> change master to master_host='192.168.10.216', master_user='rep', master_password='geekwolf', master_port=3306, master_log_file='mysql-bin.000006', master_log_pos=120, master_connect_retry=1;
mysql> start slave;
Create symlinks for the mysql commands on every node:
ln -s /usr/local/mysql/bin/* /usr/local/bin/
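After starting the slaves, it is worth confirming both replication threads are running. A minimal sketch of the check (run on node2/node3); the here-string below is a stand-in for the real probe, which would be `status=$(mysql -uroot -pgeekwolf -e 'SHOW SLAVE STATUS\G')`:

```shell
# Parse the two thread-state fields out of SHOW SLAVE STATUS output.
status="Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0"

io=$(printf '%s\n' "$status"  | awk '/Slave_IO_Running/  {print $2}')
sql=$(printf '%s\n' "$status" | awk '/Slave_SQL_Running/ {print $2}')

if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
    echo "replication healthy"
else
    echo "replication broken: IO=$io SQL=$sql"
fi
```

Both fields must be `Yes`; anything else means the change master / start slave step above needs revisiting.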
C. Use ssh-keygen so the three machines trust each other. On node1 (and likewise on the other two nodes) run:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
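A quick sketch for verifying the key-based trust just set up (masterha_check_ssh does the full matrix later). BatchMode makes ssh fail instead of prompting for a password, so a non-zero exit means trust is broken for that host:

```shell
# Probe each node; key-only login must succeed without a prompt.
for host in node1 node2 node3; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 root@"$host" true 2>/dev/null; then
        echo "$host: key login ok"
    else
        echo "$host: key login FAILED"
    fi
done
```

Run this on each of the three nodes in turn, since MHA needs trust in every direction.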
D. Install mha4mysql-node-0.56 on all three nodes, and mha4mysql-manager-0.56 on node3
Install mha4mysql-node on node1, node2 and node3:
wget https://googledrive.com/host/0B1lu97m8-haWeHdGWXp0YVVUSlk/mha4mysql-node-0.56.tar.gz
tar xf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node
perl Makefile.PL
make && make install
Install mha4mysql-manager on node3:
wget https://googledrive.com/host/0B1lu97m8-haWeHdGWXp0YVVUSlk/mha4mysql-manager-0.56.tar.gz
tar xf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56
yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles perl-Time-HiRes
perl Makefile.PL
make && make install
E. Manage the MHA configuration files on node3
mkdir -p /etc/mha/{app1,scripts}
cp mha4mysql-manager-0.56/samples/conf/* /etc/mha/
cp mha4mysql-manager-0.56/samples/scripts/* /etc/mha/scripts/
mv /etc/mha/app1.cnf /etc/mha/app1/
mv /etc/mha/masterha_default.cnf /etc/masterha_default.cnf
Set the global configuration:
vim /etc/mha/masterha_default.cnf
[server default]
user=root
password=geekwolf
ssh_user=root
repl_user=rep
repl_password=geekwolf
ping_interval=1
secondary_check_script = masterha_secondary_check -s node1 -s node2 -s node3 --user=root --master_host=node1 --master_ip=192.168.10.216 --master_port=3306
#master_ip_failover_script="/etc/mha/scripts/master_ip_failover"
#master_ip_online_change_script="/etc/mha/scripts/master_ip_online_change"
#shutdown_script=/script/masterha/power_manager
#report_script=""
vim /etc/mha/app1/app1.cnf
[server default]
manager_workdir=/var/log/mha/app1
manager_log=/var/log/mha/app1/manager.log
[server1]
hostname=node1
master_binlog_dir="/usr/local/mysql/logs"
candidate_master=1
[server2]
hostname=node2
master_binlog_dir="/usr/local/mysql/logs"
candidate_master=1
[server3]
hostname=node3
master_binlog_dir="/usr/local/mysql/logs"
no_master=1
Notes:
candidate_master=1 marks the host as preferred for promotion to new master; when several [serverX] sections set it, priority follows the order of the [serverX] sections.
secondary_check_script: MHA strongly recommends checking the master's availability over two or more network routes. By default the MHA manager checks over a single route (manager to master), which is not ideal; by calling an external script through this parameter, MHA can check over two or more routes.
master_ip_failover_script: called when a slave is promoted to the new master; VIP handling can be put in this script.
master_ip_online_change_script: called when you switch masters manually with masterha_master_switch; its arguments are similar to master_ip_failover_script's, and the two scripts are largely interchangeable.
shutdown_script: the sample script uses the server's remote management interface (e.g. Dell iDRAC) with ipmitool to force the failed master off, so that the fenced server cannot come back up and cause split brain.
report_script: sends a report email after the switchover completes; see for example http://caspian.dotconf.net/menu/Software/SendEmail/sendEmail-v1.56.tar.gz
All of these scripts can be copied from mha4mysql-manager-0.56/samples/scripts/* and adapted.
Other manager parameters are documented at https://code.google.com/p/mysql-master-ha/wiki/Parameters
F. Verify SSH trust with masterha_check_ssh and replication with masterha_check_repl
Verify SSH trust: masterha_check_ssh --conf=/etc/mha/app1/app1.cnf
[root@localhost ~]# masterha_check_ssh --conf=/etc/mha/app1/app1.cnf
Tue May 13 07:53:15 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue May 13 07:53:15 2014 - [info] Reading application default configuration from /etc/mha/app1/app1.cnf..
Tue May 13 07:53:15 2014 - [info] Reading server configuration from /etc/mha/app1/app1.cnf..
Tue May 13 07:53:15 2014 - [info] Starting SSH connection tests..
Tue May 13 07:53:16 2014 - [debug]
Tue May 13 07:53:15 2014 - [debug]  Connecting via SSH from root@node1(192.168.10.216:22) to root@node2(192.168.10.217:22)..
Tue May 13 07:53:15 2014 - [debug]   ok.
Tue May 13 07:53:15 2014 - [debug]  Connecting via SSH from root@node1(192.168.10.216:22) to root@node3(192.168.10.218:22)..
Tue May 13 07:53:16 2014 - [debug]   ok.
Tue May 13 07:53:16 2014 - [debug]
Tue May 13 07:53:16 2014 - [debug]  Connecting via SSH from root@node2(192.168.10.217:22) to root@node1(192.168.10.216:22)..
Tue May 13 07:53:16 2014 - [debug]   ok.
Tue May 13 07:53:16 2014 - [debug]  Connecting via SSH from root@node2(192.168.10.217:22) to root@node3(192.168.10.218:22)..
Tue May 13 07:53:16 2014 - [debug]   ok.
Tue May 13 07:53:17 2014 - [debug]
Tue May 13 07:53:16 2014 - [debug]  Connecting via SSH from root@node3(192.168.10.218:22) to root@node1(192.168.10.216:22)..
Tue May 13 07:53:16 2014 - [debug]   ok.
Tue May 13 07:53:16 2014 - [debug]  Connecting via SSH from root@node3(192.168.10.218:22) to root@node2(192.168.10.217:22)..
Tue May 13 07:53:17 2014 - [debug]   ok.
Tue May 13 07:53:17 2014 - [info] All SSH connection tests passed successfully.
Verify replication: masterha_check_repl --conf=/etc/mha/app1/app1.cnf
[root@localhost mha]# masterha_check_repl --conf=/etc/mha/app1/app1.cnf
Tue May 13 08:10:54 2014 - [info] Reading default configuration from /etc/masterha_default.cnf..
Tue May 13 08:10:54 2014 - [info] Reading application default configuration from /etc/mha/app1/app1.cnf..
Tue May 13 08:10:54 2014 - [info] Reading server configuration from /etc/mha/app1/app1.cnf..
Tue May 13 08:10:54 2014 - [info] MHA::MasterMonitor version 0.56.
Tue May 13 08:10:54 2014 - [info] GTID failover mode = 0
Tue May 13 08:10:54 2014 - [info] Dead Servers:
Tue May 13 08:10:54 2014 - [info] Alive Servers:
Tue May 13 08:10:54 2014 - [info]   node1(192.168.10.216:3306)
Tue May 13 08:10:54 2014 - [info]   node2(192.168.10.217:3306)
Tue May 13 08:10:54 2014 - [info]   node3(192.168.10.218:3306)
Tue May 13 08:10:54 2014 - [info] Alive Slaves:
Tue May 13 08:10:54 2014 - [info]   node2(192.168.10.217:3306)  Version=5.6.16-64.2-rel64.2-log (oldest major version between slaves) log-bin:enabled
Tue May 13 08:10:54 2014 - [info]     Replicating from 192.168.10.216(192.168.10.216:3306)
Tue May 13 08:10:54 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue May 13 08:10:54 2014 - [info]   node3(192.168.10.218:3306)  Version=5.6.16-64.2-rel64.2-log (oldest major version between slaves) log-bin:enabled
Tue May 13 08:10:54 2014 - [info]     Replicating from 192.168.10.216(192.168.10.216:3306)
Tue May 13 08:10:54 2014 - [info]     Not candidate for the new Master (no_master is set)
Tue May 13 08:10:54 2014 - [info] Current Alive Master: node1(192.168.10.216:3306)
Tue May 13 08:10:54 2014 - [info] Checking slave configurations..
Tue May 13 08:10:54 2014 - [info]  read_only=1 is not set on slave node2(192.168.10.217:3306).
Tue May 13 08:10:54 2014 - [warning]  relay_log_purge=0 is not set on slave node2(192.168.10.217:3306).
Tue May 13 08:10:54 2014 - [info]  read_only=1 is not set on slave node3(192.168.10.218:3306).
Tue May 13 08:10:54 2014 - [warning]  relay_log_purge=0 is not set on slave node3(192.168.10.218:3306).
Tue May 13 08:10:54 2014 - [info] Checking replication filtering settings..
Tue May 13 08:10:54 2014 - [info]  binlog_do_db= , binlog_ignore_db=
Tue May 13 08:10:54 2014 - [info]  Replication filtering check ok.
Tue May 13 08:10:54 2014 - [info] GTID (with auto-pos) is not supported
Tue May 13 08:10:54 2014 - [info] Starting SSH connection tests..
Tue May 13 08:10:55 2014 - [info] All SSH connection tests passed successfully.
Tue May 13 08:10:55 2014 - [info] Checking MHA Node version..
Tue May 13 08:10:55 2014 - [info]  Version check ok.
Tue May 13 08:10:55 2014 - [info] Checking SSH publickey authentication settings on the current master..
Tue May 13 08:10:56 2014 - [info] HealthCheck: SSH to node1 is reachable.
Tue May 13 08:10:56 2014 - [info] Master MHA Node version is 0.56.
Tue May 13 08:10:56 2014 - [info] Checking recovery script configurations on node1(192.168.10.216:3306)..
Tue May 13 08:10:56 2014 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/logs --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000009
Tue May 13 08:10:56 2014 - [info]   Connecting to root@node1(node1:22)..
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /usr/local/mysql/logs, up to mysql-bin.000009
Tue May 13 08:10:56 2014 - [info] Binlog setting check done.
Tue May 13 08:10:56 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue May 13 08:10:56 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=node2 --slave_ip=192.168.10.217 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.16-64.2-rel64.2-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info --relay_dir=/usr/local/mysql/data/ --slave_pass=xxx
Tue May 13 08:10:56 2014 - [info]   Connecting to root@node2(node2:22)..
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/logs, up to relay-bin.000006
    Temporary relay log file is /usr/local/mysql/logs/relay-bin.000006
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue May 13 08:10:57 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=node3 --slave_ip=192.168.10.218 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.16-64.2-rel64.2-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info --relay_dir=/usr/local/mysql/data/ --slave_pass=xxx
Tue May 13 08:10:57 2014 - [info]   Connecting to root@node3(node3:22)..
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/logs, up to relay-bin.000006
    Temporary relay log file is /usr/local/mysql/logs/relay-bin.000006
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue May 13 08:10:57 2014 - [info] Slaves settings check done.
Tue May 13 08:10:57 2014 - [info]
node1(192.168.10.216:3306) (current master)
 +--node2(192.168.10.217:3306)
 +--node3(192.168.10.218:3306)
Tue May 13 08:10:57 2014 - [info] Checking replication health on node2..
Tue May 13 08:10:57 2014 - [info]  ok.
Tue May 13 08:10:57 2014 - [info] Checking replication health on node3..
Tue May 13 08:10:57 2014 - [info]  ok.
Tue May 13 08:10:57 2014 - [warning] master_ip_failover_script is not defined.
Tue May 13 08:10:57 2014 - [warning] shutdown_script is not defined.
Tue May 13 08:10:57 2014 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
G. Start the MHA manager and watch the log file
Start the manager service on node3, then run killall mysqld on node1:
[root@localhost mha]# masterha_manager --conf=/etc/mha/app1/app1.cnf
Tue May 13 08:19:01 2014 - [info] Reading default configuration from /etc/masterha_default.cnf..
Tue May 13 08:19:01 2014 - [info] Reading application default configuration from /etc/mha/app1/app1.cnf..
Tue May 13 08:19:01 2014 - [info] Reading server configuration from /etc/mha/app1/app1.cnf..
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /usr/local/mysql/logs, up to mysql-bin.000009
Tue May 13 08:19:18 2014 - [info] Reading default configuration from /etc/masterha_default.cnf..
Tue May 13 08:19:18 2014 - [info] Reading application default configuration from /etc/mha/app1/app1.cnf..
Tue May 13 08:19:18 2014 - [info] Reading server configuration from /etc/mha/app1/app1.cnf..
Then watch /var/log/mha/app1/manager.log on node3: node1 is reported dead, the master is automatically switched to node2, and node3's replication is re-pointed at node2. After a failover, the file /var/log/mha/app1/app1.failover.complete is created.
To recover node1 manually:
rm -rf /var/log/mha/app1/app1.failover.complete
Start MySQL on node1 and re-point node2 and node3 at node1 again (change master to).
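The re-pointing step can be sketched as below. LOG_FILE and LOG_POS are placeholders: read the real values from SHOW MASTER STATUS on node1 before running this against node2 and node3.

```shell
# Build the CHANGE MASTER statement for re-pointing a slave at the
# restored node1 (credentials are the ones created in step B).
OLD_MASTER=192.168.10.216
LOG_FILE=mysql-bin.000010   # placeholder: from SHOW MASTER STATUS on node1
LOG_POS=120                 # placeholder

stmt="CHANGE MASTER TO MASTER_HOST='$OLD_MASTER', MASTER_USER='rep', \
MASTER_PASSWORD='geekwolf', MASTER_LOG_FILE='$LOG_FILE', MASTER_LOG_POS=$LOG_POS;"
echo "$stmt"
# then, on each slave:
#   mysql -uroot -pgeekwolf -e "$stmt"
#   mysql -uroot -pgeekwolf -e 'START SLAVE;'
```

Remember to also remove app1.failover.complete (as above) before restarting the manager, or the next failover will refuse to run.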
To run the MHA manager in the background:
nohup masterha_manager --conf=/etc/mha/app1/app1.cnf < /dev/null > /var/log/mha/app1/app1.log 2>&1 &
For running it as a daemon, see: https://code.google.com/p/mysql-master-ha/wiki/Runnning_Background
ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/weberho:/qmailtoaster/openSUSE_Tumbleweed/x86_64/daemontools-0.76-5.3.x86_64.rpm
4.5 Configuring the VIP
A. Via the global configuration file
vim /etc/mha/masterha_default.cnf
[server default]
user=root
password=geekwolf
ssh_user=root
repl_user=rep
repl_password=geekwolf
ping_interval=1
secondary_check_script = masterha_secondary_check -s node1 -s node2 -s node3 --user=root --master_host=node1 --master_ip=192.168.10.216 --master_port=3306
master_ip_failover_script="/etc/mha/scripts/master_ip_failover"
master_ip_online_change_script="/etc/mha/scripts/master_ip_online_change"
#shutdown_script=/script/masterha/power_manager
#report_script=""
The modified master_ip_failover / master_ip_online_change script:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,       $orig_master_host,
    $orig_master_ip,   $orig_master_port,
    $new_master_host,  $new_master_ip,  $new_master_port
);
my $vip       = '192.168.10.219';    # Virtual IP
my $gateway   = '192.168.10.1';      # Gateway IP
my $interface = 'eth0';
my $key       = "1";
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip $gateway >/dev/null 2>&1";
my $ssh_stop_vip  = "/sbin/ifconfig $interface:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage the master ip address at a global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # All arguments are passed.
        # If you manage the master ip address at a global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
B. Implementing the VIP with third-party HA software (keepalived, heartbeat), using keepalived as the example
Configure keepalived with node1 and node2 as master and backup for each other.
Download and install keepalived on node1 and node2:
wget http://www.keepalived.org/software/keepalived-1.2.13.tar.gz
yum -y install popt-*
./configure --prefix=/usr/local/keepalived --enable-snmp
make && make install
cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/rc.d/init.d/
cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
chmod +x /etc/rc.d/init.d/keepalived
chkconfig keepalived on
mkdir /etc/keepalived
ln -s /usr/local/keepalived/sbin/keepalived /usr/sbin
Edit the configuration on node1 (192.168.10.216):
vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id MHA
    notification_email {
        root@localhost    # recipients; one per line, multiple allowed
    }
    # mail is sent when the master/backup state changes
    notification_email_from m@localhost
    # sending SMTP server
    smtp_server 127.0.0.1
    # SMTP connect timeout
    smtp_connect_timeout 30
}
vrrp_script check_mysql {
    script "/etc/keepalived/check_mysql.sh"
}
vrrp_sync_group VG1 {
    group {
        VI_1
    }
    notify_master "/etc/keepalived/master.sh"
}
vrrp_instance VI_1 {
    state master
    interface eth0
    virtual_router_id 110
    priority 100
    advert_int 1
    nopreempt    # do not preempt: after coming back up, do not take the master role back
    authentication {
        # authentication type: PASS or AH
        auth_type PASS
        # authentication password
        auth_pass geekwolf
    }
    track_script {
        check_mysql
    }
    virtual_ipaddress {
        192.168.10.219
    }
}
Edit the configuration on node2 (192.168.10.217):
vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id MHA
    notification_email {
        root@localhost    # recipients; one per line, multiple allowed
    }
    # mail is sent when the master/backup state changes
    notification_email_from m@localhost
    # sending SMTP server
    smtp_server 127.0.0.1
    # SMTP connect timeout
    smtp_connect_timeout 30
}
vrrp_script check_mysql {
    script "/etc/keepalived/check_mysql.sh"
}
vrrp_sync_group VG1 {
    group {
        VI_1
    }
    notify_master "/etc/keepalived/master.sh"
}
vrrp_instance VI_1 {
    state backup
    interface eth0
    virtual_router_id 110
    priority 99
    advert_int 1
    authentication {
        # authentication type: PASS or AH
        auth_type PASS
        # authentication password
        auth_pass geekwolf
    }
    track_script {
        check_mysql
    }
    virtual_ipaddress {
        192.168.10.219
    }
}
check_mysql.sh
#!/bin/bash
MYSQL=/usr/local/mysql/bin/mysql
MYSQL_HOST=127.0.0.1
MYSQL_USER=root
MYSQL_PASSWORD=geekwolf
CHECK_TIME=3
# MYSQL_OK is 1 while mysql is working, 0 when mysql is down
MYSQL_OK=1

function check_mysql_health (){
    $MYSQL -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD -e "show status;" >/dev/null 2>&1
    if [ $? = 0 ] ;then
        MYSQL_OK=1
    else
        MYSQL_OK=0
    fi
    return $MYSQL_OK
}

while [ $CHECK_TIME -ne 0 ]
do
    let "CHECK_TIME -= 1"
    check_mysql_health
    if [ $MYSQL_OK = 1 ] ; then
        CHECK_TIME=0
        exit 0
    fi
    if [ $MYSQL_OK -eq 0 ] && [ $CHECK_TIME -eq 0 ]
    then
        pkill keepalived
        exit 1
    fi
    sleep 1
done
master.sh
#!/bin/bash
VIP=192.168.10.219
GATEWAY=192.168.10.1
/sbin/arping -I eth0 -c 5 -s $VIP $GATEWAY &>/dev/null
chmod +x /etc/keepalived/check_mysql.sh
chmod +x /etc/keepalived/master.sh
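The core decision in check_mysql.sh can be sanity-checked without a running server. In this sketch, `true`/`false` stand in for the real `mysql -e "show status"` probe; a zero exit status means MySQL is up:

```shell
# Reproduce check_mysql.sh's health decision with a pluggable probe command.
check() {
    if "$@" >/dev/null 2>&1; then
        echo healthy
    else
        echo down    # check_mysql.sh reacts to three of these by killing keepalived
    fi
}

check true     # prints "healthy"
check false    # prints "down"
```

Killing keepalived on failure is what releases the VIP so the backup node's VRRP instance can claim it.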
5. Common MHA Commands
Check manager status:
masterha_check_status --conf=/etc/mha/app1/app1.cnf
Check that passwordless SSH works:
masterha_check_ssh --conf=/etc/mha/app1/app1.cnf
Check that replication is healthy:
masterha_check_repl --conf=/etc/mha/app1/app1.cnf
Add a new node server4 to the configuration file:
masterha_conf_host --command=add --conf=/etc/mha/app1/app1.cnf --hostname=geekwolf --block=server4 --params="no_master=1;ignore_fail=1"
Remove the server4 node:
masterha_conf_host --command=delete --conf=/etc/mha/app1/app1.cnf --block=server4
Note:
block: the section name for the node; it defaults to [server_$hostname]. If you set block=100, the section becomes [server100].
params: parameters, separated by semicolons (see https://code.google.com/p/mysql-master-ha/wiki/Parameters)
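For illustration, the add command above would append a section like this to app1.cnf (using the hostname "geekwolf" from the example):

```
[server4]
hostname=geekwolf
no_master=1
ignore_fail=1
```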
Stop the manager service:
masterha_stop --conf=/etc/mha/app1/app1.cnf
Manual master switchover (do not run this while masterha_manager is running):
To switch while the master node1 is still alive:
Interactive:
masterha_master_switch --master_state=alive --conf=/etc/mha/app1/app1.cnf --new_master_host=node2
Non-interactive:
masterha_master_switch --master_state=alive --conf=/etc/mha/app1/app1.cnf --new_master_host=node2 --interactive=0
To switch after the master node1 has died:
masterha_master_switch --master_state=dead --conf=/etc/mha/app1/app1.cnf --dead_master_host=node1 --dead_master_ip=192.168.10.216 --dead_master_port=3306 --new_master_host=192.168.10.217
For details see: https://code.google.com/p/mysql-master-ha/wiki/TableOfContents?tm=6
6. Caveats
A. Of the two VIP approaches above, the first (MHA-managed scripts) is recommended.
B. After a failover, the manager service stops itself and creates app1.failover.complete under /var/log/mha/app1; that file must be deleted before another failover can run.
C. In testing with one master and two slaves (both slaves set candidate_master=1): after the old master failed over to the standby, app1.failover.complete was deleted and the manager restarted, but stopping the new master then failed to trigger a switchover. (Fix: delete the old master node1's section from /etc/mha/app1/app1.cnf, after which switching worked normally.)
D. Stale ARP caches can leave the VIP unreachable after it moves to another host.
E. Using semi-synchronous replication (Semi-Sync) maximizes data safety.
F. The purge_relay_logs script removes relay logs without blocking the SQL thread; set up a cron job on every slave node to purge relay logs periodically:
0 5 * * * root /usr/bin/purge_relay_logs --user=root --password=geekwolf --disable_relay_log_purge >> /var/log/mha/purge_relay_logs.log 2>&1
7. Problems Encountered During Deployment
Problem 1: running perl Makefile.PL for mha4mysql-node failed:
[root@node1 mha4mysql-node-0.56]# perl Makefile.PL
Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Makefile.pm line 4.
BEGIN failed--compilation aborted at inc/Module/Install/Makefile.pm line 4.
Compilation failed in require at inc/Module/Install.pm line 283.
Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Can.pm line 6.
BEGIN failed--compilation aborted at inc/Module/Install/Can.pm line 6.
Compilation failed in require at inc/Module/Install.pm line 283.
Can't locate ExtUtils/MM_Unix.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Metadata.pm line 349.
Fix:
yum -y install perl-CPAN perl-devel perl-DBD-MySQL
Problem 2:
Can't locate Time/HiRes.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/local/share/perl5/MHA/SSHCheck.pm line 28.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SSHCheck.pm line 28.
Compilation failed in require at /usr/local/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_ssh line 25.
Fix:
yum -y install perl-Time-HiRes
Problem 3: the MHA scripts could not find the mysql commands (mysql, mysqlbinlog) in the default PATH.
Fix:
Create symlinks for the mysql commands on every node:
ln -s /usr/local/mysql/bin/* /usr/local/bin/
References:
https://code.google.com/p/mysql-master-ha
http://blog.chinaunix.net/uid-28437434-id-3476641.html