Lab environment:
OS: CentOS 6.6
corosync: corosync-1.4.7-1.el6.x86_64
pacemaker: pacemaker-1.1.12-4.el6.x86_64
crmsh: crmsh-2.1-1.6.x86_64.rpm
pssh: pssh-2.3.1-2.el6.x86_64.rpm
node1:
hostname: node1.1inux.com
IP: 172.16.66.81
node2:
hostname: node2.1inux.com
IP: 172.16.66.82
I. Preliminary environment setup
Before a Linux host can act as an HA node, the following preparation is usually required:
1. Hostname-to-IP resolution must work for every node, and each node's hostname must match the output of "uname -n". To guarantee this, /etc/hosts on both nodes should contain the following:
172.16.66.81 node1.1inux.com node1
172.16.66.82 node2.1inux.com node2
To keep these hostnames across reboots, run commands like the following on each node:
Node1:
# sed -i 's@\(HOSTNAME=\).*@\1node1.1inux.com@g' /etc/sysconfig/network
# hostname node1.1inux.com
Node2:
# sed -i 's@\(HOSTNAME=\).*@\1node2.1inux.com@g' /etc/sysconfig/network
# hostname node2.1inux.com
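Adding the two hosts entries by hand is easy to get wrong when the step is repeated. The sketch below is one way to make it idempotent; the function name ensure_host_entry is hypothetical, and passing the file as an argument is an assumption made so the helper can be tried on a scratch file before touching /etc/hosts.

```shell
# Append "IP FQDN ALIAS" to a hosts file only if the FQDN is not already listed.
# $1 = hosts file (/etc/hosts on a real node), $2 = IP, $3 = FQDN, $4 = short alias
ensure_host_entry() {
    hosts_file=$1 ip=$2 fqdn=$3 alias=$4
    # -F matches the name literally, -w requires a whole-word match
    if ! grep -qwF -- "$fqdn" "$hosts_file" 2>/dev/null; then
        printf '%s %s %s\n' "$ip" "$fqdn" "$alias" >> "$hosts_file"
    fi
}
```

On each node this could be called as ensure_host_entry /etc/hosts 172.16.66.81 node1.1inux.com node1, and likewise for node2; running it a second time leaves the file unchanged.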
2. Set up SSH key-based authentication between node1 and node2
Node1:
[root@node1 ~]# ssh-keygen -t rsa -f /root/.ssh/id_rsa -P ''
[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2.1inux.com
Node2:
[root@node2 ~]# ssh-keygen -t rsa -f /root/.ssh/id_rsa -P ''
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1.1inux.com
3. Time synchronization
[root@node1 yum.repos.d]# ntpdate 172.16.0.1 ; ssh node2 'ntpdate 172.16.0.1'
30 May 16:20:20 ntpdate[2351]: adjust time server 172.16.0.1 offset 0.195961 sec
30 May 16:20:21 ntpdate[1994]: step time server 172.16.0.1 offset 1.033553 sec
[root@node1 yum.repos.d]#
Verify that the clocks agree:
[root@node2 ~]# date; ssh node1 "date"
Sat May 30 16:51:13 CST 2015
Sat May 30 16:51:13 CST 2015
[root@node2 ~]#
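Comparing two date outputs by eye works for a one-off check. For a scripted check, the difference can be computed from epoch seconds. This is only a sketch: the function name clock_skew is hypothetical, and it assumes a date that supports +%s, as GNU date on CentOS does.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
# $1 = local timestamp, $2 = remote timestamp
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}
```

From node2 it could be used as clock_skew "$(date +%s)" "$(ssh node1 date +%s)". The SSH round trip adds a little delay, so a result of 0 or 1 is what a synchronized pair would be expected to show.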
II. Install and configure corosync and pacemaker
1. Install corosync and pacemaker
On node1:
[root@node1 ~]# yum -y install corosync pacemaker
On node2:
[root@node2 ~]# yum -y install corosync pacemaker
===========================
Files installed by the corosync package:
[root@node2 ~]# rpm -ql corosync
/etc/corosync
/etc/corosync/corosync.conf.example //sample configuration file
/etc/corosync/corosync.conf.example.udpu //sample configuration for UDP unicast (udpu)
/etc/corosync/service.d //service definition directory
/etc/corosync/uidgid.d
/etc/dbus-1/system.d/corosync-signals.conf
/etc/rc.d/init.d/corosync //init script
/etc/rc.d/init.d/corosync-notifyd
/etc/sysconfig/corosync-notifyd
/usr/bin/corosync-blackbox
/usr/libexec/lcrso
...
/usr/sbin/corosync
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-fplay
/usr/sbin/corosync-keygen
/usr/sbin/corosync-notifyd
/usr/sbin/corosync-objctl
/usr/sbin/corosync-pload
/usr/sbin/corosync-quorumtool
....
/var/lib/corosync
/var/log/cluster
...
===========================
2. Configure corosync (the following steps are performed on node2)
[root@node2 ~]# cd /etc/corosync/
[root@node2 corosync]# cp corosync.conf.example corosync.conf
[root@node2 corosync]# vim corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
    version: 2
    # secauth: Enable mutual node authentication. If you choose to
    # enable this ("on"), then do remember to create a shared
    # secret with "corosync-keygen".
    # [note] enable authentication
    secauth: on
    # [note] number of threads; 0 means work in process mode rather than threaded mode
    threads: 0
    # interface: define at least one interface to communicate
    # over. If you define more than one interface stanza, you must
    # also set rrp_mode.
    interface {
        # Rings must be consecutively numbered, starting at 0.
        # [note] ring number; normally left at 0
        ringnumber: 0
        # This is normally the *network* address of the
        # interface to bind to. This ensures that you can use
        # identical instances of this configuration file
        # across all your cluster nodes, without having to
        # modify this option.
        # [note] change this to the network address of the network the hosts are on
        bindnetaddr: 172.16.0.0
        # However, if you have multiple physical network
        # interfaces configured for the same subnet, then the
        # network address alone is not sufficient to identify
        # the interface Corosync should bind to. In that case,
        # configure the *host* address of the interface
        # instead:
        # bindnetaddr: 192.168.1.1
        # When selecting a multicast address, consider RFC
        # 2365 (which, among other things, specifies that
        # 239.255.x.x addresses are left to the discretion of
        # the network administrator). Do not reuse multicast
        # addresses across multiple Corosync clusters sharing
        # the same network.
        # [note] multicast address
        mcastaddr: 239.235.88.8
        # Corosync uses the port you specify here for UDP
        # messaging, and also the immediately preceding
        # port. Thus if you set this to 5405, Corosync sends
        # messages over UDP ports 5405 and 5404.
        # [note] multicast port
        mcastport: 5405
        # Time-to-live for cluster communication packets. The
        # number of hops (routers) that this ring will allow
        # itself to pass. Note that multicast routing must be
        # specifically enabled on most network routers.
        ttl: 1
    }
}
logging {
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off
    # Log to standard error. When in doubt, set to no. Useful when
    # running in the foreground (when invoking "corosync -f")
    to_stderr: no
    # Log to a log file. When set to "no", the "logfile" option
    # must not be set.
    # [note] whether to write a log file, and where to save it
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    # Log to the system log daemon. When in doubt, set to yes.
    # [note] one copy of the log is normally enough, so syslog is set to no
    to_syslog: no
    # Log debug messages (very verbose). When in doubt, leave off.
    debug: off
    # Log messages with time stamps. When in doubt, set to on
    # (unless you are only logging to syslog, where double
    # timestamps can be annoying).
    # [note] enable timestamps
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
# [note] add the following so that pacemaker runs as a corosync plugin
service {
    ver: 0
    name: pacemaker
    use_mgmtd: yes
}
aisexec {
    user: root
    group: root
}
=======================================
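The bindnetaddr comment asks for the network address rather than a host address. As a sketch of how that value follows from a node's IP and prefix length, the helper below masks the address with shell arithmetic. The name network_address is hypothetical, and the /16 prefix in the example is taken from the cidr_netmask=16 used for the cluster IP later in this article.

```shell
# Print the network address for a dotted-quad IPv4 address and prefix length.
# $1 = host IP, $2 = prefix length (0-32)
network_address() {
    prefix=$2
    # Split the address into its four octets
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    ip_int=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        # Keep the top "prefix" bits of a 32-bit value
        mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    fi
    net=$(( ip_int & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 ))
}
```

Running network_address 172.16.66.81 16 prints 172.16.0.0, which matches the bindnetaddr value used above.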
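Generate the corosync authentication key
How corosync-keygen works:
corosync-keygen reads random data from /dev/random. If the entropy pool is nearly empty, key generation can stall; generating heavy I/O (for example by downloading software) or typing on the keyboard replenishes the pool so that the key can be generated.
[root@node2 ~]# corosync-keygen
.....
Press keys on your keyboard to generate entropy (bits = 896).
Press keys on your keyboard to generate entropy (bits = 960).
Writing corosync key to /etc/corosync/authkey. //where the generated key file is saved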
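Before running corosync-keygen it can help to see how much entropy the kernel currently reports, since a low value explains a long wait. The sketch below reads the Linux procfs counter; the function name entropy_sufficient is hypothetical, and the 1024-bit threshold in the usage note is an assumption derived from the 128-byte authkey shown in this article.

```shell
# Print the kernel's entropy estimate and succeed if it meets the threshold.
# $1 = minimum bits wanted, $2 = counter file (defaults to the Linux procfs path)
entropy_sufficient() {
    want=$1
    file=${2:-/proc/sys/kernel/random/entropy_avail}
    have=$(cat "$file") || return 2
    echo "entropy available: $have bits (want >= $want)"
    [ "$have" -ge "$want" ]
}
```

For example, entropy_sufficient 1024 || echo "expect corosync-keygen to wait for more entropy" gives a quick hint before the key is generated.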
3. Check that multicast (the MULTICAST flag) is enabled on the network interface; if it is not, enable it manually
[root@node2 corosync]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:3d:a9:a1 brd ff:ff:ff:ff:ff:ff
[root@node2 corosync]#
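The flag check above can also be scripted. The sketch below parses ip link show output from standard input and reports whether a given interface lists MULTICAST; the function name has_multicast is hypothetical, and the parsing assumes the header-line layout shown in the output above.

```shell
# Succeed if the named interface lists MULTICAST among its flags.
# Reads `ip link show` output on stdin; $1 = interface name
has_multicast() {
    awk -v ifname="$1" '
        # Header lines look like "2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu ..."
        $2 == (ifname ":") && $3 ~ /^<.*>$/ {
            found = ($3 ~ /[<,]MULTICAST[,>]/)
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Usage would be ip link show | has_multicast eth0. If the check fails, the flag can be turned on with ip link set eth0 multicast on.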
4. Copy corosync.conf and authkey to node1
[root@node2 corosync]# scp authkey corosync.conf node1:/etc/corosync/
authkey 100% 128 0.1KB/s 00:00
corosync.conf 100% 2773 2.7KB/s 00:00
[root@node2 corosync]#
III. Install crmsh
Since version 6.4, RHEL no longer ships crmsh, the command-line cluster configuration tool, and provides pcs instead; to use crmsh you have to install it yourself.
Install crmsh and pssh on both node1 and node2:
[root@node2 ~]# ls
anaconda-ks.cfg crmsh-2.1-1.6.x86_64.rpm install.log install.log.syslog pssh-2.3.1-2.el6.x86_64.rpm
[root@node2 ~]# yum --nogpgcheck localinstall crmsh-2.1-1.6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm
Copy the two packages to node1 and install them there:
[root@node2 ~]# scp crmsh-2.1-1.6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm node1:/root/
[root@node1 ~]# yum --nogpgcheck localinstall crmsh-2.1-1.6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm
Files installed by the crmsh package:
----------------------
[root@node1 ~]# rpm -ql crmsh
/etc/bash_completion.d/crm.sh
/etc/crm
/etc/crm/crm.conf
....
/usr/sbin/crm
/usr/share/crmsh
...
/var/cache/crm
---------------------------
Files installed by the pssh package:
[root@node1 ~]# rpm -ql pssh
/usr/bin/pnuke
/usr/bin/prsync
/usr/bin/pscp.pssh
/usr/bin/pslurp
/usr/bin/pssh
.....
.....
/usr/libexec/pssh
/usr/libexec/pssh/pssh-askpass
.....
[root@node1 ~]#
IV. Verification
1. Start corosync
[root@node1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
[root@node1 ~]#
2. Verify the listening port:
[root@node2 log]# ss -tunl | grep :5405
udp UNCONN 0 0 172.16.66.82:5405 *:*
udp UNCONN 0 0 239.235.1.8:5405 *:*
[root@node2 log]#
3. Check that the corosync engine started properly:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
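The grep above prints matching lines for a person to read. For a scripted check, the same two patterns can be turned into a pass/fail test. This is a sketch only: the function name corosync_started is hypothetical, and it treats the presence of both patterns in the log as evidence of a clean start, which is an assumption rather than an official health check.

```shell
# Succeed only if the log contains both patterns used in the grep above.
# $1 = log file (this article uses /var/log/cluster/corosync.log)
corosync_started() {
    log=$1
    grep -q "Corosync Cluster Engine" "$log" &&
        grep -q "configuration file" "$log"
}
```

It could be used as corosync_started /var/log/cluster/corosync.log && echo "engine started".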
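4. Check whether any errors occurred during startup. The error messages below indicate that pacemaker will soon no longer run as a corosync plugin, and that cman is therefore recommended as the cluster infrastructure service; they can be safely ignored here.
Figure 5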
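If none of the commands above reported a problem, start corosync on node2 with the following command:
[root@node1 ~]# ssh node2 -- /etc/init.d/corosync start
Note: start corosync on node2 from node1 with the command above; do not start it directly on node2.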
Use crmsh to check the status of the cluster nodes (for example, with crm status):
Figure 7
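V. Configure cluster properties
1. Pacemaker enables STONITH by default, but this cluster has no STONITH device, so the default configuration is not yet usable. This can be confirmed with a configuration check such as crm_verify -L -V, which reports errors while STONITH is enabled without a device.
Disable STONITH, then review the configuration:
# crm configure property stonith-enabled=false
Figure 9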
[root@node1 ~]# crm configure show
node node1.1inux.com
node node2.1inux.com
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false //STONITH is now disabled
[root@node1 ~]#
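STONITH can also be disabled from the interactive crm shell (Figure 13). The session below also sets no-quorum-policy=ignore, because a two-node cluster loses quorum as soon as one node fails and would otherwise stop its resources:
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# show
node node1.1inux.com \
    attributes standby=off
node node2.1inux.com \
    attributes standby=off
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore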
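The session above sets no-quorum-policy=ignore. The arithmetic behind that choice can be sketched as follows, assuming the usual majority rule where a partition needs more than half of the expected votes. The function names quorum_threshold and has_quorum are hypothetical helpers for illustration, not part of pacemaker or crmsh.

```shell
# Votes needed for quorum under a simple majority: floor(total / 2) + 1
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Succeed if the surviving votes still form a majority.
# $1 = expected votes, $2 = votes still online
has_quorum() {
    [ "$2" -ge "$(quorum_threshold "$1")" ]
}
```

With expected-quorum-votes=2 the threshold is 2, so has_quorum 2 1 fails: the surviving node of a two-node cluster never has quorum on its own. With three nodes, has_quorum 3 2 succeeds, which is why the ignore policy is mainly a two-node workaround.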
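2. Add cluster resources
Pacemaker supports resource agents of the heartbeat, LSB, and OCF classes, among others. LSB and OCF are the most commonly used today, and the stonith class is reserved for configuring STONITH devices.
The classes supported by the current cluster can be listed with: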
[root@node1 ~]# crm ra classes
To list all resource agents in a given class, use commands like the following:
# crm ra list lsb
# crm ra list ocf heartbeat
# crm ra list ocf pacemaker
# crm ra list stonith
To view the details of a specific resource agent:
# crm ra info [class:[provider:]]resource_agent
For example:
# crm ra info ocf:heartbeat:IPaddr
Figure 12
VI. Configure a highly available web cluster
1. Create an IP address resource for the web cluster; the web service will be reached through this address
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.16.66.100 nic=eth0 cidr_netmask=16 op monitor interval=10 timeout=20s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure#
Then check the IP addresses on node1, as in Figure 18: 172.16.66.100 is now active on node1.
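From the crm shell, put node1 into standby:
crm(live)node# standby
Then check the node status:
Figure 15
Figure 15 shows that node1 is now in standby and node2 is the only node online.
Check the IP addresses on node2 (Figure 19).
The IP resource is now working. Next, configure httpd.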
2. Configure the highly available web service
Install httpd on node1 and node2 from the yum repository.
Start httpd on both nodes and create a home page on each; then stop the service and disable it at boot, since the cluster will manage it:
node1:
[root@node1 ~]# service httpd start
Starting httpd: [ OK ]
[root@node1 ~]# echo "<h1>node1.1inux.com</h1>" > /var/www/html/index.html
[root@node1 ~]# curl 172.16.66.81
<h1>node1.1inux.com</h1>
[root@node1 ~]# chkconfig httpd off
[root@node1 ~]# service httpd stop
node2:
[root@node2 ~]# service httpd start
Starting httpd: [ OK ]
[root@node2 ~]# echo "<h1>node2.1inux.com</h1>" > /var/www/html/index.html
[root@node2 ~]# curl 172.16.66.82
<h1>node2.1inux.com</h1>
[root@node2 ~]# service httpd stop
[root@node2 ~]# chkconfig httpd off
Then configure the following on node1:
crm(live)configure# primitive webserver lsb:httpd op monitor interval=10 timeout=20s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# group ip_web webip webserver //create the group
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node node1.1inux.com \
    attributes standby=off
node node2.1inux.com \
    attributes standby=off
primitive webip IPaddr \
    params ip=172.16.66.100 nic=eth0 cidr_netmask=16 \
    op monitor interval=10 timeout=20s
primitive webserver lsb:httpd \
    op monitor interval=10 timeout=20s
group ip_web webip webserver
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore
crm(live)configure# cd ..
crm(live)# status
Last updated: Sun May 31 00:27:05 2015
Last change: Sun May 31 00:24:37 2015
Stack: classic openais (with plugin)
Current DC: node2.1inux.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ node1.1inux.com node2.1inux.com ]
 Resource Group: ip_web
     webip (ocf::heartbeat:IPaddr): Started node1.1inux.com
     webserver (lsb:httpd): Started node1.1inux.com //both webip and webserver are now running on node1
crm(live)#
Then browse to http://172.16.66.100; as Figure 20 shows, the node1 page is displayed.
Next, stop corosync on node1 and browse to http://172.16.66.100 again:
[root@node1 ~]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
Waiting for corosync services to unload:. [ OK ]
[root@node1 ~]#
The page is now served by node2.
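Refreshing a browser shows the failover, but not when it happened. The sketch below polls the cluster IP and prints a line each time the serving node changes. It relies on the index pages created earlier, where each page is a single h1 element containing the node's hostname. The function names page_owner and watch_failover are hypothetical, and the one-second interval and two-second timeout are arbitrary choices.

```shell
# Extract the host name from a page body like "<h1>node1.1inux.com</h1>".
page_owner() {
    printf '%s\n' "$1" | sed -n 's@.*<h1>\(.*\)</h1>.*@\1@p'
}

# Poll a URL once per second and print a line whenever the serving node changes.
# $1 = URL of the cluster IP; runs until interrupted with Ctrl-C
watch_failover() {
    last=
    while :; do
        body=$(curl -s --max-time 2 "$1") || body=
        owner=$(page_owner "$body")
        [ -n "$owner" ] || owner="(no answer)"
        if [ "$owner" != "$last" ]; then
            echo "$(date '+%H:%M:%S') served by: $owner"
            last=$owner
        fi
        sleep 1
    done
}
```

Running watch_failover http://172.16.66.100 from a third machine before stopping corosync on node1 would be expected to show the owner switching from node1 to node2, possibly with a brief "(no answer)" in between.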
That completes the highly available cluster built on corosync, pacemaker, and crmsh.