Remember how I spent quite a few earlier posts on HA clusters? We have already built them with heartbeat and with RHCS; add the corosync+pacemaker setup described in this post and the set is complete. I expect these two will be widely used for building highly available clusters from now on. Corosync is a project that branched off from the larger OpenAIS project, and Pacemaker is the CRM component that was split out of heartbeat v3 specifically to provide cluster resource management for HA clusters. There is quite a lot to cover, and my ability to summarize is limited, so many thanks to my roommate (wpp) for the help....
---------------------------------------------
Address plan:
*HA cluster nodes*
node1.a.com eth0-ip:192.168.102.101 eth1:192.168.1.100
node2.a.com eth0-ip:192.168.102.102 eth1:192.168.1.200
Vip:192.168.102.200
Note: eth0 is bridged, eth1 is host-only
---------------------------------------------------------
*Target (storage) server*
eth0-ip:192.168.1.10
Note: eth0 is host-only
---------------------------------------------------------
***Configuration steps***
—————————————————————————————
Step 1: Preparation
--------------
① Configure a static IP address on each node (service network restart)
② Synchronize the clocks between the nodes. (hwclock / date -s "2013-06-14 **:**:**")
③ Change the hostnames of the HA nodes so that they can resolve each other by name.
vim /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=yes
HOSTNAME=node1.a.com //on node2 the hostname is node2.a.com
vim /etc/hosts
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.102.101 node1.a.com node1
192.168.102.102 node2.a.com node2
hostname node1.a.com
④ Set up password-less communication between the nodes (no root password prompt when they talk to each other; it is just ssh — each node hands its public key to the other)
node1:
ssh-keygen -t rsa //generate node1's ssh public/private key pair
cd /root/.ssh/
ssh-copy-id -i id_rsa.pub node2 //copy node1's public key to node2
Enter node2's root password: 123456
node2:
ssh-keygen -t rsa //generate node2's ssh public/private key pair
cd /root/.ssh/
ssh-copy-id -i id_rsa.pub node1 //copy node2's public key to node1
Enter node1's root password: 123456
Password-less test from node1: scp /etc/fstab node2:/tmp (no root password is asked for any more)
⑤ On node1 (and node2), set up a local yum repository, mount the installation CD, and install the Corosync-related packages (attached to this post; a sketch of the local repo setup follows the package list below)
yum localinstall cluster-glue-1.0.6-1.6.el5.i386.rpm \
cluster-glue-libs-1.0.6-1.6.el5.i386.rpm \
corosync-1.2.7-1.1.el5.i386.rpm \
corosynclib-1.2.7-1.1.el5.i386.rpm \
heartbeat-3.0.3-2.3.el5.i386.rpm \
heartbeat-libs-3.0.3-2.3.el5.i386.rpm \
libesmtp-1.0.4-5.el5.i386.rpm \
pacemaker-1.1.5-1.1.el5.i386.rpm \
pacemaker-libs-1.1.5-1.1.el5.i386.rpm \
perl-TimeDate-1.16-5.el5.noarch.rpm \
resource-agents-1.0.4-1.1.el5.i386.rpm --nogpgcheck
rpm -ivh openais-1.1.3-1.6.el5.i386.rpm
rpm -ivh openaislib-1.1.3-1.6.el5.i386.rpm
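For reference, here is a minimal sketch of the local yum repository mentioned in ⑤, assuming the installation CD is mounted at /mnt/cdrom (the mount point and the repo file name are my own choices, not from the original setup):
mkdir -p /mnt/cdrom
mount /dev/cdrom /mnt/cdrom
vim /etc/yum.repos.d/cdrom.repo
[Server]
name=Local installation CD (Server)
baseurl=file:///mnt/cdrom/Server
enabled=1
gpgcheck=0
With this in place, yum localinstall can resolve the dependencies of the attached RPMs from the CD.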
----------------------------
Step 2: Configure Corosync itself
----------------------------
① Copy the example file into place as the configuration, then edit it
cd /etc/corosync/
cp -p corosync.conf.example corosync.conf
vim corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank //compatible with the older OpenAIS 0.80 ("whitetank") releases; backwards compatible, so some newer features may not be available
totem { //"totem": settings for the protocol the nodes use to exchange heartbeats
    version: 2 //protocol version
    secauth: off //whether to enable secure authentication
    threads: 0 //number of threads; 0 means no limit
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.102.0 //the network address used for cluster communication; a host address also works
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
logging { //logging options
    fileline: off //whether to print the source file and line in log messages
    to_stderr: no //whether to send errors to standard error (the screen) -- naturally keep this off
    to_logfile: yes //write messages to the log file
    to_syslog: yes //also write them to syslog (in practice pick one of the two to save resources)
    logfile: /var/log/cluster/corosync.log //***the log directory must be created by hand; without it the service will not start***
    debug: off //whether to enable debug output; turn it on when troubleshooting
    timestamp: on //whether to timestamp log entries
    //the following subsystem belongs to OpenAIS and can stay off
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
service { //an extra section: everything above is only the messaging layer, and pacemaker is wanted on top of it
    ver: 0
    name: pacemaker
}
aisexec { //OpenAIS itself is not used, but a couple of its sub-options are
    user: root
    group: root
}
② To let other hosts join the cluster, authentication is needed, so generate an authkey //same idea as with heartbeat: only trusted nodes hold the key, which goes some way toward keeping the cluster secure
[root@node1 corosync]# corosync-keygen
[root@node1 corosync]# ll
total 28
-rw-r--r-- 1 root root 5384 Jul 28 2010 amf.conf.example
-r-------- 1 root root 128 May 7 16:16 authkey
-rw-r--r-- 1 root root 513 May 7 16:14 corosync.conf
-rw-r--r-- 1 root root 436 Jul 28 2010 corosync.conf.example
drwxr-xr-x 2 root root 4096 Jul 28 2010 service.d
drwxr-xr-x 2 root root 4096 Jul 28 2010 uidgid.d
③ Create the directory for the log file //corosync will not start without it; the configuration above puts the log in this directory
[root@node1 corosync]# mkdir /var/log/cluster
④ Synchronize the configuration between the nodes.
[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/
authkey 100% 128 0.1KB/s 00:00
corosync.conf 100% 513 0.5KB/s 00:00
[root@node1 corosync]# ssh node2 'mkdir /var/log/cluster'
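As an optional sanity check, confirm the key arrived on node2 with its restrictive mode intact (scp -p preserves permissions, and the authkey is expected to be readable by root only):
[root@node1 corosync]# ssh node2 'ls -l /etc/corosync/authkey /etc/corosync/corosync.conf'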
⑤ Start the service
[root@node1 corosync]# service corosync start
[root@node1 corosync]# ssh node2 '/etc/init.d/corosync start' //neat, isn't it: node1 can run commands on node2
⑥ Check how the corosync engine started
[root@node1 corosync]# grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages
⑦ Check whether the initial membership notifications were sent
[root@node1 corosync]# grep -i totem /var/log/messages
⑧ Check whether any errors occurred along the way
[root@node1 corosync]# grep -i error: /var/log/messages |grep -v unpack_resources
⑨ Check whether pacemaker has started
[root@node1 corosync]# grep -i pcmk_startup /var/log/messages
⑩ Check the cluster membership status from either node
[root@node1 ~]# crm status
============
Last updated: Fri Jun 14 22:06:21 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
-------------------------------------------------------------------
Step 3: Provide the highly available service
--------------------------
In Corosync, services (resources) can be defined through two interfaces:
1: GUI (hb_gui), a graphical tool from Heartbeat; it requires installing the Heartbeat packages
yum localinstall heartbeat-2.1.4-9.el5.i386.rpm \
heartbeat-gui-2.1.4-9.el5.i386.rpm \
heartbeat-pils-2.1.4-10.el5.i386.rpm \
heartbeat-stonith-2.1.4-10.el5.i386.rpm \
libnet-1.1.4-3.el5.i386.rpm \
perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck
After installation, run hb_gui to configure the cluster graphically
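One small note: hb_gui connects as the hacluster system user, so that account usually needs a password set on the node you connect to first (a hedged reminder; check the heartbeat-gui documentation for your version):
passwd hacluster
hb_gui &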
2: crm (a shell provided by pacemaker) //it feels a lot like a network device CLI and is very pleasant to use
① Show the current configuration: crm configure show
② Check the configuration syntax: crm_verify -L
[root@node1 corosync]# crm_verify -L
crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
-V may provide more details
The STONITH errors appear because, in an HA environment with no STONITH resources defined, resources are prevented from starting.
STONITH can simply be disabled.
Method:
[root@node1 corosync]# crm //enter the crm shell
crm(live)# configure //enter the global configuration context
crm(live)configure# property stonith-enabled=false //disable the stonith mechanism
crm(live)configure# commit //commit and save the configuration
crm(live)configure# show //show the current configuration
crm(live)configure# exit
Run the syntax check again: crm_verify -L no longer reports any errors.
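The same change can also be made non-interactively in a single line, which is convenient in scripts; a minimal sketch with the same effect as the interactive session above:
[root@node1 corosync]# crm configure property stonith-enabled=false
[root@node1 corosync]# crm_verify -L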
③ The four cluster resource types
[root@node1 corosync]# crm
crm(live)# configure
crm(live)configure# help
primitive — a basic (local) resource, able to run on only one node at a time
group — puts several resources into one group so they are easier to manage
clone — a resource that needs to run on several nodes at the same time (e.g. ocfs2, stonith; no master/slave distinction)
master — master/slave resources, e.g. drbd
。。。。。
。。。。。
④ Resource agent classes used to configure the service
[root@node1 corosync]# crm
crm(live)# ra
crm(live)ra# classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
⑤ List the resource agent scripts
[root@node1 corosync]# crm
crm(live)# ra
crm(live)ra# list lsb
NetworkManager acpid anacron apmd
atd auditd autofs avahi-daemon
avahi-dnsconfd bluetooth capi conman
corosync cpuspeed crond cups
cups-config-daemon dnsmasq drbd dund
firstboot functions gpm haldaemon
halt heartbeat hidd hplip
httpd ip6tables ipmi iptables
irda irqbalance iscsi iscsid
isdn kdump killall krb524
kudzu lm_sensors logd lvm2-monitor
mcstrans mdmonitor mdmpd messagebus
microcode_ctl multipathd netconsole netfs
netplugd network nfs nfslock
nscd ntpd o2cb ocfs2
openais openibd pacemaker pand
pcscd portmap psacct rawdevices
rdisc readahead_early readahead_later restorecond
rhnsd rpcgssd rpcidmapd rpcsvcgssd
saslauthd sendmail setroubleshoot single
smartd sshd syslog vncserver
wdaemon winbind wpa_supplicant xfs
xinetd ypbind yum-updatesd
List the ocf agents from the heartbeat provider
crm(live)ra# list ocf heartbeat
⑥ Use info or meta to show a resource agent's details
meta ocf:heartbeat:IPaddr
⑦ Configure the resources (IP address: VIP 192.168.102.200; web service: httpd)
[root@node1 ~]# crm
crm(live)# configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.102.200
crm(live)configure# show //view the configuration
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.102.200"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# commit //commit
crm(live)# status //check the status
============
Last updated: Mon May 7 19:39:37 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
webip (ocf::heartbeat:IPaddr): Started node1.a.com
You can see the resource has started on node1.
Check with ifconfig on node1:
[root@node1 ~]# ifconfig
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:25:D2:BC
inet addr:192.168.102.200 Bcast:192.168.102.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:67 Base address:0x2000
Define the httpd resource
Install the httpd service on node1 and node2, but do not enable it at boot.
yum install httpd
chkconfig httpd off
Look at the resource agent class for httpd: lsb
[root@node1 corosync]# crm
crm(live)# ra
crm(live)ra# list lsb
Look at httpd's parameters
crm(live)ra# meta lsb:httpd
Define the httpd resource
crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# show
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.102.200"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)# status
============
Last updated: Mon May 7 20:01:12 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
webip (ocf::heartbeat:IPaddr): Started node1.a.com
webserver (lsb:httpd): Started node2.a.com
httpd has started, but it is running on node2.
(As more resources are added to an advanced cluster, they get spread across different nodes so the load stays balanced.)
They need to be constrained to the same node, so define them as a group.
⑧ Define a resource group and bind the resources to it
crm(live)# configure
crm(live)configure# help group
The `group` command creates a group of resources.
Usage:
...............
group <name> <rsc> [<rsc>...]
[meta attr_list]
[params attr_list]
attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
...............
Example:
...............
group internal_www disk0 fs0 internal_ip apache \
meta target_role=stopped
...............
Define the group and bind the resources
crm(live)configure# group web-res webip webserver
crm(live)configure# show
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.102.200"
primitive webserver lsb:httpd
group web-res webip webserver
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
Check the cluster status
crm(live)# status
============
Last updated: Mon May 7 20:09:06 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
Resource Group: web-res
webip (ocf::heartbeat:IPaddr): Started node1.a.com
webserver (lsb:httpd): Started node1.a.com
(The IP address and httpd are now both on node1.)
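For reference, a group is not the only way to keep the two resources together: colocation and order constraints achieve the same effect and give finer control. A minimal sketch (the constraint names are made up for illustration):
crm(live)configure# colocation webserver-with-webip inf: webserver webip
crm(live)configure# order webip-before-webserver inf: webip webserver
crm(live)configure# commit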
------------------------------------------------------------
Step 4: Test failover between the nodes.
----------------------------
node1: stop the corosync service, then observe from node2
service corosync stop
[root@node2 corosync]# crm status
============
Last updated: Mon May 7 20:16:58 2013
Stack: openais
Current DC: node2.a.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.a.com ]
OFFLINE: [ node1.a.com ]
You can see that the remaining partition (node2) no longer has quorum, so the resources do not fail over properly.
Solution: change the quorum policy (no-quorum-policy).
The possible values are:
ignore (ignore the loss of quorum)
freeze (already-running resources keep running, but resources that were not started cannot be started)
stop (the default)
suicide (kill all resources)
Back on node1:
service corosync start
[root@node1 corosync]# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
crm(live)configure# show (check the quorum property again)
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.102.200"
primitive webserver lsb:httpd
group web-res webip webserver
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" (the quorum requirement is now ignored)
Run the failover test again and the resources now move between the nodes correctly!
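Instead of stopping corosync outright, failover can also be exercised by putting a node into standby from the crm shell and bringing it back afterwards, which is a gentler test (a sketch; run from either node):
[root@node1 ~]# crm node standby node1.a.com
[root@node1 ~]# crm status //the web-res group should now be running on node2.a.com
[root@node1 ~]# crm node online node1.a.com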
------------------------------------------------------
Step 5: Common cluster management commands
① crm_attribute — modify the cluster's global properties
② crm_resource — manage resources
③ crm_node — manage nodes
crm_node -e — show the configuration epoch (how many times the configuration has been changed)
crm_node -q — show whether the current partition has quorum (the 1 below means it does)
1
④ cibadmin — tool for manipulating the cluster configuration (the CIB)
-u, --upgrade Upgrade the configuration to the latest syntax
-Q, --query Query the contents of the CIB
-E, --erase Erase the contents of the whole CIB
-B, --bump Increase the CIB's epoch value by 1
If a resource was defined incorrectly, this tool can be used to delete it
-D, --delete Delete the first object matching the supplied criteria, Eg.
It can also be done from the crm shell:
crm(live)configure# delete
usage: delete <id> [<id>...]
edit can also be run in the same context;
when finished, commit the changes
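Putting these together, a short hedged example: locate where a resource is running with crm_resource, then remove a mis-defined resource from the crm shell (the resource should normally be stopped before it is deleted; webserver is used here only as an example name):
[root@node1 ~]# crm_resource -W -r webserver //show which node webserver is running on
[root@node1 ~]# crm
crm(live)# resource stop webserver //stop it before removing the definition
crm(live)# configure
crm(live)configure# delete webserver
crm(live)configure# commit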
--------------------------------------------------------------------
Step 6: iSCSI (IP-SAN) storage configuration
-----------------------------------
Part 1: the target (the back-end storage)
① Add a new disk (or partition)
fdisk -l
Partition it: fdisk /dev/sda (n--p--4--+2g-w) --- this creates the new partition, sda6 in this setup
Update the partition table: (cat /proc/partitions)
partprobe /dev/sda (re-reads the partition table without rebooting)
② Install the packages the target needs and start the service.
cd /mnt/cdrom/ClusterStorage
rpm -ivh perl-Config-General-2.40-1.el5.noarch.rpm
rpm -ivh scsi-target-utils-0.0-5.20080917snap.el5.i386.rpm
service tgtd start
③ Add a new iscsi target.
Create the target: tgtadm --lld iscsi --op new --mode target --tid=1 --targetname iqn.2013-06.com.a.target:disk
Show it: tgtadm --lld iscsi --op show --mode target
Attach the backing store (LUN): tgtadm --lld iscsi --op new --mode=logicalunit --tid=1 --lun=1 --backing-store /dev/sda6
--lld [driver] --op new --mode=logicalunit --tid=[id] --lun=[lun] --backing-store [path]
Allow the initiators (ACL): tgtadm --lld iscsi --op bind --mode=target --tid=1 --initiator-address=192.168.1.0/24
tgtadm --lld [driver] --op bind --mode=target --tid=[id] --initiator-address=[address]
④ Put the configuration into the config file so it is loaded automatically at boot.
vim /etc/tgt/targets.conf
<target iqn.2013-06.com.a.target:disk>
backing-store /dev/sda6
initiator-address 192.168.1.0/24
</target>
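After editing targets.conf, tgtd has to pick up the change, and it is worth enabling the service at boot so the target comes back after a reboot (a short sketch; restart tgtd only while no initiators are logged in):
service tgtd restart
chkconfig tgtd on
tgt-admin -s //confirm the target and its LUN are exported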
Part 2: the initiators (node1 and node2)
cd /mnt/cdrom/Server
rpm -ivh iscsi-initiator-utils-6.2.0.871-0.10.el5.i386.rpm
service iscsi start
Discover the target: iscsiadm --mode discovery --type sendtargets --portal 192.168.1.10
Log in: iscsiadm --mode node --targetname iqn.2013-06.com.a.target:disk --portal 192.168.1.10:3260 --login
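To have the initiators log back in automatically after a reboot, the recorded node entry can be switched to automatic startup and the iscsi service enabled at boot (a sketch, assuming the default open-iscsi node database):
iscsiadm --mode node --targetname iqn.2013-06.com.a.target:disk --portal 192.168.1.10:3260 --op update -n node.startup -v automatic
chkconfig iscsi on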
⑤ On the target, show the connected initiators
tgt-admin -s
Target 1: iqn.2013-06.com.a.target:disk
System information:
Driver: iscsi
State: ready
I_T nexus information:
I_T nexus: 1
Initiator: iqn.2013-06.com.a.realserver2
Connection: 0
IP Address: 192.168.1.200
I_T nexus: 2
Initiator: iqn.2013-06.com.a.realserver1
Connection: 0
IP Address: 192.168.1.100
LUN information:
LUN: 0
Type: controller
SCSI ID: deadbeaf1:0
SCSI SN: beaf10
Size: 0 MB
Online: Yes
Removable media: No
Backing store: No backing store
LUN: 1
Type: disk
SCSI ID: deadbeaf1:1
SCSI SN: beaf11
Size: 4178 MB
Online: Yes
Removable media: No
Backing store: /dev/sda6
Account information:
ACL information:
192.168.1.0/24
⑥ On node1 and node2, check the local disk list.
fdisk -l
Disk /dev/sdb: 4178 MB, 4178409984 bytes
129 heads, 62 sectors/track, 1020 cylinders
Units = cylinders of 7998 * 512 = 4094976 bytes
Disk /dev/sdb doesn't contain a valid partition table
-------------------------------------------------------
Step 7: Format the new disk sdb with the OCFS2 cluster file system.
------------------------------------------
① Install the required packages on both nodes
yum localinstall ocfs2-2.6.18-164.el5-1.4.7-1.el5.i686.rpm \
ocfs2-tools-1.4.4-1.el5.i386.rpm \
ocfs2console-1.4.4-1.el5.i386.rpm
② Set up the main configuration file.
Method 1: create the configuration file by hand
mkdir /etc/ocfs2/
vim /etc/ocfs2/cluster.conf
node:
	ip_port = 7777
	ip_address = 192.168.102.101
	number = 0
	name = node1.a.com
	cluster = ocfs2
node:
	ip_port = 7777
	ip_address = 192.168.102.102
	number = 1
	name = node2.a.com
	cluster = ocfs2
cluster:
	node_count = 2
	name = ocfs2
Synchronize the configuration between the nodes.
scp -r /etc/ocfs2 node2:/etc/
Method 2: configure through the GUI
ocfs2console
③ On both nodes, load the o2cb module and start the services.
/etc/init.d/o2cb load
Loading module "configfs":OK
Mounting configfs filesystem at /config:OK
Loading module "ocfs2_nodemanager":OK
Loading module "ocfs2_dlm":OK
Loading module "ocfs2_dlmfs":OK
/etc/init.d/ocfs2 start
chkconfig ocfs2 on
/etc/init.d/o2cb online ocfs2
/etc/init.d/o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver. The following questions will determine whether the driver is loaded on boot. The current values will be shown in brackets ("[]"). Hitting <ENTER> without typing an answer will keep that current value. Ctrl-C will abort.
Load O2CB driver on boot (y/n) [n]:y
Cluster to start on boot (Enter "none" to clear) [ocfs2]:ocfs2
Writing O2CB configuration:OK
Loading module "configfs":OK
Mounting configfs filesystem at /config:OK
Loading module "ocfs2_nodemanager":OK
Loading module "ocfs2_dlm":OK
Loading module "ocfs2_dlmfs":OK
Mounting ocfs2_dlmfs filesystem at /dlm:OK
Starting cluster ocfs2:OK
/etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active
④ Format the OCFS2 file system on node1
mkfs -t ocfs2 /dev/sdb
⑤ Mount it on both nodes
mount -t ocfs2 /dev/sdb /var/www/html
mount
/dev/sdb on /var/www/html type ocfs2 (rw,_netdev,heartbeat=local)
cd /var/www/html
echo "Welcome" >index.html
⑥ Set both nodes to mount it automatically at boot
vim /etc/fstab
/dev/sdb /var/www/html ocfs2 defaults 0 0
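Because /dev/sdb lives on iSCSI, many setups use the _netdev mount option instead of plain defaults so the boot-time mount waits for the network (the mount output above already shows _netdev). A quick hedged check of the fstab entry without rebooting:
umount /var/www/html
mount -a //remounts everything listed in /etc/fstab
mount | grep /var/www/html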
-------------------------------------------------------------------
Step 8: Access test
---------------
http://192.168.102.200
Welcome
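As a final check, the page can also be fetched from any client on the 192.168.102.0/24 network (curl is just one option), and fetching it again after stopping corosync on the node that currently holds the resources confirms failover end to end:
curl http://192.168.102.200 //should return the "Welcome" page created above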