1. DRBD Introduction:
Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing storage replication solution that mirrors the content of block devices between servers.
Data mirroring: real-time, transparent, synchronous (returns only after all servers have succeeded) or asynchronous (returns as soon as the local server has succeeded).
DRBD's core functionality is implemented in the Linux kernel, near the bottom of the I/O stack, but it cannot magically add upper-layer features such as detecting corruption of an EXT3 file system: DRBD sits below the file system, closer to the kernel and the I/O stack than the file system is.
Single-primary mode: the typical high-availability cluster scenario.
Dual-primary mode: requires a shared cluster file system such as GFS or OCFS2. Used when data must be accessed concurrently from two nodes; needs special configuration.
Replication modes: there are 3 protocols:
Protocol A: asynchronous replication. The write returns as soon as the local write succeeds; the data sits in the send buffer and may be lost.
Protocol B: memory-synchronous (semi-synchronous) replication. The write returns once the local write has succeeded and the data has reached the peer; if both nodes lose power, the data may be lost.
Protocol C: synchronous replication. The write returns only after both the local and the remote write are confirmed. Data can be lost only if both nodes lose power or both disks fail at the same time.
Protocol C is generally used. The protocol choice affects the amount of traffic and therefore network latency.
Efficient resynchronization: blocks are synchronized linearly rather than in the original write order, and only the data that became inconsistent during the outage is transferred.
Online device verification: one side sequentially computes a digest over the underlying storage and sends it to the peer; the peer computes the same digest, and any mismatching blocks are resynchronized later. Running this once a week or once a month is recommended.
Replication-traffic integrity checking: the data is checksummed, and the peer requests retransmission on a mismatch. This guards against bit flips or overwrites caused by NICs, buffers, and similar problems.
Split brain: a transient network failure can cause both nodes to promote themselves to Primary. When the nodes reconnect, DRBD can send an email notification; handling this situation manually is recommended (see the sketch below).
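For reference, manual split-brain recovery in DRBD 8.3 might look like the following sketch (assuming the resource is named web, as configured later in this article, and that n2 is the node whose changes are to be discarded):
On the node whose data will be discarded:
[root@n2 ~]# drbdadm secondary web
[root@n2 ~]# drbdadm -- --discard-my-data connect web
On the surviving node, if it shows StandAlone:
[root@n1 ~]# drbdadm connect web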
2. Example topology diagram
3. Configure corosync + openais
3.1 Configure the IP addresses and hosts files of n1 and n2
1) Configure n1's IP address
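The original shows this step as a screenshot; a minimal /etc/sysconfig/network-scripts/ifcfg-eth0 consistent with the addresses used later in this article would be (GATEWAY/DNS omitted; adjust to your environment):
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.20.133
NETMASK=255.255.255.240
ONBOOT=yes
(n2 is configured the same way with IPADDR=192.168.20.134.)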
2) Restart the network service
[root@localhost ~]# service network restart
Shutting down interface eth0: [OK]
Shutting down loopback interface: [OK]
Bringing up loopback interface: [OK]
Bringing up interface eth0: [OK]
3) Change the hostname to n1.xh.com
[root@localhost ~]# vim /etc/sysconfig/network
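The relevant change is the HOSTNAME line; /etc/sysconfig/network should contain:
NETWORKING=yes
HOSTNAME=n1.xh.com
Run hostname n1.xh.com afterwards to apply the change without rebooting.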
4) Modify n1's hosts file
[root@localhost ~]# vim /etc/hosts
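Based on the addresses used throughout this article, the hosts file needs entries for both nodes:
192.168.20.133 n1.xh.com n1
192.168.20.134 n2.xh.com n2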
5) Configure n2's IP address (192.168.20.134, same procedure as on n1)
6) Change the hostname to n2.xh.com
[root@localhost ~]# vim /etc/sysconfig/network
7) Modify the hosts file
[root@localhost ~]# vim /etc/hosts
3.2 Configure passwordless SSH communication between n1 and n2
1) Generate a key pair
[root@n1 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
d0:0d:c4:5e:fe:b8:a9:45:75:68:f9:ad:43:03:84:9a [email protected]
2) Copy the public key file to n2.xh.com
[root@n1 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
Now try logging into the machine, with "ssh '[email protected]'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
3) Test by copying n1's hosts file to n2
[root@n1 ~]# scp /etc/hosts n2.xh.com:/etc/hosts
hosts
4) Do the same configuration on n2
[root@n2 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
4f:f2:26:4c:b9:14:85:ba:4c:c0:21:2d:a1:d1:a0:b2 [email protected]
[root@n2 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
The authenticity of host 'n1.xh.com (192.168.20.133)' can't be established.
RSA key fingerprint is d5:1f:ad:8b:ce:a3:10:0b:97:47:41:ac:7c:92:6e:29.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'n1.xh.com,192.168.20.133' (RSA) to the list of known hosts.
[email protected]'s password:
Now try logging into the machine, with "ssh '[email protected]'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
Check n2's IP configuration from n1:
[root@n1 ~]# ssh n2.xh.com 'ifconfig'
eth0 Link encap:Ethernet HWaddr 00:0C:29:AD:82:2B
inet addr:192.168.20.134 Bcast:192.168.20.143 Mask:255.255.255.240
inet6 addr: fe80::20c:29ff:fead:822b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2295 errors:0 dropped:0 overruns:0 frame:0
TX packets:1625 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:216705 (211.6 KiB) TX bytes:209670 (204.7 KiB)
Interrupt:67 Base address:0x2000
3.3 Configure yum on n1 and n2
1) Configure the yum repository on n2
[root@n2 ~]# vim /etc/yum.repos.d/rhel-debuginfo.repo
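The repo file itself appears only as a screenshot in the original; a typical local-media repo for RHEL 5 would look like this sketch (the baseurl paths are assumptions, adjust them to wherever the installation media is mounted):
[rhel-server]
name=Server
baseurl=file:///mnt/cdrom/Server
enabled=1
gpgcheck=0
[rhel-cluster]
name=Cluster
baseurl=file:///mnt/cdrom/Cluster
enabled=1
gpgcheck=0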
2) Configure the yum repository on n1 by copying the file over
[root@n2 ~]# scp -p /etc/yum.repos.d/rhel-debuginfo.repo n1.xh.com:/etc/yum.repos.d/
rhel-debuginfo.repo
3.4 Configure the corosync service
1) Copy the required packages to n2
[root@n1 ~]# scp -r corosync n2.xh.com:/root
2) Install corosync and openais on n1
[root@n1 corosync]# yum localinstall cluster-glue-1.0.6-1.6.el5.i386.rpm cluster-glue-libs-1.0.6-1.6.el5.i386.rpm corosync-1.2.7-1.1.el5.i386.rpm corosynclib-1.2.7-1.1.el5.i386.rpm heartbeat-3.0.3-2.3.el5.i386.rpm heartbeat-libs-3.0.3-2.3.el5.i386.rpm libesmtp-1.0.4-5.el5.i386.rpm pacemaker-1.1.5-1.1.el5.i386.rpm pacemaker-libs-1.1.5-1.1.el5.i386.rpm perl-TimeDate-1.16-5.el5.noarch.rpm resource-agents-1.0.4-1.1.el5.i386.rpm --nogpgcheck
[root@n1 corosync]# yum localinstall -y openais*.rpm
3) Look at corosync's sample configuration files
[root@n1 corosync]# cd /etc/corosync
[root@n1 corosync]# ll
total 20
-rw-r--r-- 1 root root 5384 2010-07-28 amf.conf.example
-rw-r--r-- 1 root root 436 2010-07-28 corosync.conf.example
drwxr-xr-x 2 root root 4096 2010-07-28 service.d
drwxr-xr-x 2 root root 4096 2010-07-28 uidgid.d
4) Create corosync's configuration file from the sample
[root@n1 corosync]# cd /etc/corosync
[root@n1 corosync]# cp corosync.conf.example corosync.conf
5) Edit corosync's configuration file
[root@n1 corosync]# vim corosync.conf
6) Notes on the corosync configuration file
compatibility: whitetank //(compatible with the corosync 0.8x "whitetank" series; backward compatible with older versions, though some newer features may be unavailable)
totem { //(the totem protocol: how heartbeat messages are passed between nodes)
version: 2 //protocol version
secauth: off //whether to enable secure authentication
threads: 0 //number of threads used for authentication; 0 means unlimited
interface {
ringnumber: 0
bindnetaddr: 192.168.1.1 //the network address used for communication (an adapted example for this setup follows after this listing)
mcastaddr: 226.94.1.1
mcastport: 5405
}
}
logging {
fileline: off
to_stderr: no //whether to send output to standard error
to_logfile: yes //log to a file
to_syslog: yes //log to syslog (disabling one of these two is recommended, as double logging costs performance)
logfile: /var/log/cluster/corosync.log //(create this directory manually)
debug: off //enable when troubleshooting
timestamp: on //whether to record timestamps in the log
The following subsystem belongs to openais and can be left disabled:
logger_subsys {
subsys: AMF
debug: off
}
}
amf {
mode: disabled
}
The above configures only the underlying messaging layer; to run pacemaker on top of it, the following section must be added:
service {
ver: 0
name: pacemaker
}
Although openais itself is not used directly, some of its sub-options are, so also add:
aisexec {
user: root
group: root
}
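For the topology in this article (both nodes on the 192.168.20.128/28 network, as the ifconfig output above shows), the edited interface section would plausibly be the following sketch; note that bindnetaddr must be the network address, not a host address:
totem {
version: 2
secauth: off
threads: 0
interface {
ringnumber: 0
bindnetaddr: 192.168.20.128
mcastaddr: 226.94.1.1
mcastport: 5405
}
}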
7) Create the log directory
[root@n1 corosync]# mkdir /var/log/cluster
8) Install the required packages on n2
[root@n2 corosync]# yum localinstall -y *.rpm --nogpgcheck
9) To keep unauthorized hosts from joining the cluster, authentication is needed; generate an authkey and copy it, together with the config file, to n2
[root@n1 corosync]# corosync-keygen
[root@n1 corosync]# scp -p authkey corosync.conf n2.xh.com:/etc/corosync/
authkey 100% 128 0.1KB/s 00:00
corosync.conf 100% 541 0.5KB/s 00:00
10) Create the log directory on the other node
[root@n1 corosync]# ssh n2.xh.com 'mkdir /var/log/cluster'
11) Start the service on n1
[root@n1 corosync]# service corosync start
Starting Corosync Cluster Engine (corosync): [OK]
12) Verify that the corosync engine started properly
[root@n1 corosync]# grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages
Oct 2 20:03:42 localhost smartd[3014]: Opened configuration file /etc/smartd.conf
Oct 2 20:03:43 localhost smartd[3014]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Oct 3 00:10:34 localhost corosync[4592]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Oct 3 00:10:34 localhost corosync[4592]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
13) Check whether the initial membership notifications went out
[root@n1 corosync]# grep -i totem /var/log/messages
Oct 3 00:10:34 localhost corosync[4592]: [TOTEM ] Initializing transport (UDP/IP).
Oct 3 00:10:34 localhost corosync[4592]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 3 00:10:35 localhost corosync[4592]: [TOTEM ] The network interface [192.168.20.133] is now up.
Oct 3 00:10:37 localhost corosync[4592]: [TOTEM ] Process pause detected for 1196 ms, flushing membership messages.
Oct 3 00:10:37 localhost corosync[4592]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
14) Check whether any errors occurred during startup
[root@n1 corosync]# grep -i error: /var/log/messages |grep -v unpack_resources
15) Check whether pacemaker has started
[root@n1 corosync]# grep -i pcmk_startup /var/log/messages
Oct 3 00:10:35 localhost corosync[4592]: [pcmk ] info: pcmk_startup: CRM: Initialized
Oct 3 00:10:35 localhost corosync[4592]: [pcmk ] Logging: Initialized pcmk_startup
Oct 3 00:10:35 localhost corosync[4592]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Oct 3 00:10:35 localhost corosync[4592]: [pcmk ] info: pcmk_startup: Service: 9
Oct 3 00:10:36 localhost corosync[4592]: [pcmk ] info: pcmk_startup: Local hostname: n1.xh.com
16) Start corosync on node n2 from n1
[root@n1 corosync]# ssh n2.xh.com '/etc/init.d/corosync start'
Starting Corosync Cluster Engine (corosync): [OK]
17) Repeat the verification steps above on node n2
18) Check the cluster membership status from either node
[root@n2 corosync]# crm status
============
Last updated: Wed Oct 3 00:39:35 2012
Stack: openais
Current DC: n1.xh.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ n1.xh.com n2.xh.com ]
3.5 Provide the highly available service
1) In corosync, services can be defined through two kinds of interfaces:
a graphical interface (hb_gui)
crm (a shell provided by pacemaker)
For example, first check whether the dates on the two nodes are in sync:
[root@n1 corosync]# ssh n2.xh.com 'date'
Wed Oct 3 00:42:24 CST 2012
[root@n1 corosync]# date
Wed Oct 3 00:42:35 CST 2012
[root@n1 corosync]# crm
crm(live)# configure
crm(live)configure# show
node n1.xh.com
node n2.xh.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2"
2) Show the configuration in XML format
crm(live)configure# show xml
<?xml version="1.0" ?>
<cib admin_epoch="0" crm_feature_set="3.0.5" dc-uuid="n1.xh.com" epoch="5" have-quorum="1" num_updates="18" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="n2.xh.com" type="normal" uname="n2.xh.com"/>
<node id="n1.xh.com" type="normal" uname="n1.xh.com"/>
</nodes>
<resources/>
<constraints/>
</configuration>
</cib>
3) Check the configuration for syntax errors
[root@n1 corosync]# crm_verify -L
crm_verify[4750]: 2012/10/03_01:02:18 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[4750]: 2012/10/03_01:02:18 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[4750]: 2012/10/03_01:02:18 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
4) STONITH errors are reported: in an HA environment with no STONITH resources defined, resource startup is disabled, so disable stonith here:
[root@n1 corosync]# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# commit
5) Verify that the option has been added
crm(live)configure# show
node n1.xh.com
node n2.xh.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
6) Run the syntax check again
[root@n1 corosync]# crm_verify -L
The system also provides a dedicated stonith command:
stonith -L lists the available STONITH device types
crm can be run in interactive mode; execute help to see the usage.
There are 4 cluster resource types:
primitive: a basic resource (can run on only one node)
group: puts several resources into one group for easier management
clone: resources that must run on several nodes at once (e.g. ocfs2, stonith; no master/slave distinction)
master: master/slave resources, e.g. drbd
7) List the resource agent classes with classes, and use list lsb to view the LSB resource agent scripts
crm(live)# ra
crm(live)ra# classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
crm(live)ra# list lsb
8) List the heartbeat agents of the ocf class
crm(live)ra# list ocf heartbeat
9) Configure a resource; this is done under configure with a line of the form:
primitive webip ocf:heartbeat:IPaddr params ip=192.168.20.137
crm(live)ra# end
crm(live)# configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.20.137
View the configuration just created
crm(live)configure# show
node n1.xh.com
node n2.xh.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.20.137"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# commit
crm(live)configure# status
ERROR: syntax: status
crm(live)configure# end
crm(live)# status
============
Last updated: Wed Oct 3 01:23:05 2012
Stack: openais
Current DC: n1.xh.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ n1.xh.com n2.xh.com ]
webip (ocf::heartbeat:IPaddr): Started n1.xh.com
10) The defined resource is now active on n1
[root@n1 corosync]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:70:3F:F7
inet addr:192.168.20.133 Bcast:192.168.20.143 Mask:255.255.255.240
inet6 addr: fe80::20c:29ff:fe70:3ff7/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:50797 errors:0 dropped:0 overruns:0 frame:0
TX packets:65597 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:18088499 (17.2 MiB) TX bytes:21418409 (20.4 MiB)
Interrupt:67 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:70:3F:F7
inet addr:192.168.20.137 Bcast:192.168.20.143 Mask:255.255.255.240
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:67 Base address:0x2000
11) Install httpd on both nodes
[root@n1 html]# yum install httpd -y
12) After installation, view httpd's LSB script metadata
crm(live)ra# meta lsb:httpd
lsb:httpd
Apache is a World Wide Web server. It is used to serve \
HTML files and CGI.
Operations' defaults (advisory minimum):
start timeout=15
stop timeout=15
status timeout=15
restart timeout=15
force-reload timeout=15
monitor interval=15 timeout=15 start-delay=15
13) Define the httpd resource
crm(live)ra# end
crm(live)# configure
crm(live)configure# primitive webserver lsb:httpd
14) View the defined httpd resource
crm(live)configure# show
node n1.xh.com
node n2.xh.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.20.137"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# end
There are changes pending. Do you want to commit them? yes
crm(live)# status
============
Last updated: Wed Oct 3 03:13:27 2012
Stack: openais
Current DC: n2.xh.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ n1.xh.com n2.xh.com ]
webip (ocf::heartbeat:IPaddr): Started n1.xh.com
webserver (lsb:httpd): Started n2.xh.com
httpd has started, but on node n2. (As a cluster gains more and more resources, they get distributed across different nodes to balance the load.)
To constrain these resources to the same node, put them into one group.
15) Define a group and add webip and webserver to it
crm(live)configure# group web webip webserver
crm(live)configure# show
node n1.xh.com
node n2.xh.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.20.137"
primitive webserver lsb:httpd
group web webip webserver
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)# status
============
Last updated: Wed Oct 3 03:20:46 2012
Stack: openais
Current DC: n2.xh.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ n1.xh.com n2.xh.com ]
Resource Group: web
webip (ocf::heartbeat:IPaddr): Started n1.xh.com
webserver (lsb:httpd): Started n1.xh.com
16) Test failover: create a distinct web page on node 1 and node 2. With the cluster IP active on the first node, stop corosync there:
[root@n1 html]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [OK]
Waiting for corosync services to unload:....... [OK]
[root@n2 corosync]# crm status
============
Last updated: Mon May 7 20:16:58 2012
Stack: openais
Current DC: n2.xh.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ n2.xh.com ]
OFFLINE: [ n1.xh.com ]
The httpd service did not start because n2 by itself has no quorum.
Adjust the no-quorum policy. The possible values are:
ignore (ignore the loss of quorum)
freeze (resources already started keep running, but no new resources may start)
stop (the default)
suicide (kill all resources)
Start node 1's corosync service again, then change the quorum policy:
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
crm(live)configure# show (check the quorum property again)
node n1.xh.com
node n2.xh.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.20.137"
primitive webserver lsb:httpd
group web webip webserver
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
17) Stop n1's corosync service again
[root@n1 html]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [OK]
Waiting for corosync services to unload:.......... [OK]
The resources have now moved to n2:
[root@n2 html]# crm status
============
Last updated: Wed Oct 3 03:41:01 2012
Stack: openais
Current DC: n2.xh.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ n2.xh.com ]
OFFLINE: [ n1.xh.com ]
Resource Group: web
webip (ocf::heartbeat:IPaddr): Started n2.xh.com
webserver (lsb:httpd): Started n2.xh.com
4. Configure DRBD for data consistency
1) Create matching partitions of identical size on both nodes, as shown below.
On n1:
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 65 522081 83 Linux
/dev/sda2 66 1370 10482412+ 83 Linux
/dev/sda3 1371 1631 2096482+ 82 Linux swap / Solaris
/dev/sda4 1632 2610 7863817+ 5 Extended
/dev/sda5 1632 1754 987966 83 Linux
On n2:
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 65 522081 83 Linux
/dev/sda2 66 1370 10482412+ 83 Linux
/dev/sda3 1371 1631 2096482+ 82 Linux swap / Solaris
/dev/sda4 1632 2610 7863817+ 5 Extended
/dev/sda5 1632 1754 987966 83 Linux
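If /dev/sda5 does not yet exist, it can be created interactively with fdisk. A sketch of the session (assuming the extended partition /dev/sda4 is already in place; the size is chosen to roughly match the tables above):
[root@n1 ~]# fdisk /dev/sda
n (new partition; with all primary slots used, fdisk creates the next logical partition, sda5)
(accept the default first cylinder, then enter +1000M for the size)
w (write the partition table and exit)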
2) Make the kernel re-read the partition tables on both nodes
[root@n1 ~]# partprobe /dev/sda
[root@n2 ~]# partprobe /dev/sda
3) Install the DRBD packages on both nodes
[root@n1 ~]# yum localinstall drbd83-8.3.8-1.el5.centos.i386.rpm -y --nogpgcheck
[root@n1 ~]# yum localinstall kmod-drbd83-8.3.8-1.el5.centos.i686.rpm -y --nogpgcheck
4) Create DRBD's configuration file
[root@n1 drbd83-8.3.8]# cp drbd.conf /etc/
cp: overwrite `/etc/drbd.conf'? y
5) Back up and then edit the global_common.conf file
[root@n1 drbd.d]# cp global_common.conf global_common.conf.bak
global {
usage-count yes; // change to no
# minor-count dialog-refresh disable-ip-verification
}
Change
startup {
# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
}
to
startup {
wfc-timeout 120;
degr-wfc-timeout 120;
}
Change
disk {
# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
# no-disk-drain no-md-flushes max-bio-bvecs
}
to
disk {
on-io-error detach;
fencing resource-only;
}
Change
net {
# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
}
to
net {
cram-hmac-alg "sha1";
shared-secret "mydrbdlab";
}
Change
syncer {
# rate after al-extents use-rle cpu-mask verify-alg csums-alg
}
to
syncer {
rate 100M;
}
The edited file looks like this:
[root@n1 drbd.d]# vim global_common.conf
global {
usage-count no;
# minor-count dialog-refresh disable-ip-verification
}
common {
protocol C;
startup {
wfc-timeout 120;
degr-wfc-timeout 120;
}
disk {
on-io-error detach;
fencing resource-only;
}
net {
cram-hmac-alg "sha1";
shared-secret "mydrbdlab";
}
syncer {
rate 100M;
}
}
6) Create the resource file with the following content
[root@n1 drbd.d]# vim web.res
resource web {
on n1.xh.com {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.20.133:7789;
meta-disk internal;
}
on n2.xh.com {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.20.134:7789;
meta-disk internal;
}
}
7) To keep the configuration consistent, copy the files to n2
[root@n1 drbd.d]# scp global_common.conf n2.xh.com:/etc/drbd.d
[root@n1 drbd.d]# scp web.res n2.xh.com:/etc/drbd.d
Initialize the resource; run this on both n1 and n2:
[root@n1 drbd.d]# drbdadm create-md web
Then start the drbd service on both nodes:
[root@n1 drbd.d]# service drbd start
[root@n1 drbd.d]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:987896
Both nodes are in Secondary state and the data is Inconsistent (not yet synchronized).
drbd-overview can be used to check this as well.
Make n1 the primary by running the following on n1 (this form of the command is used only the first time):
[root@n1 drbd.d]# drbdadm -- --overwrite-data-of-peer primary web
[root@n1 drbd.d]# watch -n 1 'cat /proc/drbd'
8) Format drbd0
[root@n1 drbd.d]# mkfs -t ext3 -L drbdweb /dev/drbd0
9) Test-mount drbd0
[root@n1 drbd.d]# mkdir /web
[root@n1 drbd.d]# mount /dev/drbd0 /web/
10) Create an index.html file
[root@n1 web]# vim index.html
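Instead of vim, the file can be created in one line; the content is arbitrary test text (hypothetical example):
[root@n1 web]# echo "page served from the drbd device" > index.html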
11) Demote n1 to Secondary and promote n2 to Primary
[root@n1 ~]# umount /web
[root@n1 ~]# drbdadm secondary web
[root@n1 ~]# drbdadm role web
Secondary/Secondary
On node n2:
[root@n2 ~]# mkdir /web
[root@n2 ~]# drbdadm primary web
[root@n2 ~]# drbd-overview
0:web Connected Primary/Secondary UpToDate/UpToDate C r----
[root@n2 ~]# mount /dev/drbd0 /web
[root@n2 ~]# drbdadm role web
Primary/Secondary
12) index.html is now visible here, showing that the data has been synchronized
[root@n2 ~]# cd /web
[root@n2 web]# ll
total 20
-rw-r--r-- 1 root root 17 10-03 06:53 index.html
drwx------ 2 root root 16384 10-03 05:51 lost+found
13) Unmount /web
[root@n2 web]# cd
[root@n2 ~]# umount /web
[root@n2 ~]# drbdadm secondary web
Stop the drbd service and make sure it will not start automatically at boot on either node (the cluster will manage it from here on):
[root@n1 ~]# service drbd stop
[root@n1 ~]# ssh n2.xh.com 'service drbd stop'
[root@n1 ~]# chkconfig drbd off
[root@n1 ~]# ssh n2.xh.com 'chkconfig drbd off'
5. Configure the drbd service for high availability
5.1 Define the configured DRBD device /dev/drbd0 as a cluster service
1) Configure drbd as a cluster resource:
drbd must run on both nodes at once, but in the primary/secondary model only one node can be Master while the other is Slave. It is therefore a special kind of cluster resource, a multi-state clone: the nodes are split into Master and Slave roles, and when the service first starts, both nodes must be in the Slave state.
[root@n1 ~]# crm
crm(live)# configure
crm(live)configure# primitive drbd ocf:heartbeat:drbd params drbd_resource=web op monitor role=Master interval=30s timeout=40s op monitor role=Slave interval=50s timeout=40s
crm(live)configure# master MS_Webdrbd drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)configure# verify
crm(live)configure# commit
Now check the resource status (crm status).
2) Create a cluster service that automatically mounts the web resource on the Primary node
The Master node of MS_Webdrbd is the Primary node of the drbd web resource; only on that node can the device /dev/drbd0 be mounted, and the cluster application needs it mounted automatically. Here the web resource is assumed to be the shared file system providing page files for the web server cluster, mounted at /web.
Because this automounted cluster resource must run on the Master node of the drbd service, and can start only after drbd has promoted that node to Primary, a colocation constraint and an order constraint must be defined between the two resources.
# crm
crm(live)# configure
crm(live)configure# primitive fs ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/web" fstype="ext3"
crm(live)configure# colocation drbd_fs inf: fs MS_Webdrbd:Master
crm(live)configure# order after_fs inf: MS_Webdrbd:promote fs:start
crm(live)configure# verify
crm(live)configure# commit
Check the resource status.
3) Create a web page in n1's /web directory, then check after failover whether the files exist in n2's /web directory.
[root@n1 ~]# vim /web/index.html
4) Configure name-based virtual hosts on both nodes (see the sketch below)
[root@n1 ~]# vim /etc/httpd/conf/httpd.conf
[root@n2 ~]# vim /etc/httpd/conf/httpd.conf
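The httpd.conf edits appear only as screenshots in the original; on httpd 2.2 a name-based virtual host serving the DRBD-backed directory might look like this sketch (the ServerName value is an assumption):
NameVirtualHost *:80
<VirtualHost *:80>
ServerName www.xh.com
DocumentRoot /web
</VirtualHost>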
5) Restart httpd on both nodes to verify the configuration and access the page, then stop httpd again so that the cluster, not init, controls it
6) Simulate a failure of node n1 and check whether the resources move correctly to n2.
On n1:
crm node standby
Because resource fencing and split-brain handling were enabled in /etc/drbd.d/global_common.conf, a location constraint automatically appears in crm's configuration (the CIB) once the primary node goes down: it forbids the secondary from becoming primary, so that when the old primary comes back there is no split brain and no contention over the resources. Since here we only want to verify that the resources can move, delete that location constraint:
crm(live)configure# edit
A location constraint that we never defined ourselves is visible (a line beginning with location); delete that line, save and quit, then commit the change:
crm(live)configure# commit
Then check the status on n2.
Now access the web page.
7) Finally, constrain the group web and drbd's Primary to run together. Add the line shown in the sketch below via:
# crm configure edit
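The screenshot with the marked line is not reproduced here; given the resources defined above, the line to add is presumably a colocation constraint of the following form, pinning the group web to the node where drbd is Master (the order constraint, also a sketch, would additionally start the group only after the file system is mounted):
colocation web_on_MS_Webdrbd inf: web MS_Webdrbd:Master
order web_after_fs inf: fs:start web:start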
8) Configuration complete.