Red Hat Cluster Suite (RHCS) Four-Part Series: Testing

Once the cluster has been configured, how do we know that the configuration actually works? Below, the high-availability cluster and storage cluster features provided by RHCS are tested case by case.

I. High-Availability Cluster Testing
In the previous articles we configured a four-node cluster whose hostnames are web1, web2, Mysql1 and Mysql2. The four nodes are related as follows: web1 and web2 form the web cluster and run the webserver service, with web1 as the primary node (under normal conditions the service runs there) and web2 as the standby node; Mysql1 and Mysql2 form the Mysql cluster and run the mysqlserver service, with Mysql1 as the primary node (under normal conditions the service runs there) and Mysql2 as the standby node.
The scenarios below show how the cluster fails over and keeps working when one or more nodes go down.
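Before running the failure scenarios it is useful to record the healthy state of the cluster, so that each result can be compared against it. A quick baseline, using the same tools that appear later in this article, might look like this (run on any node):
[root@web1 ~]# clustat
[root@web1 ~]# cman_tool status
clustat lists the members and the services with their current owners, and cman_tool status shows the vote and quorum figures that the last test below relies on.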

1. When node web2 goes down
A node can go down in two ways: a clean shutdown or an abnormal crash. Both cases are described below.

(1) Node web2 is shut down cleanly
Run a normal shutdown command on node web2:
[root@web2 ~]# init 0
Then watch the /var/log/messages log on node web1; the output is as follows:
Aug 24 00:57:09 web1 clurgmgrd[3321]: <notice> Member 1 shutting down
Aug 24 00:57:17 web1 qdiskd[2778]: <info> Node 1 shutdown
Aug 24 00:57:29 web1 openais[2755]: [TOTEM] The token was lost in the OPERATIONAL state.
Aug 24 00:57:29 web1 openais[2755]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Aug 24 00:57:29 web1 openais[2755]: [TOTEM] Transmit multicast socket send buffer size (219136 bytes).
Aug 24 00:57:29 web1 openais[2755]: [TOTEM] entering GATHER state from 2.
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] entering GATHER state from 0.
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Creating commit token because I am the rep.
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Saving state aru 73 high seq received 73
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Storing new sequence id for ring bc8
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] entering COMMIT state.
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] entering RECOVERY state.
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] position [0] member 192.168.12.230:
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] previous ring seq 3012 rep 192.168.12.230
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] aru 73 high delivered 73 received flag 1
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] position [1] member 192.168.12.231:
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] previous ring seq 3012 rep 192.168.12.230
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] aru 73 high delivered 73 received flag 1
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] position [2] member 192.168.12.232:
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] previous ring seq 3012 rep 192.168.12.230
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] aru 73 high delivered 73 received flag 1
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Did not need to originate any messages in recovery.
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Sending initial ORF token
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] New Configuration:
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] Members Left:
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 00:57:49 web1 kernel: dlm: closing connection to node 1

Aug 24 00:57:49 web1 openais[2755]: [CLM  ] Members Joined:
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] New Configuration:
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] Members Left:
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] Members Joined:
Aug 24 00:57:49 web1 openais[2755]: [SYNC ] This node is within the primary component and will provide service.
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] entering OPERATIONAL state.
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.230
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.231
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.232
Aug 24 00:57:49 web1 openais[2755]: [CPG  ] got joinlist message from node 3
Aug 24 00:57:49 web1 openais[2755]: [CPG  ] got joinlist message from node 4
Aug 24 00:57:49 web1 openais[2755]: [CPG  ] got joinlist message from node 2
As the log shows, after node web2 is shut down cleanly, the qdiskd process immediately detects that web2 has gone down, and the dlm lock manager then closes its connection to web2 in an orderly way. Because web2 was shut down normally, RHCS does not treat this as a cluster fault; it simply removes web2 from the cluster membership. Pay particular attention to the qdiskd "Node 1 shutdown", dlm "closing connection" and CLM "Members Left" lines in the log.
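To confirm the new membership from a surviving node, the member list can be printed as well. This is only a sketch; cman_tool is the same utility used for the status checks later in this article, and its nodes subcommand lists each member with a status flag:
[root@web1 ~]# cman_tool nodes
After the shutdown, web2 should no longer be reported as an active member, while web1, Mysql1 and Mysql2 still are.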
Now restart node web2 and keep watching the log on web1:
Aug 24 01:10:50 web1 openais[2755]: [TOTEM] entering GATHER state from 11.
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Creating commit token because I am the rep.
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Saving state aru 2b high seq received 2b
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Storing new sequence id for ring bcc
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] entering COMMIT state.
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] entering RECOVERY state.
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] position [0] member 192.168.12.230:
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] previous ring seq 3016 rep 192.168.12.230
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] aru 2b high delivered 2b received flag 1
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] position [1] member 192.168.12.231:
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] previous ring seq 3016 rep 192.168.12.230
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] aru 2b high delivered 2b received flag 1
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] position [2] member 192.168.12.232:
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] previous ring seq 3016 rep 192.168.12.230
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] aru 2b high delivered 2b received flag 1
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] position [3] member 192.168.12.240:
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] previous ring seq 3016 rep 192.168.12.240
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] aru c high delivered c received flag 1
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Did not need to originate any messages in recovery.
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Sending initial ORF token
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] New Configuration:
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] Members Left:
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] Members Joined:
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] New Configuration:
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] Members Left:
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] Members Joined:
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 01:10:51 web1 openais[2755]: [SYNC ] This node is within the primary component and will provide service.
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] entering OPERATIONAL state.
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.230
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.231
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.232
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.240
Aug 24 01:10:51 web1 openais[2755]: [CPG  ] got joinlist message from node 3
Aug 24 01:10:51 web1 openais[2755]: [CPG  ] got joinlist message from node 4
Aug 24 01:10:51 web1 openais[2755]: [CPG  ] got joinlist message from node 2
Aug 24 01:10:55 web1 kernel: dlm: connecting to 1
As the output shows, after node web2 is restarted, the openais low-level communication layer detects that web2 is active again and rejoins it to the cluster; see in particular the "Members Joined" and "got nodejoin message 192.168.12.240" lines in the log.
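At this point the membership can be checked again from any other node to make sure web2 is reported as Online, using the same clustat syntax as in the rest of this article:
[root@web1 ~]# clustat -m web2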

(2) Node web2 crashes abnormally
Run the following command on node web2 to crash the kernel:
[root@web2 ~]# echo c > /proc/sysrq-trigger
Then check the /var/log/messages log on node Mysql1; the output is as follows:
Aug 24 02:26:16 Mysql1 openais[2649]: [TOTEM] entering GATHER state from 12.
Aug 24 02:26:28 Mysql1 qdiskd[2672]: <notice> Node 1 evicted
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] entering GATHER state from 11.
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] Saving state aru 78 high seq received 78
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] Storing new sequence id for ring bd0
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] entering COMMIT state.
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] entering RECOVERY state.
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] position [0] member 192.168.12.230:
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] previous ring seq 3020 rep 192.168.12.230
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] aru 78 high delivered 78 received flag 1
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] position [1] member 192.168.12.231:
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] previous ring seq 3020 rep 192.168.12.230
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] aru 78 high delivered 78 received flag 1
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] position [2] member 192.168.12.232:
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] previous ring seq 3020 rep 192.168.12.230
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] aru 78 high delivered 78 received flag 1
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] Did not need to originate any messages in recovery.
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] New Configuration:
Aug 24 02:26:36 Mysql1 kernel: dlm: closing connection to node 1
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.230)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.231)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.232)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] Members Left:
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.240)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] Members Joined:
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] New Configuration:
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.230)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.231)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.232)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] Members Left:
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] Members Joined:
Aug 24 02:26:36 Mysql1 openais[2649]: [SYNC ] This node is within the primary component and will provide service.
Aug 24 02:26:36 Mysql1 fenced[2688]: web2 not a cluster member after 0 sec post_fail_delay
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] entering OPERATIONAL state.
Aug 24 02:26:36 Mysql1 fenced[2688]: fencing node "web2"
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] got nodejoin message 192.168.12.230
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] got nodejoin message 192.168.12.231
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] got nodejoin message 192.168.12.232
Aug 24 02:26:36 Mysql1 openais[2649]: [CPG  ] got joinlist message from node 3
Aug 24 02:26:36 Mysql1 openais[2649]: [CPG  ] got joinlist message from node 4
Aug 24 02:26:36 Mysql1 openais[2649]: [CPG  ] got joinlist message from node 2
Aug 24 02:26:45 Mysql1 fenced[2688]: fence "web2" success
Aug 24 02:26:45 Mysql1 kernel: GFS2: fsid=mycluster:my-gfs2.2: jid=3: Trying to acquire journal lock...
Aug 24 02:26:45 Mysql1 kernel: GFS2: fsid=mycluster:my-gfs2.2: jid=3: Looking at journal...
Aug 24 02:26:45 Mysql1 kernel: GFS2: fsid=mycluster:my-gfs2.2: jid=3: Done

As the output shows, qdiskd is the first to detect that web2 has failed and evicts it from the cluster. Because this is an abnormal crash, RHCS must reset node web2 to guarantee that cluster resources stay exclusive, so the fenced process is started. Until fenced reports success, the GFS2 shared partition mounted on every node is unusable and simply hangs; only after the fence operation succeeds does GFS2 recover the failed node's journal and resume normal operation.
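While a fence operation is pending, the state of the fence, dlm and GFS2 groups can also be watched from a surviving node. A minimal sketch, assuming the RHEL 5 era cman tooling used throughout this series:
[root@web1 ~]# group_tool ls
Once the fence succeeds and journal recovery finishes, the groups return to their normal state and I/O on the GFS2 mount resumes.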
Now check web2's cluster status from node web1:
[root@web1 ~]# clustat -m web2
Member Name                    ID   Status
------ ----                    ---- ------
web2                              1 Offline
As the output shows, web2 is now in the Offline state.

2. When node Mysql2 goes down
When node Mysql2 is shut down cleanly or crashes abnormally, RHCS behaves exactly as described above for node web2, so it is not repeated here.

3. When node web1 goes down
(1) Node web1 is shut down cleanly
[root@web1 ~]# init 0
Then check the /var/log/messages log on node web2; the output is as follows:
Aug 24 02:06:13 web2 last message repeated 3 times
Aug 24 02:14:58 web2 clurgmgrd[3239]: <notice> Member 4 shutting down
Aug 24 02:15:03 web2 clurgmgrd[3239]: <notice> Starting stopped service service:webserver
Aug 24 02:15:05 web2 avahi-daemon[3110]: Registering new address record for 192.168.12.233 on eth0.
Aug 24 02:15:06 web2 in.rdiscd[4451]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Aug 24 02:15:06 web2 in.rdiscd[4451]: Failed joining addresses
Aug 24 02:15:07 web2 clurgmgrd[3239]: <notice> Service service:webserver started
Aug 24 02:15:08 web2 qdiskd[2712]: <info> Node 4 shutdown

Aug 24 02:15:21 web2 openais[2689]: [TOTEM] entering GATHER state from 12.
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] entering GATHER state from 11.
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] Saving state aru b7 high seq received b7
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] Storing new sequence id for ring bd8
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] entering COMMIT state.
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] entering RECOVERY state.
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] position [0] member 192.168.12.231:
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] previous ring seq 3028 rep 192.168.12.230
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] aru b7 high delivered b7 received flag 1
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] position [1] member 192.168.12.232:
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] previous ring seq 3028 rep 192.168.12.230
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] aru b7 high delivered b7 received flag 1
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] position [2] member 192.168.12.240:
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] previous ring seq 3028 rep 192.168.12.230
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] aru b7 high delivered b7 received flag 1
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] Did not need to originate any messages in recovery.
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] New Configuration:
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] Members Left:
Aug 24 02:15:41 web2 kernel: dlm: closing connection to node 4
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.230)  

Aug 24 02:15:41 web2 openais[2689]: [CLM  ] Members Joined:
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] New Configuration:
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] Members Left:
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] Members Joined:
Aug 24 02:15:41 web2 openais[2689]: [SYNC ] This node is within the primary component and will provide service.
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] entering OPERATIONAL state.
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.231
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.232
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.240
Aug 24 02:15:41 web2 openais[2689]: [CPG  ] got joinlist message from node 2
Aug 24 02:15:41 web2 openais[2689]: [CPG  ] got joinlist message from node 3
Aug 24 02:15:41 web2 openais[2689]: [CPG  ] got joinlist message from node 1
As the log shows, after node web1 is shut down cleanly, web1's service and IP resources automatically fail over to node web2, and qdiskd then removes web1 from the cluster. Because web1 was shut down normally, the GFS2 shared file system remains readable and writable on the rest of the cluster; it is not affected by web1 leaving.
Now check node web1's status from web2:
[root@web2 ~]# clustat -m web1
Member Name                    ID   Status
------ ----                    ---- ------
web1                              4 Offline
As the output shows, node web1 is in the Offline state.
Next, log in to node web2 and verify that the cluster service and IP resource have failed over correctly:
[root@web2 ~]# clustat -s webserver
Service Name                   Owner (Last)                   State
------- ----                   ----- ------                   -----
service:webserver              web2                           started
[root@web2 ~]# ip addr show|grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
  inet 192.168.12.240/24 brd 192.168.12.255 scope global eth0
  inet 192.168.12.233/24 scope global secondary eth0
As the output shows, the cluster service and the IP address have been switched over to node web2 successfully.

Finally, restart node web1 and then check the /var/log/messages log on node web2; the output is as follows:
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] entering GATHER state from 11.
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] Saving state aru 2b high seq received 2b
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] Storing new sequence id for ring bdc
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] entering COMMIT state.
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] entering RECOVERY state.
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] position [0] member 192.168.12.230:
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] previous ring seq 3028 rep 192.168.12.230
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] aru 0 high delivered 0 received flag 1
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] position [1] member 192.168.12.231:
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] previous ring seq 3032 rep 192.168.12.231
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] aru 2b high delivered 2b received flag 1
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] position [2] member 192.168.12.232:
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] previous ring seq 3032 rep 192.168.12.231
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] aru 2b high delivered 2b received flag 1
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] position [3] member 192.168.12.240:
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] previous ring seq 3032 rep 192.168.12.231
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] aru 2b high delivered 2b received flag 1
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] Did not need to originate any messages in recovery.
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] New Configuration:
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] Members Left:
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] Members Joined:
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] New Configuration:
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] Members Left:
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] Members Joined:
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 02:42:36 web2 openais[2689]: [SYNC ] This node is within the primary component and will provide service.
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] entering OPERATIONAL state.
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.230
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.231
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.232
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.240
Aug 24 02:42:36 web2 openais[2689]: [CPG  ] got joinlist message from node 3
Aug 24 02:42:36 web2 openais[2689]: [CPG  ] got joinlist message from node 1
Aug 24 02:42:36 web2 openais[2689]: [CPG  ] got joinlist message from node 2
Aug 24 02:42:40 web2 kernel: dlm: got connection from 4
Aug 24 02:43:06 web2 clurgmgrd[3239]: <notice> Relocating service:webserver to better node web1
Aug 24 02:43:06 web2 clurgmgrd[3239]: <notice> Stopping service service:webserver
Aug 24 02:43:07 web2 avahi-daemon[3110]: Withdrawing address record for 192.168.12.233 on eth0.
Aug 24 02:43:17 web2 clurgmgrd[3239]: <notice> Service service:webserver is stopped
As the output shows, after node web1 restarts it rejoins the cluster, and web2 stops its copy of the service and releases the IP resource. This behavior is governed by the cluster's Failover Domain policy: in the failover domain webserver-Failover created earlier, the option "Do not fail back services in this domain" was not selected, which means that once the primary node comes back up the service is automatically relocated back to it.
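For reference, this fail-back behavior corresponds to the nofailback attribute of the failoverdomain element in /etc/cluster/cluster.conf. The fragment below is only a sketch of what such a definition might look like for this setup; the ordered and restricted flags and the priorities are assumptions, not a copy of the author's actual configuration:

<failoverdomains>
    <!-- nofailback="0": the service relocates back to web1 when it rejoins -->
    <failoverdomain name="webserver-Failover" ordered="1" restricted="1" nofailback="0">
        <failoverdomainnode name="web1" priority="1"/>
        <failoverdomainnode name="web2" priority="2"/>
    </failoverdomain>
</failoverdomains>

Setting nofailback="1" is the cluster.conf equivalent of ticking "Do not fail back services in this domain"; with that set, the service would have stayed on web2 after web1 came back.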
At the same time, the /var/log/messages log on node web1 shows the following:
Aug 24 02:43:19 web1 clurgmgrd[3252]: <notice> stop on script "mysqlscript" returned 5 (program not installed)
Aug 24 02:43:35 web1 clurgmgrd[3252]: <notice> Starting stopped service service:webserver
Aug 24 02:43:37 web1 avahi-daemon[3126]: Registering new address record for 192.168.12.233 on eth0.
Aug 24 02:43:38 web1 in.rdiscd[4075]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Aug 24 02:43:38 web1 in.rdiscd[4075]: Failed joining addresses
Aug 24 02:43:39 web1 clurgmgrd[3252]: <notice> Service service:webserver started
This output shows that after web1 restarts, it automatically takes the cluster service and IP resource back.

(2) Node web1 crashes abnormally
Run the following command on node web1 to crash the kernel:
[root@web1 ~]# echo c > /proc/sysrq-trigger
Then check the /var/log/messages log on node web2; the output is as follows:
Aug 24 02:59:57 web2 openais[2689]: [TOTEM] entering GATHER state from 12.
Aug 24 03:00:10 web2 qdiskd[2712]: <notice> Node 4 evicted
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] entering GATHER state from 11.
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] Saving state aru 92 high seq received 92
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] Storing new sequence id for ring be0
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] entering COMMIT state.
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] entering RECOVERY state.
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] position [0] member 192.168.12.231:
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] previous ring seq 3036 rep 192.168.12.230
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] aru 92 high delivered 92 received flag 1
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] position [1] member 192.168.12.232:
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] previous ring seq 3036 rep 192.168.12.230
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] aru 92 high delivered 92 received flag 1
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] position [2] member 192.168.12.240:
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] previous ring seq 3036 rep 192.168.12.230
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] aru 92 high delivered 92 received flag 1
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] Did not need to originate any messages in recovery.
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] New Configuration:
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] Members Left:
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 03:00:17 web2 kernel: dlm: closing connection to node 4

Aug 24 03:00:17 web2 openais[2689]: [CLM  ] Members Joined:
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] New Configuration:
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] Members Left:
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] Members Joined:
Aug 24 03:00:17 web2 openais[2689]: [SYNC ] This node is within the primary component and will provide service.
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] entering OPERATIONAL state.
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.231
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.232
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.240
Aug 24 03:00:17 web2 openais[2689]: [CPG  ] got joinlist message from node 2
Aug 24 03:00:17 web2 fenced[2728]: web1 not a cluster member after 0 sec post_fail_delay
Aug 24 03:00:17 web2 openais[2689]: [CPG  ] got joinlist message from node 3
Aug 24 03:00:17 web2 fenced[2728]: fencing node "web1"
Aug 24 03:00:17 web2 openais[2689]: [CPG  ] got joinlist message from node 1
Aug 24 03:00:55 web2 fenced[2728]: fence "web1" success
Aug 24 03:00:55 web2 kernel: GFS2: fsid=mycluster:my-gfs2.3: jid=0: Trying to acquire journal lock...
Aug 24 03:00:55 web2 kernel: GFS2: fsid=mycluster:my-gfs2.3: jid=0: Looking at journal...
Aug 24 03:00:55 web2 kernel: GFS2: fsid=mycluster:my-gfs2.3: jid=0: Done
Aug 24 03:00:55 web2 clurgmgrd[3239]: <notice> Taking over service service:webserver from down member web1
Aug 24 03:00:55 web2 avahi-daemon[3110]: Registering new address record for 192.168.12.233 on eth0.
Aug 24 03:00:55 web2 clurgmgrd[3239]: <notice> Service service:webserver started
As the log shows, after web1 crashes abnormally, qdiskd first evicts the failed node from the cluster and reports the result to the cman process; cman then calls the fence subsystem, and fenced uses the pre-configured fence agent to fence web1 through the fence device. Once clurgmgrd receives confirmation that fencing succeeded, web2 takes over web1's service and IP resources, the stale dlm locks are released, and the GFS2 file system becomes readable and writable again.
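As in the clean-shutdown case, the takeover can be verified on web2 with the same commands used earlier in this article:
[root@web2 ~]# clustat -s webserver
[root@web2 ~]# ip addr show | grep eth0
The service should be reported as started on web2, and the floating address 192.168.12.233 should appear as a secondary address on eth0.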

4. When node Mysql1 goes down
When node Mysql1 is shut down cleanly or crashes abnormally, RHCS fails over exactly as described above for node web1, so the demonstration is not repeated here.

5. Any three of the four nodes go down
Here web1, Mysql2 and Mysql1 are crashed abnormally one after another, and we watch on node web2 how RHCS performs the failover.
First bring down node web1 and check the /var/log/messages log; the output is as follows:
Aug 24 18:57:55 web2 openais[2691]: [TOTEM] entering GATHER state from 12.
Aug 24 18:58:14 web2 qdiskd[2714]: <notice> Writing eviction notice for node 4
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] entering GATHER state from 0.
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] Saving state aru 8e high seq received 8e
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] Storing new sequence id for ring c14
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] entering COMMIT state.
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] entering RECOVERY state.
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] position [0] member 192.168.12.231:
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] previous ring seq 3088 rep 192.168.12.230
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] aru 8e high delivered 8e received flag 1
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] position [1] member 192.168.12.232:
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] previous ring seq 3088 rep 192.168.12.230
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] aru 8e high delivered 8e received flag 1
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] position [2] member 192.168.12.240:
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] previous ring seq 3088 rep 192.168.12.230
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] aru 8e high delivered 8e received flag 1
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] Did not need to originate any messages in recovery.
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] New Configuration:
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 18:58:15 web2 kernel: dlm: closing connection to node 4
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] Members Left:
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] Members Joined:
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] New Configuration:
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] Members Left:

Aug 24 18:58:15 web2 openais[2691]: [CLM  ] Members Joined:
Aug 24 18:58:15 web2 openais[2691]: [SYNC ] This node is within the primary component and will provide service.
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] entering OPERATIONAL state.
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] got nodejoin message 192.168.12.231
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] got nodejoin message 192.168.12.232
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] got nodejoin message 192.168.12.240
Aug 24 18:58:15 web2 fenced[2730]: web1 not a cluster member after 0 sec post_fail_delay
Aug 24 18:58:15 web2 openais[2691]: [CPG  ] got joinlist message from node 3
Aug 24 18:58:15 web2 fenced[2730]: fencing node "web1"
Aug 24 18:58:15 web2 openais[2691]: [CPG  ] got joinlist message from node 1
Aug 24 18:58:15 web2 openais[2691]: [CPG  ] got joinlist message from node 2
Aug 24 18:58:17 web2 qdiskd[2714]: <notice> Node 4 evicted

Aug 24 18:58:29 web2 fenced[2730]: fence "web1" success
Aug 24 18:58:29 web2 kernel: GFS2: fsid=mycluster:my-gfs2.1: jid=3: Trying to acquire journal lock...
Aug 24 18:58:29 web2 kernel: GFS2: fsid=mycluster:my-gfs2.1: jid=3: Looking at journal...
Aug 24 18:58:29 web2 kernel: GFS2: fsid=mycluster:my-gfs2.1: jid=3: Done
Aug 24 18:58:30 web2 clurgmgrd[3300]: <notice> Taking over service service:webserver from down member web1
Aug 24 18:58:32 web2 avahi-daemon[3174]: Registering new address record for 192.168.12.233 on eth0.
Aug 24 18:58:33 web2 clurgmgrd[3300]: <notice> Service service:webserver started

From this log we can see that qdiskd first evicts node web1 from the cluster, the fenced process then fences web1 successfully, and only after that does web2 take over web1's service and IP resources. The ordering matters: resource failover proceeds only after fencing has succeeded.
If fenced fails to fence node web1, RHCS simply waits: the cluster's GFS2 shared storage becomes unreadable and unwritable, and only when the fence operation finally reports success does the cluster carry out the resource failover and GFS2 I/O resume.
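Because everything hinges on fencing succeeding, it is worth proving the fence path before relying on it during a real outage. One way is to fence a node by hand from another member, which drives whatever fence agent is defined in cluster.conf. This is only a sketch (fence_node ships with the cman package, and the node name follows this article's setup), and be aware that it really will power-cycle the target node:
[root@web2 ~]# fence_node web1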
At this point, check the cluster status on node web2 with cman_tool; the output is as follows:
[root@web2 ~]# cman_tool status
Version: 6.2.0
Config Version: 40
Cluster Name: mycluster
Cluster Id: 56756
Cluster Member: Yes
Cluster Generation: 3092
Membership state: Cluster-Member
Nodes: 3
Expected votes: 6
Quorum device votes: 2
Total votes: 5
Quorum: 3  

Active subsystems: 9
Flags: Dirty
Ports Bound: 0 177  
Node name: web2
Node ID: 1
Multicast addresses: 239.192.221.146
Node addresses: 192.168.12.240
As the output shows, the number of cluster nodes has dropped to 3, and the quorum value has changed from the original 4 to 3. The quorum value is obtained as N/2 (integer division) plus one, where N is the sum of the votes of all cluster nodes plus the votes of the quorum disk, which is exactly the "Total votes" value shown here.
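Worked out for the figures above: the three surviving nodes contribute one vote each and the quorum disk contributes two, so Total votes = 3 + 2 = 5, and Quorum = 5/2 + 1 = 2 + 1 = 3 (integer division), which matches the cman_tool output.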
Next, crash Mysql2 and then Mysql1 in the same way. Once they are down, check the cluster status on node web2 with cman_tool again; the output is as follows:

[root@web2 ~]# cman_tool status
Version: 6.2.0
Config Version: 40
Cluster Name: mycluster
Cluster Id: 56756
Cluster Member: Yes
Cluster Generation: 3100
Membership state: Cluster-Member
Nodes: 1
Expected votes: 6
Quorum device votes: 2
Total votes: 3
Quorum: 2  
Active subsystems: 9

Flags: Dirty
Ports Bound: 0 177  
Node name: web2
Node ID: 1
Multicast addresses: 239.192.221.146
Node addresses: 192.168.12.240
As the output shows, the cluster is now down to a single node, yet it remains operational and GFS2 can still be read and written. This is entirely down to the quorum disk: with one node vote plus the quorum disk's 2 votes, Total votes = 3 and Quorum = 3/2 + 1 = 2, so the single node still holds quorum. Without a quorum disk, an RHCS cluster reduced to a single node cannot keep working.

II. Storage Cluster Testing
The GFS2 file system provides the shared storage of the RHCS cluster: the GFS2 shared file system is mounted on every cluster node, and data on the shared partition can then be read and written from any node. For example:
Create a file named ixdba on node web2:
[root@web2 gfs2]# echo "This is GFS2 Files test" >/gfs2/ixdba
Then view the file on node web1:
[root@web1 gfs2]# more /gfs2/ixdba
This is GFS2 Files test
As you can see, the write works correctly.
Next, test concurrent access: first open the file ixdba for editing on node web1, then try to edit ixdba on node web2 at the same time; the second node reports that the file is locked.
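A simple way to exercise cluster-wide coherence without an interactive editor is to append to the same file from two nodes and read it back. A small sketch, reusing the file created above:
[root@web1 gfs2]# echo "appended from web1" >> /gfs2/ixdba
[root@web2 gfs2]# echo "appended from web2" >> /gfs2/ixdba
[root@web1 gfs2]# more /gfs2/ixdba
Both appended lines should be visible from either node, because the GFS2 lock manager keeps the file consistent across the cluster.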
This completes the testing of every RHCS feature covered in this series. When testing cluster functionality it is well worth watching the log output: the logs reveal the failover process in detail, and if a switchover fails the error is logged too, which makes it much easier to pinpoint the problem.
RHCS is already widely used across many industries. Compared with other HA products it can seem complex, and administering and maintaining it takes more effort, but its reliability and stability are beyond doubt. As the demand for real-time, stable services keeps growing, RHCS will certainly see ever wider use.

(To be continued)


Source: http://ixdba.blog.51cto.com/2895551/599020

