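Building a highly available web cluster with corosync + pacemaker

Lab environment:

Two hosts, each running CentOS 6.5 + httpd 2.4 + PHP 5.5 as the basic web stack. The web pages are served correctly on both, and the httpd24 service is set not to start at boot, because the cluster will be the one to start and stop it.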

node1.mylinux.com 10.204.80.79

node2.mylinux.com 10.204.80.80

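I use ansible here to make managing the two nodes easier. A third host, IP 10.204.80.71, acts as the management node. All three hosts have matching name-resolution entries in their hosts files, the management node has SSH key-based trust set up to node1 and node2, and node1 and node2 are added to the webservers group in the ansible configuration.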
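The name resolution and inventory described above might look like the following sketch. It writes both fragments into a scratch directory instead of touching /etc/hosts or /etc/ansible/hosts directly. The scratch directory, the file names and the use of fully qualified names in the inventory are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: build the hosts entries and the ansible inventory group in a
# scratch directory (assumption), so they can be reviewed before being
# appended to /etc/hosts and /etc/ansible/hosts on the real machines.
workdir=$(mktemp -d)

# Name resolution for the two cluster nodes (needed on all three hosts).
cat > "$workdir/hosts.fragment" <<'EOF'
10.204.80.79 node1.mylinux.com
10.204.80.80 node2.mylinux.com
EOF

# Inventory group used by "ansible webservers ...".
cat > "$workdir/inventory.fragment" <<'EOF'
[webservers]
node1.mylinux.com
node2.mylinux.com
EOF

cat "$workdir/hosts.fragment" "$workdir/inventory.fragment"
```

Once both fragments are in place and the SSH key has been copied to the nodes (for example with ssh-copy-id), running ansible webservers -m ping from the management node is a quick way to confirm that both nodes are reachable.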

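Installing the software:

Install corosync and pacemaker on both hosts. From the management host, use the yum module of the ansible command. The installation takes a while.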

# ansible webservers -m yum -a 'name=corosync,pacemaker state=installed'

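The output shows that corosync and pacemaker are already installed. In this setup pacemaker runs as a corosync plugin. In RHEL 7 and later, pacemaker runs as an independent daemon.

Configuring corosync:

After corosync is installed, the /etc/corosync directory contains a default sample file, corosync.conf.example. Rename it to corosync.conf.

In corosync.conf, add pacemaker as a corosync plugin by adding the following: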

service {
	ver: 0
	name: pacemaker
	# use_mgmtd: yes
}

aisexec {
	user: root
	group: root
}

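Then set bindnetaddr in this file to the network address of the hosts. The nodes are on the 10.204.80 network, so it is set to 10.204.80.0 (bindnetaddr: 10.204.80.0). The complete configuration file is as follows: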

# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
	version: 2
	secauth: on
	threads: 0
	interface {
		ringnumber: 0
		bindnetaddr: 10.204.80.0
		mcastaddr: 226.94.13.10
		mcastport: 5405
		ttl: 1
	}
}

logging {
	fileline: off
	to_stderr: no
	to_logfile: yes
	to_syslog: yes
	logfile: /var/log/cluster/corosync.log
	debug: off
	timestamp: on
	logger_subsys {
		subsys: AMF
		debug: off
	}
}

amf {
	mode: disabled
}
service {
	ver: 0
	name: pacemaker
	# use_mgmtd: yes
}

aisexec {
	user: root
	group: root
}

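Note that bindnetaddr should be set to the network address of the local network only. Otherwise corosync may report "corosync [TOTEM ] The network interface is down", and heartbeats will not pass between the nodes. The official documentation (the corosync.conf(5) man page) states this clearly: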

bindnetaddr
              This specifies the network address the corosync executive should bind to. For example, if the local interface is 192.168.5.92 with netmask
              255.255.255.0, set bindnetaddr to 192.168.5.0.   If the local interface is 192.168.5.92 with netmask 255.255.255.192, set bindnetaddr to
              192.168.5.64, and so forth.
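The rule quoted above is a bitwise AND of the interface address and the netmask, octet by octet. A small sketch of that calculation in plain sh follows. The function name network_address is hypothetical, and the /24 netmask used for node1 is an assumption that matches the 10.204.80.0 value used in this article.

```shell
#!/bin/sh
# network_address IP NETMASK
# Prints the network address obtained by AND-ing each octet of IP with the
# matching octet of NETMASK. The body runs in a subshell so that the IFS
# change does not leak into the caller.
network_address() (
    IFS=.
    # Split both dotted quads into eight positional parameters.
    set -- $1 $2
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
)

network_address 192.168.5.92 255.255.255.0     # prints 192.168.5.0  (man page example)
network_address 192.168.5.92 255.255.255.192   # prints 192.168.5.64 (man page example)
network_address 10.204.80.79 255.255.255.0     # prints 10.204.80.0  (node1, assuming /24)
```

The last line gives the value used for bindnetaddr in the configuration file above.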

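Generate the authentication key file used for communication between the nodes:

# corosync-keygen

Press Enter a few times and the authkey key file is generated.

Copy corosync.conf and authkey to the /etc/corosync/ directory on node2.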
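The copy step could be done with scp from node1, roughly as sketched below. The run helper and the APPLY variable are hypothetical conveniences: by default the command is only printed, and it is executed only when APPLY=1 is set. Root SSH access from node1 to node2 is also an assumption.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 on node1 to really run them (hypothetical switch).
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

copy_cluster_files() {
    # -p preserves the file modes, so authkey keeps its restrictive permissions.
    run scp -p /etc/corosync/corosync.conf /etc/corosync/authkey \
        node2.mylinux.com:/etc/corosync/
}

copy_cluster_files
```

Both nodes must end up with identical corosync.conf and authkey files, since secauth is on and the nodes authenticate each other with that key.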

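Configuration interface:

Before RHEL 6.3 the commonly used tool was crmsh, a configuration interface for the resource manager. From RHEL 6.4 on, pcs is used. Both can be used here for comparison. Because crmsh is not in the CentOS 6.5 yum repositories, it has to be downloaded and installed, or built from source.

Download location: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/x86_64/

Three files need to be downloaded: crmsh-2.1-1.6.x86_64.rpm, pssh-2.3.1-4.1.x86_64.rpm and python-pssh-2.3.1-4.1.x86_64.rpm. crmsh depends on pssh, so pssh has to be downloaded as well.

Installing the configuration interface: besides the downloaded files, crmsh has other Python dependencies, so use yum localinstall to resolve them. Put the three files in /root, then install: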

# yum --nogpgcheck localinstall crmsh-2.1-1.6.x86_64.rpm pssh-2.3.1-4.1.x86_64.rpm python-pssh-2.3.1-4.1.x86_64.rpm

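When the installation finishes, type crm to enter the crm shell.

Start corosync on node1.

From node1, start corosync on node2.

Once both are started, use # crm status to check the node status.

Node 1 and node 2 have both started normally, and the cluster is in a normal working state.

Run ps auxf to see the processes started by corosync.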
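The start-up sequence above might be run from node1 as in the following sketch. With ver: 0 in the service block, corosync launches pacemaker itself, so only the corosync service is started. The run helper and the APPLY variable are hypothetical: commands are printed unless APPLY=1 is set, and the remote start assumes root SSH access from node1 to node2.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 on node1 to really run them (hypothetical switch).
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

start_cluster() {
    run service corosync start                          # local node (node1)
    run ssh node2.mylinux.com 'service corosync start'  # remote node (node2)
    run crm status                                      # both nodes should be listed as online
}

start_cluster
```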

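Configuring cluster properties:

stonith is enabled by default in a pacemaker cluster. This cluster has no stonith device, so stonith has to be disabled first.

# crm configure show displays the current cluster configuration, where you can see that stonith has been disabled.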
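Disabling stonith comes down to setting the stonith-enabled cluster property to false. A minimal sketch of the sequence follows, using the same hypothetical dry-run helper (commands are printed unless APPLY=1 is set).

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 on a cluster node to really run them (hypothetical switch).
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

disable_stonith() {
    run crm configure property stonith-enabled=false
    run crm configure verify   # check the configuration for errors
    run crm configure show     # stonith-enabled=false should now be listed
}

disable_stonith
```

Disabling stonith is reasonable for a lab like this one. A production cluster would normally keep it enabled and configure a real fencing device.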

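Adding resources to the cluster:

The supported resource agent classes include lsb and ocf. lsb agents are service start scripts, such as the scripts under /etc/init.d/. To see the classes supported by the current cluster:

# crm ra classes

The stonith class is used only for configuring stonith devices.

To list all resource agents of a given class, use # crm ra list lsb | ocf (heartbeat | pacemaker) | stonith.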
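Written out in full, the alternatives in that command expand to one invocation per class, with the provider given as a second argument for ocf. The sketch below uses a hypothetical dry-run helper, so the commands are printed unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 on a cluster node to really run them (hypothetical switch).
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

list_agents() {
    run crm ra classes              # available classes and ocf providers
    run crm ra list lsb             # init scripts under /etc/init.d/
    run crm ra list ocf heartbeat   # ocf agents from the heartbeat provider
    run crm ra list ocf pacemaker   # ocf agents from the pacemaker provider
    run crm ra list stonith         # fencing agents
}

list_agents
```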

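A web cluster needs two resources: a VIP and the web service.

For these two nodes the VIP is 10.204.80.88 and the web service is httpd24.

Syntax of the crm configuration command: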

primitive <rsc> [<class>:[<provider>:]]<type>
[params attr_list]
[operations id_spec]
[op op_type [<attribute>=<value>...] ...]

op_type :: start | stop | monitor

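primitive: a primitive (basic) resource

Format: primitive ID (the resource name) followed by the resource agent

params: parameters

meta: meta attributes

utilization: utilization information

operations: operation information. The operation types are start | stop | monitor. monitor defines whether a stopped resource is detected and reported to the cluster, and whether the cluster restarts it.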

Add IP 10.204.80.88 as a primitive resource.

Add httpd24 as the web service resource. Check the configuration with verify, then apply it with commit.
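As a sketch, the two primitives could be defined as follows. The resource names webip and webserver are the ones used later in this article. The choice of the ocf:heartbeat:IPaddr agent for the VIP and of lsb:httpd24 for the web service is an assumption. The run helper is hypothetical and only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 on a cluster node to really run them (hypothetical switch).
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

add_resources() {
    # Virtual IP, managed by the IPaddr agent (agent choice is an assumption).
    run crm configure primitive webip ocf:heartbeat:IPaddr params ip=10.204.80.88
    # Web service, managed through its init script /etc/init.d/httpd24.
    run crm configure primitive webserver lsb:httpd24
    run crm configure verify
    run crm status
}

add_resources
```

When commands are given on the command line like this, each one is committed as soon as it runs. Inside the interactive crm shell the same primitive lines are entered in configure mode, followed by verify and commit.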

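Check the resource status.

It shows two nodes and two configured resources.

Because the resources have not been put into a group and no constraints have been defined, by default the two resources run on different nodes.

Put the two resources into the same group.

Run # crm, type configure to enter configuration mode, then type help group.

This shows the help text, with a detailed description of the group command and examples.

group syntax: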

group <name> <rsc> [<rsc>...]
[description=<description>]
[meta attr_list]
[params attr_list]

attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>

Put webip and webserver into the same group, webgroup, then check the cluster status.
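Following the group syntax above, the grouping step might look like this sketch. The run helper is hypothetical and only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 on a cluster node to really run them (hypothetical switch).
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

make_group() {
    # Members are listed in start order: the IP first, then the web service.
    run crm configure group webgroup webip webserver
    run crm status
}

make_group
```

Members of a group run on the same node, start in the order listed and stop in the reverse order, which is why webip is listed before webserver.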

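Both resources now run on the same node, node1. Test by accessing the site.

After node1 is shut down, the resources do not move to node2. Check the cluster status with # crm status.

The cluster status shows "WITHOUT quorum": the remaining partition has lost quorum and no longer counts as a functioning cluster. With only two nodes this behaviour is not reasonable, so use the following command to ignore the quorum check when quorum cannot be satisfied: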

# crm configure property no-quorum-policy=ignore
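The reason a two-node cluster loses quorum as soon as one node goes down is simple arithmetic: a partition has quorum only when it holds more than half of all votes. The sketch below works this out, assuming one vote per node. The function name has_quorum is hypothetical.

```shell
#!/bin/sh
# has_quorum TOTAL ONLINE
# Succeeds when the online nodes hold more than half of all votes
# (one vote per node is assumed).
has_quorum() {
    [ $(( $2 * 2 )) -gt "$1" ]
}

for online in 2 1; do
    if has_quorum 2 "$online"; then
        echo "2-node cluster, $online online: quorum"
    else
        echo "2-node cluster, $online online: WITHOUT quorum"
    fi
done
```

With one of two nodes online, the surviving node holds exactly half of the votes, which is not a majority. The default no-quorum-policy is stop, so the resources are stopped instead of being moved. Setting it to ignore lets the surviving node keep running them.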

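Check the cluster status.

The resources are now running on node 2.

The web page is accessible.

Bring node1 back up.

node1 and node2 are both online, but the resources have not moved back to node1, because no location preference for node1 has been defined yet.

There are three kinds of resource constraints:

location constraint: defines on which nodes a resource may run, may not run, or should preferably run

order constraint: defines the order in which cluster resources are started on a node

colocation constraint: defines whether cluster resources may or may not run together on the same node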

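Here I make webgroup run on node1 by default. The constraint is placed on webserver, which is a member of webgroup:

# crm configure location prefer_node1 webserver 200: node1.mylinux.com

Note that there must be a space between the colon and the node name, otherwise an error is reported.

View the configuration.

Order constraint: to require that webip is started first and the http service after it, use the following command:

# crm configure order httpd-after-ip mandatory: webip webserver

corosync and pacemaker have many other powerful features, such as health checks (monitoring) at a configurable interval. I will stop here for now.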
