corosync + pacemaker lab notes

OS: RHEL 6.5, 64-bit
corosync: 1.4.7 (installed with yum)
pacemaker: 1.1.2 (installed automatically as a dependency package of corosync)

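pacemaker is the cluster resource manager (CRM) that was split out of heartbeat as a separate project when heartbeat reached 3.0. On the Red Hat 6.x series, installing corosync with yum installs pacemaker as the CRM by default.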

Common pacemaker configuration tools: crmsh and pcs.
crmsh needs its own rpm package, which has to be installed separately.

Main configuration files:

/etc/corosync/corosync.conf
/etc/crm/crm.conf

Lab hosts: A and B

Resource    RA provider (list with crm->ra->classes; details with crm->ra->info XXX)
webip       ocf:heartbeat:IPaddr2
webfs       ocf:heartbeat:Filesystem
webdb       lsb:mysqld
apache      ocf:heartbeat:apache (takes quite a few params)
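
For reference, the other three resources in the table could be defined along the following lines. This is only a sketch, not the configuration used in this experiment: the IP address 192.168.1.100, the netmask, the device /dev/sdb1, the ext4 filesystem type, and all operation timeouts are placeholder assumptions. Only the resource names, the RA names, and the /share mount point come from these notes.

```shell
# Sketch: define webip, webfs and webdb with one-shot crm commands.
# Every value marked "assumed" is a placeholder, not the original setting.
define_web_resources() {
    # Virtual IP of the web service (ip and cidr_netmask are assumed).
    crm configure primitive webip ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="24" \
        op monitor interval=10s timeout=20s || return 1

    # Shared filesystem mounted on /share (device and fstype are assumed).
    crm configure primitive webfs ocf:heartbeat:Filesystem \
        params device="/dev/sdb1" directory="/share" fstype="ext4" \
        op monitor interval=20s timeout=40s \
        op start timeout=60s \
        op stop timeout=60s || return 1

    # MySQL driven through its LSB init script.
    crm configure primitive webdb lsb:mysqld \
        op monitor interval=30s timeout=30s
}
```

Calling define_web_resources on one node is enough, because the configuration is cluster-wide. In one-shot mode each crm configure command is applied as soon as it runs.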

Defining resources:
crm->configure->group/primitive

primitive apache ocf:heartbeat:apache \
params configfile="/usr/local/apache2/conf/httpd.conf" \
httpd="/usr/local/apache2/bin/httpd" \
port="80" statusurl="http://127.0.0.1/server-status" \
op monitor timeout=20s interval=10s \
op start timeout=40s \
op stop timeout=60s \
op status timeout=30s

Defining constraints:
crm->configure->colocation/order/others

order start_order Mandatory: webfs:start webdb:start apache:start
Fields: keyword    ID    kind or score:    [resource ID:action] ...
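
An order constraint only controls the sequence in which resources start; it does not keep them on the same node. A colocation constraint is what ties them together. The sketch below shows one possible way to do that with pairwise constraints. The constraint IDs (webdb_with_webfs and so on) are made-up names, and the choice of which resource follows which is an assumption rather than the setup used in this experiment.

```shell
# Sketch: keep the web stack on one node with pairwise colocations.
# Syntax: colocation <id> <score>: <resource> <with-resource>
# The constraint IDs are hypothetical names chosen for illustration.
colocate_web_resources() {
    # The database and apache must run where the shared filesystem is mounted.
    crm configure colocation webdb_with_webfs inf: webdb webfs || return 1
    crm configure colocation apache_with_webfs inf: apache webfs || return 1
    # The virtual IP follows apache.
    crm configure colocation webip_with_apache inf: webip apache
}
```

A score of inf makes the placement mandatory: the first resource is never run on a node where the second one is not active.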

Showing the configuration:
crm->configure->show / show xml

Editing the configuration:
crm->configure->edit

Note: after making changes, run the verify command first and only then run commit.
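
Outside the interactive shell, the same kind of check can be scripted. The sketch below is one possible approach, assuming both crmsh and pacemaker's crm_verify tool are available on the node; the function name and messages are made up for illustration.

```shell
# Sketch: validate the cluster configuration from a script.
verify_cluster_config() {
    # crmsh's own consistency check of the configuration.
    if ! crm configure verify; then
        echo "crm configure verify reported problems" >&2
        return 1
    fi
    # pacemaker's checker against the live CIB (-L), verbose output (-V).
    if ! crm_verify -L -V; then
        echo "crm_verify reported problems" >&2
        return 1
    fi
    echo "configuration looks valid"
}
```

In an interactive crm->configure session the order stays the same: verify first, commit only if verify is clean.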

standby/online:
crm->node->standby [node name; defaults to the local node] / crm->node->online

Monitoring:
crm->status
crm_mon
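
The standby, monitoring, and online commands above can be combined into a simple failover drill. The following is a rough sketch: the function name, the DRILL_WAIT variable, and the 10-second default wait are assumptions, and the node name in the usage note is taken from the status output later in these notes.

```shell
# Sketch: put a node into standby, look at the cluster once, bring it back.
failover_drill() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: failover_drill <node-name>" >&2
        return 2
    fi
    # Resources on the node should move to the other node.
    crm node standby "$node" || return 1
    # Give the cluster a moment to move resources (wait time is assumed).
    sleep "${DRILL_WAIT:-10}"
    # -1 prints the cluster status once and exits instead of refreshing.
    crm_mon -1
    # Make the node eligible to run resources again.
    crm node online "$node"
}
```

For example, failover_drill ha-test1 should show the resources running on the other node in the crm_mon output, provided the constraints allow them to move.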

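Important points:
1. Start and stop the corosync service for both the local node and the other nodes from a single host (using ssh key trust between the nodes); otherwise split-brain may occur. (In this experiment, after starting, stopping, and then starting corosync again locally on node A, node A ended up contending with node B for resources. When node A's service was started from node B over SSH instead, there was no contention. The reason is unknown.)
2. In a two-node cluster, set the policy the cluster follows when quorum is lost, for example:
crm->configure->property no-quorum-policy=ignore
Otherwise resources cannot fail over. With only two nodes, the surviving node holds one vote out of two and loses quorum as soon as its peer goes down, and pacemaker's default policy in that state is to stop resources.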
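
Both points can be scripted. The sketch below assumes passwordless ssh from the controlling host to every node (including itself, if it is a cluster member) and uses the node names from the status output later in these notes. The function names and the CLUSTER_NODES variable are made up for illustration.

```shell
# Nodes to control; the default names are taken from the crm status output.
CLUSTER_NODES="${CLUSTER_NODES:-ha-test1 ha-test2}"

# Sketch: run the same corosync action on every node from one host.
corosync_all() {
    action="$1"
    case "$action" in
        start|stop|status) ;;
        *) echo "usage: corosync_all start|stop|status" >&2; return 2 ;;
    esac
    # Word splitting of CLUSTER_NODES is intentional here.
    for node in $CLUSTER_NODES; do
        ssh "$node" "service corosync $action" || return 1
    done
}

# Sketch: let a two-node cluster keep running resources without quorum.
allow_two_node_failover() {
    crm configure property no-quorum-policy=ignore
}
```

Running corosync_all start from one host brings the nodes up one after another from a single place, which matches the case that worked in this experiment. It does not explain why starting locally caused contention.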

Interesting points:
In this experiment I used the directory www under the shared filesystem /share as the DocumentRoot of the web server's virtual host:

<VirtualHost *:80>
    ServerAdmin [email protected]
    DocumentRoot "/share/www/"
    ServerName www.test.kc
    ServerAlias test.kc
    ErrorLog "logs/test.kc-error_log"
    CustomLog "logs/test.kc-access_log" common
</VirtualHost>

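While repeatedly starting and stopping the services on the hosts, the apache resource several times failed to start properly and was marked unmanaged, even though the other resources failed over successfully: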

Online: [ ha-test1 ha-test2 ]
webip   (ocf::heartbeat:IPaddr2):       Started ha-test2
webfs   (ocf::heartbeat:Filesystem):    Started ha-test2
webdb   (lsb:mysqld):   Started ha-test2
apache  (lsb:apache2):  FAILED ha-test1 (unmanaged)
Failed actions:
    apache_stop_0 on ha-test1 'unknown error' (1): call=611, status=complete, last-rc-change='Sun Jan  1 17:51:51 2017', queued=0ms, exec=51ms

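What to do next?
First I ran service corosync stop, which kept printing the unload message and never finished. My guess at the cause: the shared filesystem was mounted on node A while the apache service was running on node B, so the apache stop operation on node B failed because the shared storage path set in the configuration file could not be found, and the stop could never complete: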

Syntax error on line 66 of /usr/local/apache2/conf/httpd.conf:
DocumentRoot must be a directory

So the only thing I could think of was to force corosync to stop by killing its processes:

ps -ef | grep corosync | grep -v grep | awk '{print $2}' | xargs kill -9

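Later it occurred to me: since the missing path is what prevents apache from being stopped, I can simply mkdir a local directory for it to find. On both hosts A and B, run the following (the -p flag avoids an error on the node where /share is currently mounted and the directory already exists):

mkdir -p /share/www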

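After starting corosync on both nodes again and repeating the start/stop operations many times, the apache resource failed over successfully every time, and the situation above never came back.
I am not sure whether this counts as really solving the problem; corrections are welcome.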
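
A possible follow-up, sketched under assumptions: make sure the placeholder directory exists, let httpd check its own configuration, and then clear the recorded stop failure so the cluster manages apache again. The httpd and configuration paths come from the primitive definition above; the function names and the DOCROOT, HTTPD, and HTTPD_CONF variables are made up for illustration. Whether a cleanup is needed depends on the state the cluster is in.

```shell
DOCROOT="${DOCROOT:-/share/www}"
HTTPD="${HTTPD:-/usr/local/apache2/bin/httpd}"
HTTPD_CONF="${HTTPD_CONF:-/usr/local/apache2/conf/httpd.conf}"

# Create the directory only if it is missing (safe to run repeatedly).
ensure_docroot() {
    [ -d "$1" ] || mkdir -p "$1"
}

# Sketch: prepare the node, then clear the failed apache stop record.
recover_apache() {
    ensure_docroot "$DOCROOT" || return 1
    # -t only checks the configuration syntax; it does not start httpd.
    "$HTTPD" -t -f "$HTTPD_CONF" || return 1
    # Forget the failed operation so pacemaker manages apache again.
    crm resource cleanup apache
}
```

If the "DocumentRoot must be a directory" error is what breaks the stop operation, the httpd -t check should report it before the cluster ever tries to stop apache on that node.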
