Installing and Configuring a corosync + pacemaker + crmsh Cluster

I. Overview:

What are AIS and OpenAIS?

AIS (Application Interface Specification) is a collection of open specifications that define application programming interfaces (APIs). These interfaces act as middleware, giving application services an open, highly portable programming interface, which is essential when building highly available applications. The Service Availability Forum (SA Forum) is an open forum that develops and publishes these free specifications. Using the AIS APIs reduces application complexity and shortens development time; the main goal of the specifications is to improve the portability of middleware components and the availability of applications.

OpenAIS is an application interface specification for a cluster framework based on the SA Forum standards. OpenAIS provides a cluster model covering the cluster framework, cluster membership management, communication, and cluster monitoring, and it offers AIS-compliant cluster interfaces to cluster software and tools. However, it has no cluster resource management capability and cannot form a cluster on its own.


A brief introduction to corosync

corosync started out merely as a demo application for the OpenAIS cluster framework interface specification, so it can be regarded as part of OpenAIS, but its later development clearly outgrew the original plan, and more and more vendors have adopted corosync as their clustering solution. Red Hat's RHCS cluster suite, for example, is built on corosync.

corosync provides only the messaging layer and no CRM (cluster resource manager); Pacemaker is normally used on top of it for resource management.


A few basic concepts in the CRM

Resource stickiness:

       Resource stickiness expresses whether a resource prefers to stay on the node it is currently running on: a positive value means it prefers to stay, a negative value means it prefers to leave; -inf denotes negative infinity and inf denotes positive infinity.


Resource class types:

  primitive (native): basic, primitive resource

  group: resource group

  clone: cloned resource (may run on several nodes at the same time); a resource must first be defined as a primitive before it can be cloned (see the sketch after this list). Mainly used for STONITH and cluster filesystems.

  master/slave: master/slave resource, e.g. drbd
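
As a quick illustration of the primitive-before-clone rule, a minimal sketch (hypothetical resource names; ocf:pacemaker:Dummy is a no-op agent shipped for testing):

# define the primitive first, then wrap it in a clone
crm configure primitive fake ocf:pacemaker:Dummy
crm configure clone fake-clone fake meta clone-max=2 clone-node-max=1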



RA (resource agent) types:

      lsb: Linux Standard Base; the service scripts under /etc/rc.d/init.d/ that accept the start|stop|status arguments are LSB agents (see the listing after this list)

      ocf: Open Cluster Framework

      heartbeat: Heartbeat v1 resource agents

      stonith: used exclusively for configuring STONITH devices
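
On a typical EL6 installation the agents of each class live in predictable locations (paths assumed; verify on your own system):

ls /etc/rc.d/init.d/                    # lsb agents (init scripts)
ls /usr/lib/ocf/resource.d/heartbeat/   # ocf agents, heartbeat provider
ls /usr/lib/ocf/resource.d/pacemaker/   # ocf agents, pacemaker provider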

Cluster types and models:

corosync+pacemaker can implement a variety of cluster models, including Active/Active, Active/Passive, N+1, N+M, N-to-1 and N-to-N.

       Active/Passive redundancy:

[figure: Active/Passive redundancy diagram]

       N-to-N redundancy (multiple nodes, multiple services):

[figure: N-to-N redundancy diagram]



II. System environment: CentOS 6.4 x86_64

1. Configure the hostname on both nodes (shown for node1; repeat on node2 with node2.luojianlong.com)

[root@node1 ~]# hostname node1.luojianlong.com
[root@node1 ~]# sed -i 's@\(HOSTNAME=\).*@\1node1.luojianlong.com@g'  /etc/sysconfig/network
[root@node1 ~]# bash


2. Set up passwordless SSH trust between the two nodes


[root@node1 ~]# ssh-keygen -t rsa
[root@node1 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
[root@node2 ~]# ssh-keygen -t rsa
[root@node2 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
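
A quick sanity check that the trust works in both directions:

[root@node1 ~]# ssh node2 'hostname'   # should print node2.luojianlong.com without prompting for a password
[root@node2 ~]# ssh node1 'hostname'   # should print node1.luojianlong.com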


3. Populate the /etc/hosts file on both nodes (the short aliases let names like node2 resolve):

[root@node1 ~]# cat /etc/hosts
192.168.30.116 node1.luojianlong.com node1
192.168.30.117 node2.luojianlong.com node2


4. Install corosync and pacemaker on node1 and node2 (shown for node1)

[root@node1 ~]# yum -y install corosync pacemaker


5. Stop and disable the NetworkManager service on node1 and node2 (shown for node1)

[root@node1 ~]# service NetworkManager stop
[root@node1 ~]# chkconfig NetworkManager off
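
With NetworkManager disabled, it is worth making sure the classic network service manages the interfaces instead (an assumed companion step, standard on EL6):

[root@node1 ~]# chkconfig network on
[root@node1 ~]# service network restart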


6. Install crmsh-1.2.6-4 on node1 and node2

[root@node1 ~]# yum -y --nogpgcheck localinstall crmsh*.rpm pssh*.rpm
[root@node2 ~]# yum -y --nogpgcheck localinstall crmsh*.rpm pssh*.rpm
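
A quick check that the shell is usable:

[root@node1 ~]# crm help   # confirms the crm shell is installed and on the PATH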


7. Edit the corosync configuration file (/etc/corosync/corosync.conf):


totem {
        version: 2
        secauth: on
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.30.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
                ttl: 1
        }
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
amf {
        mode: disabled
}
service {
  ver:  0
  name: pacemaker
  # use_mgmtd: yes
}
aisexec {
  user: root
  group:  root
}


version: the version of the configuration format;

secauth: enables authentication and encryption of cluster traffic;

threads: the number of worker threads to start;

bindnetaddr: the network address of the network the cluster lives on (see the sketch below);

mcastaddr: the multicast address used for cluster messaging;

service: runs pacemaker as a corosync plugin (ver: 0);
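
Note that bindnetaddr takes a network address, not a host address; it can be read straight off the interface configuration. A sketch, using the addressing from this article:

[root@node1 ~]# ip addr show eth0 | grep 'inet '
    inet 192.168.30.116/24 brd 192.168.30.255 scope global eth0
# 192.168.30.116 with a /24 mask -> network address 192.168.30.0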


8. Generate the cluster authentication key

[root@node1 ~]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 176).
Press keys on your keyboard to generate entropy (bits = 240).
Press keys on your keyboard to generate entropy (bits = 304).
Press keys on your keyboard to generate entropy (bits = 368).
Press keys on your keyboard to generate entropy (bits = 432).
Press keys on your keyboard to generate entropy (bits = 496).
Press keys on your keyboard to generate entropy (bits = 560).
Press keys on your keyboard to generate entropy (bits = 624).
Press keys on your keyboard to generate entropy (bits = 688).
Press keys on your keyboard to generate entropy (bits = 752).
Press keys on your keyboard to generate entropy (bits = 816).
Press keys on your keyboard to generate entropy (bits = 880).
Press keys on your keyboard to generate entropy (bits = 944).
Press keys on your keyboard to generate entropy (bits = 1008).
Writing corosync key to /etc/corosync/authkey.
[root@node1 corosync]# scp authkey corosync.conf [email protected]:/etc/corosync/
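
The key must remain readable by root only; corosync-keygen writes a 128-byte key with mode 0400, which can be confirmed (output roughly like the following):

[root@node1 ~]# ls -l /etc/corosync/authkey
-r-------- 1 root root 128 Jan 10 10:49 /etc/corosync/authkey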


9. Start the corosync service:

[root@node1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
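# Optionally, enable corosync to start at boot as well:
[root@node1 ~]# chkconfig corosync on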


10. Check the corosync logs

  Verify that the corosync engine started correctly:


[root@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Jan 10 10:49:13 corosync [MAIN  ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Jan 10 10:49:13 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

  Verify that the initial membership notifications were sent out correctly:

[root@node1 ~]# grep  TOTEM  /var/log/cluster/corosync.log
Jan 10 10:49:13 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jan 10 10:49:13 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jan 10 10:49:13 corosync [TOTEM ] The network interface [192.168.30.116] is now up.
Jan 10 10:49:14 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 10 10:51:11 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.


Check whether any errors were produced during startup. The error messages below mean that pacemaker will soon no longer run as a corosync plugin, and that cman is therefore recommended as the cluster infrastructure service; they can be safely ignored here.

[root@node1 ~]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Jan 10 10:49:14 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Jan 10 10:49:14 corosync [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN


 Verify that pacemaker started correctly:


[root@node1 ~]# grep pcmk_startup /var/log/cluster/corosync.log
Jan 10 10:49:14 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Jan 10 10:49:14 corosync [pcmk  ] Logging: Initialized pcmk_startup
Jan 10 10:49:14 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Jan 10 10:49:14 corosync [pcmk  ] info: pcmk_startup: Service: 9
Jan 10 10:49:14 corosync [pcmk  ] info: pcmk_startup: Local hostname: node1.luojianlong.com


If all the commands above ran without problems, corosync on node2 can now be started with the following command


[root@node1 ~]# ssh node2 '/etc/init.d/corosync start'


Note: node2 should be started from node1 with the command above; do not start corosync directly on node2. Below are the related logs on node1

[root@node1 ~]# tail /var/log/cluster/corosync.log
Jan 10 10:51:14 [14875] node1.luojianlong.com       crmd:     info: te_rsc_command:     Action 3 confirmed - no wait
Jan 10 10:51:14 [14875] node1.luojianlong.com       crmd:   notice: run_graph:  Transition 1 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1.bz2): Complete
Jan 10 10:51:14 [14875] node1.luojianlong.com       crmd:     info: do_log:     FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jan 10 10:51:14 [14875] node1.luojianlong.com       crmd:   notice: do_state_transition:    State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 10 10:51:14 [14869] node1.luojianlong.com        cib:     info: cib_process_request:    Completed cib_query operation for section //cib/status//node_state[@id='node1.luojianlong.com']//transient_attributes//nvpair[@name='probe_complete']: OK (rc=0, origin=local/attrd/7, version=0.5.6)
Jan 10 10:51:14 [14869] node1.luojianlong.com        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/8, version=0.5.6)
Jan 10 10:51:16 [14869] node1.luojianlong.com        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=node2.luojianlong.com/attrd/6, version=0.5.7)
Jan 10 10:51:31 [14869] node1.luojianlong.com        cib:     info: crm_client_new:     Connecting 0x913f10 for uid=0 gid=0 pid=9832 id=41a762cd-b87c-4f39-8482-a64c2c61209c
Jan 10 10:51:31 [14869] node1.luojianlong.com        cib:     info: cib_process_request:    Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_mon/2, version=0.5.7)
Jan 10 10:51:31 [14869] node1.luojianlong.com        cib:     info: crm_client_destroy:     Destroying 0 events



11. If crmsh is installed, the startup state of the cluster nodes can be viewed with:


[root@node1 ~]# crm status
Last updated: Fri Jan 10 11:07:14 2014
Last change: Fri Jan 10 10:51:11 2014 via crmd on node1.luojianlong.com
Stack: classic openais (with plugin)
Current DC: node1.luojianlong.com - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1.luojianlong.com node2.luojianlong.com ]
#The information above shows that both nodes have started correctly and the cluster is in a normal working state.


12. Run ps auxf to see the processes started by corosync.

[root@node1 ~]# ps auxf
189      14869  0.4  0.2  93928 10624 ?        S    09:20   0:05  \_ /usr/libexec/pacemaker/cib
root     14871  0.0  0.1  94280  4036 ?        S    09:20   0:01  \_ /usr/libexec/pacemaker/stonithd
root     14872  0.0  0.0  75996  3216 ?        S    09:20   0:00  \_ /usr/libexec/pacemaker/lrmd
189      14873  0.0  0.0  89536  3444 ?        S    09:20   0:00  \_ /usr/libexec/pacemaker/attrd
189      14874  0.0  0.4 117180 18920 ?        S    09:20   0:00  \_ /usr/libexec/pacemaker/pengine
189      14875  0.0  0.1 147684  6624 ?        S    09:20   0:00  \_ /usr/libexec/pacemaker/crmd


13. Configure the cluster's working properties: disable stonith

#corosync enables stonith by default, but this cluster has no stonith device, so the default configuration is not yet usable, as the following check shows:
[root@node1 ~]# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
#For now, stonith can be disabled with the following command:
[root@node1 ~]# crm configure property stonith-enabled=false
#View the current configuration with:
[root@node1 ~]# crm configure show
node node1.luojianlong.com
node node2.luojianlong.com
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.1-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
#This shows that stonith has been disabled.
#The crm and crm_verify commands used above are command-line cluster management tools shipped with pacemaker 1.0 and later; they can be run on any node in the cluster.
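
After disabling stonith, re-running the earlier check confirms the configuration is now valid:

[root@node1 ~]# crm_verify -L -V
# the STONITH errors shown earlier should be gone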


14. Inspect the supported resource agent classes

#corosync supports resource agent classes such as heartbeat, LSB and OCF; LSB and OCF are the most commonly used today, while the stonith class exists solely for configuring stonith devices;
#The classes supported by the current cluster can be listed with:
[root@node1 ~]# crm ra classes
lsb
ocf / heartbeat pacemaker
service
stonith
#To list all the resource agents in a given class, use commands like:
[root@node1 ~]# crm ra list lsb
[root@node1 ~]# crm ra list ocf heartbeat
[root@node1 ~]# crm ra list ocf pacemaker
[root@node1 ~]# crm ra list stonith
# crm ra info [class:[provider:]]resource_agent
#For example:
[root@node1 ~]# crm ra info ocf:heartbeat:IPaddr


15. Next, create an IP address resource for the web cluster we are building; the cluster will use it when providing the web service. This is done as follows:

Syntax:

primitive <rsc> [<class>:[<provider>:]]<type>
         [params attr_list]
         [operations id_spec]
         [op op_type [<attribute>=<value>...] ...]

op_type :: start | stop | monitor

Example:

primitive apcfence stonith:apcsmart \
         params ttydev=/dev/ttyS0 hostlist="node1 node2" \
         op start timeout=60s \
         op monitor interval=30m timeout=60s


[root@node1 ~]# crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=192.168.30.230
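
The command above uses the minimal form. A fuller variant of the same definition (a sketch, not used further in this walkthrough) would also pin the interface and netmask and add a monitor operation, using the op syntax shown above:

[root@node1 ~]# crm configure primitive WebIP ocf:heartbeat:IPaddr \
    params ip=192.168.30.230 nic=eth0 cidr_netmask=24 \
    op monitor interval=30s timeout=20s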


16. The output of the following commands shows that the resource has started on node1.luojianlong.com:


[root@node1 ~]# crm status
Last updated: Tue Mar 25 11:10:40 2014
Last change: Tue Mar 25 11:10:00 2014 via cibadmin on node1.luojianlong.com
Stack: classic openais (with plugin)
Current DC: node1.luojianlong.com - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node1.luojianlong.com node2.luojianlong.com ]
 WebIP  (ocf::heartbeat:IPaddr):    Started node1.luojianlong.com
[root@node1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:f3:fc:ba brd ff:ff:ff:ff:ff:ff
    inet 192.168.30.116/24 brd 192.168.30.255 scope global eth0
    inet 192.168.30.230/24 brd 192.168.30.255 scope global secondary eth0
    inet6 fe80::20c:29ff:fef3:fcba/64 scope link
       valid_lft forever preferred_lft forever


17. Next, from node2, stop the corosync service on node1 and check the cluster status


[root@node2 ~]# ssh node1 '/etc/init.d/corosync stop'
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:.[  OK  ]
[root@node2 ~]# crm status
Last updated: Tue Mar 25 11:13:57 2014
Last change: Tue Mar 25 11:13:22 2014 via crmd on node2.luojianlong.com
Stack: classic openais (with plugin)
Current DC: node2.luojianlong.com - partition WITHOUT quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node2.luojianlong.com ]
OFFLINE: [ node1.luojianlong.com ]


18. The output above shows that node1.luojianlong.com is offline, yet the WebIP resource did not start on node2.luojianlong.com. This is because the cluster is now in the "WITHOUT quorum" state: quorum has been lost, and the cluster no longer satisfies the conditions it needs to operate normally, which is unreasonable for a cluster of only two nodes. We can therefore tell the cluster to ignore the failed quorum check with the following command:


[root@node2 ~]# crm configure property no-quorum-policy=ignore


19. A moment later, the cluster starts the resource on node2, the node that is still running, as shown below:


[root@node2 ~]# crm status
Last updated: Tue Mar 25 11:16:21 2014
Last change: Tue Mar 25 11:15:47 2014 via cibadmin on node2.luojianlong.com
Stack: classic openais (with plugin)
Current DC: node2.luojianlong.com - partition WITHOUT quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node2.luojianlong.com ]
OFFLINE: [ node1.luojianlong.com ]
 WebIP  (ocf::heartbeat:IPaddr):    Started node2.luojianlong.com


20. Verification done; now start node1.luojianlong.com again normally (the transcript below also stops node2, to show the resource back on node1)


[root@node2 ~]# ssh node1 '/etc/init.d/corosync start'
Starting Corosync Cluster Engine (corosync): [  OK  ]
[root@node1 ~]# ssh node2 '/etc/init.d/corosync stop'
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:.[  OK  ]
[root@node1 ~]# crm status
Last updated: Tue Mar 25 11:24:10 2014
Last change: Tue Mar 25 11:22:42 2014 via crmd on node1.luojianlong.com
Stack: classic openais (with plugin)
Current DC: node1.luojianlong.com - partition WITHOUT quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node1.luojianlong.com ]
OFFLINE: [ node2.luojianlong.com ]
 WebIP  (ocf::heartbeat:IPaddr):    Started node1.luojianlong.com


Once node1.luojianlong.com is started again, the WebIP resource will most likely move from node2.luojianlong.com back to node1.luojianlong.com. Every such move between nodes leaves the resource briefly unreachable, so we sometimes want a resource that has failed over to another node to stay put even after the original node recovers. This is achieved by defining resource stickiness. Stickiness can be specified either when the resource is created or afterwards.

Resource stickiness values and their effect:

0: the default. The resource is placed on the most suitable node in the system, meaning it is moved when a "better" or worse-loaded node becomes available. This is essentially equivalent to automatic failback, except that the resource may move to a node other than the one it was previously active on;

greater than 0: the resource prefers to stay where it is, but will move if a more suitable node becomes available. Higher values mean a stronger preference for staying;

less than 0: the resource prefers to move away from its current location. Higher absolute values mean a stronger preference for leaving;

INFINITY: the resource always stays where it is, unless it is forced off because the node can no longer run it (node shutdown, node standby, migration-threshold reached, or a configuration change). This is almost equivalent to completely disabling automatic failback;

-INFINITY: the resource always moves away from its current location;


21. Here we give resources a default stickiness value as follows:


[root@node1 ~]# crm configure rsc_defaults resource-stickiness=100
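
Stickiness can also be set on an individual resource rather than as a cluster-wide default (a sketch using crmsh's resource meta subcommand):

[root@node1 ~]# crm resource meta WebIP set resource-stickiness 100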



22. Building on the IP address resource already configured, we now turn this cluster into an active/passive web (httpd) service cluster. To do so, httpd must first be installed on each node and configured to serve its own local test page


# node1
[root@node1 ~]# yum -y install httpd
[root@node1 ~]# echo "node1.luojianlong.com" > /var/www/html/index.html
# node2
[root@node2 ~]# yum -y install httpd
[root@node2 ~]# echo "node2.luojianlong.com" > /var/www/html/index.html


23. Then start the httpd service manually on each node and confirm that it serves the page correctly. After that, stop httpd with the commands below and make sure it will not start automatically at boot (run on both nodes; a [FAILED] from the stop script merely means httpd was not running at that moment)


[root@node1 ~]# /etc/init.d/httpd stop
Stopping httpd:                                            [FAILED]
[root@node1 ~]# chkconfig httpd off
[root@node2 ~]# /etc/init.d/httpd stop
Stopping httpd:                                            [FAILED]
[root@node2 ~]# chkconfig httpd off
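
One way to confirm the test pages before handing httpd over to the cluster (a sketch):

[root@node1 ~]# service httpd start
[root@node1 ~]# curl http://localhost/index.html
node1.luojianlong.com
[root@node1 ~]# service httpd stop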


24. Next, add the httpd service as a cluster resource. Two resource agent classes are available for httpd: lsb and ocf:heartbeat. For simplicity we use the lsb class here. First, inspect the syntax of the lsb httpd resource agent with the following command


[root@node1 ~]# crm ra info lsb:httpd


25. Next, create the WebServer resource


[root@node1 ~]# crm configure primitive WebServer lsb:httpd


26. Look at the definitions generated in the configuration


[root@node1 ~]# crm configure show
node node1.luojianlong.com
node node2.luojianlong.com
primitive WebIP ocf:heartbeat:IPaddr \
    params ip="192.168.30.230"
primitive WebServer lsb:httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.1-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"


27. Check the state of the resources


[root@node1 ~]# crm status
Last updated: Tue Mar 25 11:36:53 2014
Last change: Tue Mar 25 11:35:37 2014 via cibadmin on node1.luojianlong.com
Stack: classic openais (with plugin)
Current DC: node1.luojianlong.com - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ node1.luojianlong.com node2.luojianlong.com ]
 WebIP  (ocf::heartbeat:IPaddr):    Started node1.luojianlong.com
 WebServer  (lsb:httpd):    Started node2.luojianlong.com


The output above shows that WebIP and WebServer may end up running on two different nodes, which is unworkable for an application that serves web content through this IP: the two resources must run on the same node.

This illustrates that even when a cluster owns all the required resources, it may still not handle them correctly. Resource constraints specify on which cluster nodes resources may run, in what order they are loaded, and which other resources a given resource depends on. pacemaker provides three kinds of resource constraints:

1) Resource Location: defines on which nodes a resource may, may not, or should preferably run;

2) Resource Colocation: defines which cluster resources may or may not run together on the same node;

3) Resource Order: defines the order in which cluster resources are started on a node;

When defining constraints, you also assign scores. Scores of every kind are an essential part of how the cluster works: the whole process, from migrating resources to deciding which resources to stop in a degraded cluster, is carried out by manipulating scores in some way. Scores are computed per resource, and any node with a negative score for a resource cannot run that resource. After calculating the scores, the cluster chooses the node with the highest score. INFINITY is currently defined as 1,000,000. Adding and subtracting infinity follows three basic rules:

1) any value + INFINITY = INFINITY

2) any value - INFINITY = -INFINITY

3) INFINITY - INFINITY = -INFINITY

#When defining resource constraints you can also assign a score to each constraint. The score expresses the value attached to the constraint: higher-scored constraints are applied before lower-scored ones. By creating additional location constraints with different scores for a given resource, you can control the order of the nodes the resource will fail over to.
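
A small worked example with the numbers used in this article (ignoring the colocation constraint, which simply moves both resources together):

# resource-stickiness = 100 (set in step 21); location preference for node1 = 200 (set in step 31)
# Suppose WebServer is running on node2 and node1 comes back online:
#   score(node2) = 100 (stickiness)
#   score(node1) = 200 (location constraint)
# 200 > 100, so the resource moves back to node1.
# With resource-stickiness=500 it would stay on node2, since 500 > 200.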


28. The problem described above, that WebIP and WebServer may run on different nodes, is solved with the following command


[root@node1 ~]# crm configure colocation webserver-with-webip INFINITY: WebServer WebIP
[root@node1 ~]# crm status
Last updated: Tue Mar 25 11:39:58 2014
Last change: Tue Mar 25 11:39:31 2014 via cibadmin on node1.luojianlong.com
Stack: classic openais (with plugin)
Current DC: node1.luojianlong.com - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ node1.luojianlong.com node2.luojianlong.com ]
 WebIP  (ocf::heartbeat:IPaddr):    Started node1.luojianlong.com
 WebServer  (lsb:httpd):    Started node1.luojianlong.com


29. Next, we must also make sure that WebIP is started before WebServer on a node, which the following command achieves


[root@node1 ~]# crm configure order web-server-after-webip mandatory: WebIP WebServer
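
With both constraints in place, a failover can also be rehearsed without stopping corosync by putting a node into standby (a sketch):

[root@node1 ~]# crm node standby node1.luojianlong.com
[root@node1 ~]# crm status   # WebIP and WebServer should move to node2 together, WebIP first
[root@node1 ~]# crm node online node1.luojianlong.com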




30. Next, verify the high-availability behaviour:


#All the resources are currently on node1:
[root@node1 ~]# crm status
Last updated: Tue Mar 25 11:46:41 2014
Last change: Tue Mar 25 11:41:18 2014 via cibadmin on node1.luojianlong.com
Stack: classic openais (with plugin)
Current DC: node1.luojianlong.com - partition WITHOUT quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ node1.luojianlong.com ]
OFFLINE: [ node2.luojianlong.com ]
 WebIP  (ocf::heartbeat:IPaddr):    Started node1.luojianlong.com
 WebServer  (lsb:httpd):    Started node1.luojianlong.com
#From node2, stop corosync on node1
[root@node2 ~]# ssh node1 '/etc/init.d/corosync stop'
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:..[  OK  ]


Visit http://192.168.30.230/index.html


[figure: browser screenshot of the test page served at http://192.168.30.230/index.html after failover]


31. Also, since an HA cluster does not require its nodes to have identical or even similar performance, we may want the service to normally run on a specific, more capable node. This can be expressed with a location constraint:


[root@node1 ~]# crm configure location prefer-node1 WebServer rule 200: #uname eq node1.luojianlong.com


This command constrains WebServer to node1 with a score of 200.
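
A resource can also be moved by hand; migrate works by inserting a temporary location constraint, which unmigrate removes again (a sketch):

[root@node1 ~]# crm resource migrate WebServer node2.luojianlong.com
[root@node1 ~]# crm resource unmigrate WebServer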


Common problems during deployment:

1. After starting the corosync service, the log kept repeating the errors below; the cause turned out to be that SELinux had not been disabled.

[root@node1 ~]# /etc/init.d/corosync start
[root@node1 ~]# tail -f /var/log/cluster/corosync.log
Mar 26 11:38:43 [7875] node1.luojianlong.com       lrmd:     info: crm_client_new:  Connecting 0x1bce860 for uid=189 gid=0 pid=8001 id=630eb39c-1acd-4cad-8930-700c71680f91
Mar 26 11:38:43 [7875] node1.luojianlong.com       lrmd:    error: qb_ipcs_shm_rb_open:     qb_rb_chmod:lrmd-request-7875-8001-153: Operation not permitted (1)
Mar 26 11:38:43 [7875] node1.luojianlong.com       lrmd:    error: qb_ipcs_shm_connect:     shm connection FAILED: Operation not permitted (1)
Mar 26 11:38:43 [7875] node1.luojianlong.com       lrmd:    error: handle_new_connection:   Error in connection setup (7875-8001-153): Operation not permitted (1)
Mar 26 11:38:43 [8001] node1.luojianlong.com       crmd:     info: crm_ipc_connect:     Could not establish lrmd connection: Operation not permitted (1)
Mar 26 11:38:43 [8001] node1.luojianlong.com       crmd:  warning: do_lrm_control:  Failed to sign on to the LRM 12 (30 max) times
Mar 26 11:38:43 [7960] node1.luojianlong.com        cib:    error: plugin_dispatch:     Receiving message body failed: (2) Library error: Success (0)
Mar 26 11:38:43 [7960] node1.luojianlong.com        cib:    error: cib_cs_destroy:  Corosync connection lost!  Exiting.
Mar 26 11:38:43 [7960] node1.luojianlong.com        cib:     info: terminate_cib:   cib_cs_destroy: Exiting fast...
Mar 26 11:38:43 [7960] node1.luojianlong.com        cib:     info: qb_ipcs_us_withdraw:     withdrawing server sockets


Solution:


[root@node1 ~]# setenforce 0
[root@node1 ~]# vi /etc/selinux/config
# change to SELINUX=disabled
# check the current selinux state
[root@node1 ~]# getenforce

