5. Testing

5.1 Standby node failure

Kill the postgres processes on node2 to simulate a database crash on the standby node:

[root@node2 ~]# killall -9 postgres

Check the cluster status at this point:

[root@node1 ~]# crm_mon -Afr -1
Last updated: Wed Jan 22 02:15:06 2014
Last change: Wed Jan 22 02:15:33 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
 
 
Online: [ node1 node2 ]
 
Full list of resources:
 
 vip-slave (ocf::heartbeat:IPaddr2): Started node1 
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Started node1 
     vip-rep (ocf::heartbeat:IPaddr2): Started node1 
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Stopped: [ node2 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ node1 node2 ]
 
Node Attributes:
* Node node1:
    + default_ping_set                 : 100       
    + master-pgsql                     : 1000      
    + pgsql-data-status                : LATEST    
    + pgsql-master-baseline            : 0000000006000078
    + pgsql-status                     : PRI       
* Node node2:
    + default_ping_set                 : 100       
    + master-pgsql                     : -INFINITY 
    + pgsql-data-status                : DISCONNECT
    + pgsql-status                     : STOP      
 
Migration summary:
* Node node2: 
   pgsql: migration-threshold=1 fail-count=1 last-failure='Wed Jan 22 02:15:35 2014'
* Node node1: 
 
Failed actions:
    pgsql_monitor_7000 on node2 'not running' (7): call=42, status=complete, last-rc-change='Wed Jan 22 02:14:58 2014', queued=0ms, exec=0ms

{The vip-slave resource has been moved to node1.}
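
When this check is scripted rather than read by eye, the node hosting a resource can be extracted from the `crm_mon -1` output. The following is a minimal sketch, not part of the original setup: `resource_node` is a hypothetical helper name, and it relies on the line layout shown in the output above.

```shell
#!/bin/sh
# resource_node NAME: read `crm_mon -1` output on stdin and print the node
# on which primitive NAME is reported as "Started".
# (Hypothetical helper; relies on the line layout shown in this article.)
resource_node() {
    awk -v rsc="$1" '$1 == rsc && $3 == "Started" { print $4; exit }'
}

# Sample lines copied from the crm_mon output above.
sample=' vip-slave (ocf::heartbeat:IPaddr2): Started node1 
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Started node1 
     vip-rep (ocf::heartbeat:IPaddr2): Started node1 '

printf '%s\n' "$sample" | resource_node vip-slave

# On a live cluster (assumption: run as root on a cluster node):
#   crm_mon -Afr -1 | resource_node vip-slave
```

A stopped resource has no node column, so the helper prints nothing in that case.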

Restart corosync on node2; the database is started again along with it:

[root@node2 ~]# service corosync restart
[root@node1 ~]# crm_mon -Afr -1
Last updated: Wed Jan 22 02:16:24 2014
Last change: Wed Jan 22 02:16:55 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
 
 
Online: [ node1 node2 ]
 
Full list of resources:
 
 vip-slave (ocf::heartbeat:IPaddr2): Started node2 
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Started node1 
     vip-rep (ocf::heartbeat:IPaddr2): Started node1 
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ node1 node2 ]
 
Node Attributes:
* Node node1:
    + default_ping_set                 : 100       
    + master-pgsql                     : 1000      
    + pgsql-data-status                : LATEST    
    + pgsql-master-baseline            : 0000000006000078
    + pgsql-status                     : PRI       
* Node node2:
    + default_ping_set                 : 100       
    + master-pgsql                     : 100       
    + pgsql-data-status                : STREAMING|SYNC
    + pgsql-status                     : HS:sync   
 
Migration summary:
* Node node2: 
* Node node1:

{vip-slave has moved back to node2.}

5.2 Master node failover

Kill the postgres processes on node1 to simulate a database crash on the master node:

[root@node1 ~]# killall -9 postgres

After a short wait, check the cluster status:

[root@node2 ~]# crm_mon -Afr -1
Last updated: Wed Jan 22 02:17:50 2014
Last change: Wed Jan 22 02:18:16 2014 via crm_attribute on node2
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
 
 
Online: [ node1 node2 ]
 
Full list of resources:
 
 vip-slave (ocf::heartbeat:IPaddr2): Started node2 
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Started node2 
     vip-rep (ocf::heartbeat:IPaddr2): Started node2 
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node2 ]
     Stopped: [ node1 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ node1 node2 ]
 
Node Attributes:
* Node node1:
    + default_ping_set                 : 100       
    + master-pgsql                     : -INFINITY 
    + pgsql-data-status                : DISCONNECT
    + pgsql-status                     : STOP      
* Node node2:
    + default_ping_set                 : 100       
    + master-pgsql                     : 1000      
    + pgsql-data-status                : LATEST    
    + pgsql-master-baseline            : 0000000008014A70
    + pgsql-status                     : PRI       
 
Migration summary:
* Node node2: 
* Node node1: 
   pgsql: migration-threshold=1 fail-count=1 last-failure='Wed Jan 22 02:18:11 2014'
 
Failed actions:
    pgsql_monitor_2000 on node1 'not running' (7): call=2435, status=complete, last-rc-change='Wed Jan 22 02:18:11 2014', queued=0ms, exec=0ms

{vip-master and vip-rep have both moved to node2, node2 has been promoted to master, and the PostgreSQL status on node2 is now PRI.}
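
The same result can be verified from a script by reading the node attributes out of the `crm_mon -A` output. This is a sketch under assumptions: `node_attr` is a hypothetical helper name, and the parsing depends on the "Node Attributes" layout printed by this pacemaker version.

```shell
#!/bin/sh
# node_attr NODE ATTR: read `crm_mon -A` output on stdin and print the value
# of node attribute ATTR for NODE. (Hypothetical helper name.)
node_attr() {
    awk -v node="$1" -v attr="$2" '
        # "* Node node1:" opens the attribute list of a node
        /^\* Node / { cur = $3; sub(/:$/, "", cur); next }
        # "    + pgsql-status    : PRI" is one attribute of that node
        $1 == "+" && $2 == attr && cur == node { print $4; exit }
    '
}

# Sample copied from the "Node Attributes" section above.
sample='Node Attributes:
* Node node1:
    + default_ping_set                 : 100       
    + master-pgsql                     : -INFINITY 
    + pgsql-data-status                : DISCONNECT
    + pgsql-status                     : STOP      
* Node node2:
    + default_ping_set                 : 100       
    + master-pgsql                     : 1000      
    + pgsql-data-status                : LATEST    
    + pgsql-status                     : PRI       '

printf '%s\n' "$sample" | node_attr node2 pgsql-status
printf '%s\n' "$sample" | node_attr node1 pgsql-data-status

# On a live cluster (assumption: run as root on a cluster node):
#   crm_mon -Afr -1 | node_attr node2 pgsql-status
```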

Stop corosync on node1:

[root@node1 ~]# service corosync stop

Take a base backup to resynchronise node1 from the new master:

[postgres@node1 data]$ pwd
/opt/pgsql/data
[postgres@node1 data]$ rm -rf *
[postgres@node1 data]$ pg_basebackup -h 192.168.1.3 -U postgres -D /opt/pgsql/data/ -P
19172/19172 kB (100%), 1/1 tablespace
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
[postgres@node1 data]$ ls
backup_label      base    pg_clog      pg_ident.conf  pg_notify  pg_stat_tmp  pg_tblspc    PG_VERSION  postgresql.conf
backup_label.old  global  pg_hba.conf  pg_multixact   pg_serial  pg_subtrans  pg_twophase  pg_xlog     recovery.done

Start corosync on node1:

[root@node1 ~]# service corosync start
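
The manual steps above (stop corosync, empty the data directory, take a base backup, start corosync) can be collected into one script. The following is only a sketch under assumptions: the variable names, the `run` and `resync_standby` helpers and the `su - postgres` wrapper are illustrative and not part of the original procedure, and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the resynchronisation steps above as one script.
# Assumptions: PGDATA, MASTER_IP and the `su - postgres` wrapper are
# illustrative; DRY_RUN=1 (the default here) only prints the commands.
PGDATA=${PGDATA:-/opt/pgsql/data}
MASTER_IP=${MASTER_IP:-192.168.1.3}
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

resync_standby() {
    # Refuse to continue with an empty or root path: rm -rf would be dangerous.
    case $PGDATA in
        ''|/) echo "refusing to use PGDATA='$PGDATA'" >&2; return 1 ;;
    esac
    run service corosync stop
    # Same effect as `rm -rf *` inside the data directory, run as postgres.
    run su - postgres -c "rm -rf $PGDATA/*"
    run su - postgres -c "pg_basebackup -h $MASTER_IP -U postgres -D $PGDATA -P"
    run service corosync start
}

resync_standby
```

Running it with `DRY_RUN=0` as root on the failed node would execute the commands for real; reviewing the dry-run output first is the safer habit, since the second step deletes the whole data directory.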

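5.3 Master node recovery

After the former master has been repaired, bring it back as the standby of the current master.

Take a base backup on node1: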

[postgres@node1 data]$ pwd
/opt/pgsql/data
[postgres@node1 data]$ rm -rf *
[postgres@node1 data]$ pg_basebackup -h 192.168.2.3 -U postgres -D /opt/pgsql/data/ -P
19172/19172 kB (100%), 1/1 tablespace
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
[postgres@node1 data]$ ls
backup_label      base    pg_clog      pg_ident.conf  pg_notify  pg_stat_tmp  pg_tblspc    PG_VERSION  postgresql.conf
backup_label.old  global  pg_hba.conf  pg_multixact   pg_serial  pg_subtrans  pg_twophase  pg_xlog     recovery.done

Before starting heartbeat, the resource lock file must be deleted; otherwise the resource will not be started along with heartbeat:

[root@node1 ~]# rm -rf /var/lib/pgsql/tmp/PGSQL.lock

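{The lock file is created when a node becomes master. It is not removed automatically when heartbeat stops abnormally or when the database or the operating system terminates abnormally, so whenever a node that has acted as master is being recovered, the lock file has to be cleaned up by hand.}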
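
Because this cleanup is easy to forget, it can be wrapped in a small helper that reports what it did. A minimal sketch, assuming the lock path used in this article; `clear_pgsql_lock` is a hypothetical name, and the demonstration works on a temporary file rather than the real lock.

```shell
#!/bin/sh
# clear_pgsql_lock [PATH]: remove a stale PGSQL.lock if present.
# The default path is the one used in this article; the helper name is
# hypothetical. Only run this on a node whose PostgreSQL is stopped.
clear_pgsql_lock() {
    lock=${1:-/var/lib/pgsql/tmp/PGSQL.lock}
    if [ -e "$lock" ]; then
        rm -f -- "$lock" && echo "removed $lock"
    else
        echo "no lock file at $lock"
    fi
}

# Demonstration on a throwaway file instead of the real lock.
tmpdir=$(mktemp -d)
touch "$tmpdir/PGSQL.lock"
clear_pgsql_lock "$tmpdir/PGSQL.lock"   # first call removes the file
clear_pgsql_lock "$tmpdir/PGSQL.lock"   # second call finds nothing
rmdir "$tmpdir"
```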

Restart heartbeat on node1:

[root@node1 ~]# service heartbeat restart

After a while, check the cluster status:

[root@node2 ~]# crm_mon -Afr1
============
Last updated: Mon Jan 27 08:50:43 2014
Stack: Heartbeat
Current DC: node2 (f2dcd1df-7429-42f5-82e9-b73921f97cab) - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
4 Resources configured.
============
 
Online: [ node1 node2 ]
 
Full list of resources:
 
 vip-slave (ocf::heartbeat:IPaddr2): Started node1
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Started node2
     vip-rep (ocf::heartbeat:IPaddr2): Started node2
 Master/Slave Set: msPostgresql
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Clone Set: clnPingCheck
     Started: [ node1 node2 ]
 
Node Attributes:
* Node node1:
    + default_ping_set                 : 100       
    + master-pgsql:0                   : 100       
    + pgsql-data-status                : STREAMING|SYNC
    + pgsql-status                     : HS:sync   
* Node node2:
    + default_ping_set                 : 100       
    + master-pgsql:1                   : 1000      
    + pgsql-data-status                : LATEST    
    + pgsql-master-baseline            : 00000000120000B0
    + pgsql-status                     : PRI       
 
Migration summary:
* Node node1: 
* Node node2:

{vip-slave has moved to node1, and node1 is now a streaming replication standby.}
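
Instead of waiting "a while" and checking by hand, a script can poll until the standby reports HS:sync or a timeout expires. This is a sketch under assumptions: `wait_until` and `standby_is_sync` are hypothetical helpers, and the grep pattern only checks that some node shows the HS:sync status seen above.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL COMMAND...: run COMMAND every INTERVAL seconds
# until it succeeds; give up (return 1) once TIMEOUT seconds have elapsed.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@"; do
        [ "$elapsed" -lt "$timeout" ] || return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}

# Hypothetical check: some node reports the HS:sync status seen above.
standby_is_sync() {
    crm_mon -Afr -1 2>/dev/null | grep -q 'pgsql-status.*HS:sync'
}

# Trivial demonstration with a command that succeeds immediately.
wait_until 3 1 true && echo "condition met"

# On a live cluster (assumption: run as root on a cluster node):
#   wait_until 120 5 standby_is_sync || echo "standby did not reach HS:sync" >&2
```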

6. Management

6.1 Starting and stopping corosync

[root@node1 ~]# service corosync start
[root@node1 ~]# service corosync stop

6.2 Checking the HA status

[root@node1 ~]# crm status
Last updated: Tue Jan 21 23:55:13 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
 
 
Online: [ node1 node2 ]
 
 vip-slave (ocf::heartbeat:IPaddr2): Started node2 
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Started node1 
     vip-rep (ocf::heartbeat:IPaddr2): Started node1 
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ node1 node2 ]

6.3 Checking resource status and node attributes

[root@node1 ~]# crm_mon -Afr -1
Last updated: Tue Jan 21 23:37:20 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
 
 
Online: [ node1 node2 ]
 
Full list of resources:
 
 vip-slave (ocf::heartbeat:IPaddr2): Started node2 
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Started node1 
     vip-rep (ocf::heartbeat:IPaddr2): Started node1 
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ node1 node2 ]
 
Node Attributes:
* Node node1:
    + default_ping_set                 : 100       
    + master-pgsql                     : 1000      
    + pgsql-data-status                : LATEST    
    + pgsql-master-baseline            : 0000000006000078
    + pgsql-status                     : PRI       
* Node node2:
    + default_ping_set                 : 100       
    + master-pgsql                     : 100       
    + pgsql-data-status                : STREAMING|SYNC
    + pgsql-status                     : HS:sync   
 
Migration summary:
* Node node2: 
* Node node1:

6.4 Viewing the configuration

[root@node1 ~]# crm configure show
node node1 \
        attributes pgsql-data-status="LATEST"
node node2 \
        attributes pgsql-data-status="STREAMING|SYNC"
primitive pgsql ocf:heartbeat:pgsql \
        params pgctl="/opt/pgsql/bin/pg_ctl" psql="/opt/pgsql/bin/psql" pgdata="/opt/pgsql/data/" start_opt="-p 5432" rep_mode="sync" node_list="node1 node2" restore_command="cp /opt/archivelog/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="192.168.1.3" stop_escalate="0" \
        op start timeout="60s" interval="0s" on-fail="restart" \
        op monitor timeout="60s" interval="7s" on-fail="restart" \
        op monitor timeout="60s" interval="2s" on-fail="restart" role="Master" \
        op promote timeout="60s" interval="0s" on-fail="restart" \
        op demote timeout="60s" interval="0s" on-fail="stop" \
……
……

6.5 Monitoring HA in real time

[root@node1 ~]# crm_mon -Afr
Last updated: Wed Jan 22 00:40:12 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
 
 
Online: [ node1 node2 ]
 
Full list of resources:
 
vip-slave (ocf::heartbeat:IPaddr2): Started node2
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Started node1
     vip-rep    (ocf::heartbeat:IPaddr2): Started node1
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ node1 node2 ]
 
Node Attributes:
* Node node1:
    + default_ping_set                  : 100
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST    
    + pgsql-master-baseline             : 0000000006000078
    + pgsql-status                      : PRI
* Node node2:
    + default_ping_set                  : 100
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync   
 
Migration summary:
* Node node2: 
* Node node1:

6.6 The crm_resource command

Starting and stopping a resource:

[root@node1 ~]# crm_resource -r vip-master --meta -p target-role -v Started
[root@node1 ~]# crm_resource -r vip-master --meta -p target-role -v Stopped

Listing resources:

[root@node1 ~]# crm_resource -L
 vip-slave (ocf::heartbeat:IPaddr2): Started 
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Started 
     vip-rep (ocf::heartbeat:IPaddr2): Started 
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ node1 node2 ]

Locating a resource:

[root@node1 ~]# crm_resource -W -r pgsql
resource pgsql is running on: node2

Migrating a resource:

[root@node1 ~]# crm_resource -M -r vip-slave -N node2

Deleting a resource:

[root@node1 ~]# crm_resource -D -r vip-slave -t primitive

6.7 The crm command

Listing the resource agents of a given class and provider:

[root@node1 ~]# crm ra list ocf pacemaker
ClusterMon     Dummy          HealthCPU      HealthSMART    Stateful       SysInfo        SystemHealth   controld       ping           pingd
remote

Deleting a node:

[root@node1 ~]# crm node delete node2

Putting a node into standby:

[root@node1 ~]# crm node standby node2

Bringing a node back online:

[root@node1 ~]# crm node online node2

Configuring pacemaker:

[root@node1 ~]# crm configure
crm(live)configure#
……
……
crm(live)configure# commit
crm(live)configure# quit

6.8 Resetting the failcount

[root@node1 ~]# crm resource
crm(live)resource# failcount pgsql set node1 0
crm(live)resource# failcount pgsql show node1
scope=status  name=fail-count-pgsql value=0
 
 
[root@node1 ~]# crm resource cleanup pgsql
Cleaning up pgsql:0 on node1
Waiting for 1 replies from the CRMd. OK
 
 
[root@node1 ~]# crm_failcount -G -U node1 -r pgsql
scope=status  name=fail-count-pgsql value=INFINITY
[root@node1 ~]# crm_failcount -D -U node1 -r pgsql
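
For use in scripts, the value can be cut out of the `crm_failcount -G` output shown above. A minimal sketch; `failcount_value` is a hypothetical helper, and the parsing assumes the `value=...` format printed by this pacemaker version.

```shell
#!/bin/sh
# failcount_value: read `crm_failcount -G` output on stdin, print the value.
failcount_value() {
    sed -n 's/.*value=\([^ ]*\).*/\1/p'
}

# Sample lines copied from the sessions above.
echo 'scope=status  name=fail-count-pgsql value=INFINITY' | failcount_value
echo 'scope=status  name=fail-count-pgsql value=0' | failcount_value

# On a live cluster (assumption: same node and resource names as above):
#   fc=$(crm_failcount -G -U node1 -r pgsql | failcount_value)
#   [ "$fc" = 0 ] || crm resource cleanup pgsql
```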

7. Troubleshooting log

7.1 Q1

Symptom:

corosync.log reports the following errors:

Jan 15 10:23:57 node1 lrmd: [6327]: info: RA output: (pgsql:0:monitor:stderr) /usr/lib/ocf/resource.d//heartbeat/pgsql: line 1749: ocf_local_nodename: command not found

Jan 15 10:23:57 node1 crm_attribute: [11094]: info: Invoked: /usr/sbin/crm_attribute -l reboot -N -n -v 0000000006000090 pgsql-xlog-loc lrm_get_rsc_type_metadata(578)

Jan 15 10:23:57 node1 lrmd: [6327]: info: RA output: (pgsql:0:monitor:stderr) Could not map uname=-n to a UUID: The object/attribute does not exist

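Solution:

The pgsql script calls ocf_local_nodename. That function should be defined in ocf-shellfuncs.in, but the installed copy does not contain it. A search turned up this forum thread: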

http://www.gossamer-threads.com/lists/linuxha/users/89379?do=post_view_threaded

It points out that a patch is needed. The patch that adds the ocf_local_nodename function:

https://github.com/ClusterLabs/resource-agents/commit/abc1c3f6464f6e5e7a1e41cd7c9b8179896c1903

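The latest version does not have the ocf_local_nodename function, so the following versions are used:

{Note: make sure the pacemaker version is newer than 1.1.8; otherwise the crm_node -n command is not available.}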

https://github.com/ClusterLabs/resource-agents/blob/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e/heartbeat/pgsql

https://github.com/ClusterLabs/resource-agents/tree/abc1c3f6464f6e5e7a1e41cd7c9b8179896c1903/heartbeat

A version of the pgsql script that does not contain the ocf_local_nodename function:

https://raw.github.com/ClusterLabs/resource-agents/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e/heartbeat/pgsql
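
Before swapping scripts, it helps to confirm whether the installed function library really lacks the function. The check below is a sketch under assumptions: `has_shell_function` is a hypothetical helper that only looks for a `NAME()` definition line, and the default path is inferred from the install directory used in 7.4 (the installed file may be named differently on other systems).

```shell
#!/bin/sh
# has_shell_function FILE NAME: succeed if FILE defines shell function NAME.
# (Hypothetical helper; it only looks for a "NAME()" definition line.)
has_shell_function() {
    grep -Eq "^[[:space:]]*${2}[[:space:]]*\(\)" "$1"
}

# Assumed install path, based on the directory used in section 7.4.
funcs=${1:-/usr/lib/ocf/lib/heartbeat/ocf-shellfuncs}

if [ -r "$funcs" ] && has_shell_function "$funcs" ocf_local_nodename; then
    echo "ocf_local_nodename is defined in $funcs"
else
    echo "ocf_local_nodename not found in $funcs"
fi
```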

7.2 Q2

Symptom:

[root@node1 ~]# crm configure load update pgsql.crm 
WARNING: pingCheck: specified timeout 60s for start is smaller than the advised 90
WARNING: pingCheck: specified timeout 60s for stop is smaller than the advised 100
WARNING: pgsql: specified timeout 60s for stop is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for start is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for notify is smaller than the advised 90
WARNING: pgsql: specified timeout 60s for demote is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for promote is smaller than the advised 120
ERROR: master-group: attribute ordered does not exist
Do you still want to commit? no

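Solution:

The error says that the ordered attribute does not exist in the master-group definition.

(1) This is caused by the pacemaker version: pacemaker 1.1 does not support the ordered and colocated attributes. An attempt to work around it by replacing the current cibconfig.py with the one from version 1.0, as follows, failed: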

[root@node1 ~]# vim /usr/lib64/python2.6/site-packages/crmsh/cibconfig.py
[root@node1 ~]# cd /usr/lib64/python2.6/site-packages/crmsh/
[root@node1 crmsh]# mv cibconfig.py cibconfig.py.bak
[root@node1 crmsh]# wget https://github.com/ClusterLabs/pacemaker-1.0/blob/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc/shell/modules/cibconfig.py

(2) Remove the ordered definition from the configuration script (this worked):

group master-group \
      vip-master \
      vip-rep \
      meta \
          ordered="false"

Change it to:

group master-group \
      vip-master \
      vip-rep
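
To find every place where these attributes are used before loading a configuration file, a simple search is enough. A minimal sketch; `find_group_meta` is a hypothetical helper, and the demonstration runs on a temporary copy of the group definition above rather than on the real pgsql.crm.

```shell
#!/bin/sh
# find_group_meta FILE: print (with line numbers) every use of the
# "ordered" or "colocated" attribute in a crm configuration file.
find_group_meta() {
    grep -nE '(ordered|colocated)=' "$1"
}

# Demonstration on a temporary copy of the group definition above.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
group master-group \
      vip-master \
      vip-rep \
      meta \
          ordered="false"
EOF

if find_group_meta "$cfg"; then
    echo "remove these attributes before loading the file"
fi
rm -f "$cfg"
```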

7.3 Q3

Symptom:

Installing pacemaker fails:

# yum install pacemaker*
……
--> Processing Dependency: libesmtp.so.5()(64bit) for package: pacemaker
--> Finished Dependency Resolution
pacemaker-1.0.12-1.el5.centos.i386 from clusterlabs has depsolving problems
  --> Missing Dependency: libesmtp.so.5 is needed by package pacemaker-1.0.12-1.el5.centos.i386 (clusterlabs)
pacemaker-1.0.12-1.el5.centos.x86_64 from clusterlabs has depsolving problems
  --> Missing Dependency: libesmtp.so.5()(64bit) is needed by package pacemaker-1.0.12-1.el5.centos.x86_64 (clusterlabs)
Error: Missing Dependency: libesmtp.so.5 is needed by package pacemaker-1.0.12-1.el5.centos.i386 (clusterlabs)
Error: Missing Dependency: libesmtp.so.5()(64bit) is needed by package pacemaker-1.0.12-1.el5.centos.x86_64 (clusterlabs)
 You could try using --skip-broken to work around the problem
 You could try running: package-cleanup --problems
                        package-cleanup --dupes
                        rpm -Va --nofiles --nodigest
The program package-cleanup is found in the yum-utils package.

Solution:

libesmtp is missing; installing it resolves the error:

# wget ftp://ftp.univie.ac.at/systems/linux/fedora/epel/5/x86_64/libesmtp-1.0.4-5.el5.x86_64.rpm
# wget ftp://ftp.univie.ac.at/systems/linux/fedora/epel/5/i386/libesmtp-1.0.4-5.el5.i386.rpm
# rpm -ivh libesmtp-1.0.4-5.el5.x86_64.rpm
# rpm -ivh libesmtp-1.0.4-5.el5.i386.rpm
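
When yum prints a long dependency error like the one above, the distinct missing libraries can be listed with a short filter. This is a sketch, not part of the original fix: `missing_deps` is a hypothetical helper, and it assumes the "Missing Dependency: X is needed by package Y" wording shown in this yum version.

```shell
#!/bin/sh
# missing_deps: read yum output on stdin and list each missing dependency once.
missing_deps() {
    sed -n 's/.*Missing Dependency: \(.*\) is needed by package.*/\1/p' | sort -u
}

# Sample lines copied from the yum error above.
cat <<'EOF' | missing_deps
  --> Missing Dependency: libesmtp.so.5 is needed by package pacemaker-1.0.12-1.el5.centos.i386 (clusterlabs)
  --> Missing Dependency: libesmtp.so.5()(64bit) is needed by package pacemaker-1.0.12-1.el5.centos.x86_64 (clusterlabs)
Error: Missing Dependency: libesmtp.so.5 is needed by package pacemaker-1.0.12-1.el5.centos.i386 (clusterlabs)
Error: Missing Dependency: libesmtp.so.5()(64bit) is needed by package pacemaker-1.0.12-1.el5.centos.x86_64 (clusterlabs)
EOF
```

Both the 32-bit and the 64-bit library are reported here, which is why both libesmtp packages are installed above.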

7.4 Q4

Symptom:

Loading the crm configuration fails:

[root@node1 ~]# crm configure load update pgsql.crm 
ERROR: pgsql: parameter rep_mode does not exist
ERROR: pgsql: parameter node_list does not exist
ERROR: pgsql: parameter master_ip does not exist
ERROR: pgsql: parameter restore_command does not exist
ERROR: pgsql: parameter primary_conninfo_opt does not exist
WARNING: pgsql: specified timeout 60s for stop is smaller than the advised 120
WARNING: pgsql: action monitor_Master not advertised in meta-data, it may not be supported by the RA
WARNING: pgsql: specified timeout 60s for start is smaller than the advised 120
WARNING: pgsql: action notify not advertised in meta-data, it may not be supported by the RA
WARNING: pgsql: action demote not advertised in meta-data, it may not be supported by the RA
WARNING: pgsql: action promote not advertised in meta-data, it may not be supported by the RA
WARNING: pingCheck: specified timeout 60s for start is smaller than the advised 90
WARNING: pingCheck: specified timeout 60s for stop is smaller than the advised 100
Do you still want to commit? no

Solution:

The parameters do not exist because the installed pgsql script is too old; it has to be replaced:

scp pgsql [email protected]:/usr/lib/ocf/resource.d/heartbeat/
scp ocf-shellfuncs.in [email protected]:/usr/lib/ocf/lib/heartbeat/
 
scp pgsql [email protected]:/usr/lib/ocf/resource.d/heartbeat/
scp ocf-shellfuncs.in [email protected]:/usr/lib/ocf/lib/heartbeat/
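
After copying the files, it is worth confirming that the installed agent now declares the replication parameters. A sketch under assumptions: `ra_missing_params` is a hypothetical helper, and it assumes the agent declares its parameters as `<parameter name="...">` elements in its embedded metadata.

```shell
#!/bin/sh
# ra_missing_params FILE PARAM...: print each PARAM that FILE (a resource
# agent script) does not declare as <parameter name="PARAM" ...>.
ra_missing_params() {
    agent=$1
    shift
    for p in "$@"; do
        grep -q "<parameter name=\"$p\"" "$agent" || echo "$p"
    done
}

# Install path taken from the scp commands above.
ra=${1:-/usr/lib/ocf/resource.d/heartbeat/pgsql}
if [ -r "$ra" ]; then
    ra_missing_params "$ra" rep_mode node_list master_ip \
        restore_command primary_conninfo_opt
else
    echo "cannot read $ra" >&2
fi
```

No output means that all five parameters from the error message are declared by the installed script.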

7.5 Q5

Symptom:

[root@node1 ~]# crm_mon -Afr -1
Last updated: Tue Jan 21 05:10:56 2014
Last change: Tue Jan 21 05:10:08 2014 via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
 
 
Online: [ node1 node2 ]
 
Full list of resources:
 
 vip-slave (ocf::heartbeat:IPaddr2): Stopped 
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Stopped 
     vip-rep (ocf::heartbeat:IPaddr2): Stopped 
 Master/Slave Set: msPostgresql [pgsql]
     Stopped: [ node1 node2 ]
 Clone Set: clnPingCheck [pingCheck]
     Stopped: [ node1 node2 ]
 
Node Attributes:
* Node node1:
* Node node2:
 
Migration summary:
* Node node1: 
* Node node2: 
 
Failed actions:
    pingCheck_monitor_0 on node1 'invalid parameter' (2): call=23, status=complete, last-rc-change='Tue Jan 21 05:10:10 2014', queued=200ms, exec=0ms
    pingCheck_monitor_0 on node2 'invalid parameter' (2): call=23, status=complete, last-rc-change='Tue Jan 21 05:09:36 2014', queued=281ms, exec=0ms

Solution:

The error occurs because the pingd script called by pingCheck in the configuration is given a parameter it does not know: ocf/pacemaker/pingd has no multiplier parameter:

primitive pingCheck ocf:pacemaker:pingd \
      params \
          name="default_ping_set" \
          host_list="192.168.100.1" \
          multiplier="100" \
      op start timeout="60s" interval="0s" on-fail="restart" \
      op monitor timeout="60s" interval="10s" on-fail="restart" \
      op stop timeout="60s" interval="0s" on-fail="ignore"

The agent was therefore changed to ocf:heartbeat:pingd.
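
The change can also be applied to the configuration file with a one-line substitution. A minimal sketch; `use_heartbeat_pingd` is a hypothetical helper, and pgsql.crm.new in the usage comment is a hypothetical output file name.

```shell
#!/bin/sh
# use_heartbeat_pingd: rewrite the pingCheck agent on stdin from
# ocf:pacemaker:pingd to ocf:heartbeat:pingd, leaving other lines untouched.
use_heartbeat_pingd() {
    sed 's/ocf:pacemaker:pingd/ocf:heartbeat:pingd/'
}

printf '%s\n' 'primitive pingCheck ocf:pacemaker:pingd \' | use_heartbeat_pingd

# Applied to the configuration file (pgsql.crm is the file name used above;
# pgsql.crm.new is a hypothetical output name):
#   use_heartbeat_pingd < pgsql.crm > pgsql.crm.new
```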

7.6 Q6

Symptom:

The corosync log reports:

Jan 21 04:36:02 corosync [TOTEM ] Received message has invalid digest... ignoring.

Jan 21 04:36:02 corosync [TOTEM ] Invalid packet data

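Solution:

This means another cluster on the network is using the same multicast address; changing the multicast address fixes it.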
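
To compare the multicast settings of two clusters, the relevant lines can be pulled out of each corosync.conf. This is a sketch under assumptions: `mcast_settings` is a hypothetical helper, and the configuration fragment below is an invented example, not the configuration used in this article.

```shell
#!/bin/sh
# mcast_settings: read a corosync.conf on stdin and print its mcastaddr and
# mcastport values, so they can be compared between clusters.
mcast_settings() {
    awk '$1 == "mcastaddr:" || $1 == "mcastport:" { print $1, $2 }'
}

# Hypothetical corosync.conf fragment; the addresses are examples only.
cat <<'EOF' | mcast_settings
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
EOF

# On a cluster node (assumption: default configuration path):
#   mcast_settings < /etc/corosync/corosync.conf
```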

8. References

Script:

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql

Script usage guide:

https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication

The crm_resource command:

http://www.novell.com/zh-cn/documentation/sle_ha/book_sleha/data/man_crmresource.html

The crm_failcount command:

http://www.novell.com/zh-cn/documentation/sle_ha/book_sleha/data/man_crmfailcount.html
