Configuring NIC Bonding for the RAC Private Network

Installing and deploying RAC is not just a matter of getting the installation to complete; the whole process has to account for potential single points of failure, and one of the most important of these is the private network.

The private network is the communication channel between the RAC nodes: inter-node network heartbeats and Cache Fusion data block transfers all travel over it. Yet many private networks consist of nothing more than a single NIC plugged into a switch, and some are even built by cabling the servers' NICs directly to each other. Such deployments are simple, but once the RAC goes into production the risk is high: the NIC, the cable, the switch port and the switch itself are all single points of failure, and a fault in almost any of them will cause a RAC split. It is therefore recommended to configure dual-NIC bonding for the private network.

Below are my configuration steps.

Environment:

OS: CentOS release 6.4 (Final)

Oracle: 11.2.0.4 RAC

NICs: 4 (em1, em2, em3, em4). Currently em1 is in use as the public NIC and em3 as the private NIC; em2 and em4 are idle.

Configure and load the bonding module (run on both nodes).

Edit /etc/modprobe.d/bonding.conf and add the following:

 [root@node2 ~]# vi /etc/modprobe.d/bonding.conf

alias bond0 bonding

[root@node2 ~]# modprobe -a bond0

Verify:

[root@node2 ~]#  lsmod  |grep bond

bonding               127331  0

8021q                  25317  1 bonding

ipv6                  321422  274 bonding,ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6

 

Edit the NIC configuration files as follows.

Node 1:

ifcfg-em2:

DEVICE=em2

BOOTPROTO=none

ONBOOT=yes

MASTER=bond0

SLAVE=yes

 

ifcfg-em4:

DEVICE=em4

BOOTPROTO=none

ONBOOT=yes

MASTER=bond0

SLAVE=yes

 

ifcfg-bond0:

DEVICE=bond0

BOOTPROTO=none

ONBOOT=yes

BONDING_OPTS="mode=1 miimon=100"

IPADDR=10.10.10.105

PREFIX=24

GATEWAY=10.10.10.1

 

Node 2:

ifcfg-em2:

DEVICE=em2

BOOTPROTO=none

ONBOOT=yes

MASTER=bond0

SLAVE=yes

 

ifcfg-em4:

DEVICE=em4

BOOTPROTO=none

ONBOOT=yes

MASTER=bond0

SLAVE=yes

 

ifcfg-bond0:

DEVICE=bond0

BOOTPROTO=none

ONBOOT=yes

BONDING_OPTS="mode=1 miimon=100"

IPADDR=10.10.10.106

PREFIX=24

GATEWAY=10.10.10.1

Here I use mode=1, the active-backup mode: only one NIC is active at a time, and if the active NIC fails the link switches over to the backup NIC. Modes 4 and 6 are also worth considering; a sketch of those options follows.
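For reference, here is a rough sketch of what the BONDING_OPTS line could look like for those alternative modes (the values are illustrative and not taken from this environment):

# mode=4 is 802.3ad link aggregation; the corresponding switch ports must be configured for LACP
BONDING_OPTS="mode=4 miimon=100 lacp_rate=fast"

# mode=6 is balance-alb; no special switch-side configuration is required
BONDING_OPTS="mode=6 miimon=100"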

After editing the configuration files, bring up bond0 on both nodes with ifup bond0.
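For example, as root on each node:

[root@node1 ~]# ifup bond0
[root@node2 ~]# ifup bond0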

You can then see:

[root@node1 ~]# ifconfig

bond0     Link encap:Ethernet  HWaddr C8:1F:66:FB:6F:CB 

          inet addr:10.10.10.105  Bcast:10.10.10.255  Mask:255.255.255.0

          inet6 addr: fe80::ca1f:66ff:fefb:6fcb/64 Scope:Link

          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

          RX packets:9844809 errors:0 dropped:0 overruns:0 frame:0

          TX packets:7731078 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:9097132073 (8.4 GiB)  TX bytes:6133004979 (5.7 GiB)

em2       Link encap:Ethernet  HWaddr C8:1F:66:FB:6F:CB 

          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1

          RX packets:9792915 errors:0 dropped:0 overruns:0 frame:0

          TX packets:7731078 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:9088278883 (8.4 GiB)  TX bytes:6133004979 (5.7 GiB)

          Interrupt:38

 

em4       Link encap:Ethernet  HWaddr C8:1F:66:FB:6F:CB 

          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1

          RX packets:51894 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:8853190 (8.4 MiB)  TX bytes:0 (0.0 b)

          Interrupt:36

NIC bonding is now configured successfully.

Test and verification

At this point you can test by bringing down em2 and em4 in turn: run a continuous ping of the other node's private IP from one node and watch the active slave change in /proc/net/bonding/bond0. You will find that the ping is not interrupted when one NIC goes down. A sketch of the test is shown below.
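A minimal sketch of that test, assuming the bond0 addresses configured above and that the slaves are toggled on node1 (adjust interface names and IPs to your environment):

[root@node2 ~]# ping 10.10.10.105                        # continuous ping of node1's private IP; leave it running
[root@node1 ~]# watch -n 1 cat /proc/net/bonding/bond0   # watch "Currently Active Slave" change
[root@node1 ~]# ifdown em2                               # the ping continues; em4 takes over
[root@node1 ~]# ifup em2
[root@node1 ~]# ifdown em4                               # the ping continues; the other slave takes over
[root@node1 ~]# ifup em4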

With bond0 configured, the next step is to make it the RAC private interconnect interface.

To avoid a failed reconfiguration, first back up the original configuration files.

As the grid user, back up the $GRID_HOME/gpnp/noden/profiles/peer/profile.xml file (where noden is the node name) on both nodes:

 cd /u01/app/11.2.0/grid/gpnp/noden/profiles/peer

 cp  profile.xml  profile.xml.bk

[root@node2 peer]# ls

pending.xml  profile_orig.xml  profile.xml  profile.xml.bk

Check the current private network configuration:

node2-> oifcfg getif

em1  192.168.10.0  global  public

em3  10.10.10.0  global  cluster_interconnect

First add the new private network; this only needs to be run on one node:

node1-> oifcfg setif -global bond0/10.10.10.0:cluster_interconnect

This step may fail with the following error:

node1-> oifcfg setif -global bond0/10.10.10.0:cluster_interconnect

PRIF-33: Failed to set or delete interface because hosts could not be discovered

  CRS-02307: No GPnP services on requested remote hosts.

PRIF-32: Error in checking for profile availability for host node2

  CRS-02306: GPnP service on host "node2" not found.

 

This is caused by a problem with the gpnpd service.

Workaround: kill the gpnpd process; GI will restart the gpnpd service automatically.

Run on both nodes:

[root@node2 ~]# ps -ef| grep gpnp

grid      4927     1  0 Sep22 ?        00:26:38 /u01/app/11.2.0/grid/bin/gpnpd.bin

grid     48568 46762  0 17:26 pts/3    00:00:00 tail -f /u01/app/11.2.0/grid/log/node2/gpnpd/gpnpd.log

root     48648 48623  0 17:26 pts/4    00:00:00 grep gpnp

[root@node2 ~]# kill -9 4927

[root@node2 ~]#

For more detail, refer to gpnpd.log.

After adding the new private network, remove the original private network by following these steps.

First, stop and disable CRS.

As root, run the following commands on both nodes.

Stop CRS:

crsctl stop crs

Disable CRS:

crsctl disable crs

 

 

Edit the hosts file and change the private IP addresses to the new addresses; a sketch of the relevant entries follows.
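A sketch of the private-interconnect entries in /etc/hosts after the change, assuming the private hostnames used below and the bond0 addresses configured above (leave the public, VIP and SCAN entries unchanged):

10.10.10.105   node1-priv
10.10.10.106   node2-priv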

Then check on both nodes:

 ping node1-priv

 ping node2-priv

 

Then re-enable and start CRS:

 [root@node2 ~]# crsctl enable crs

CRS-4622: Oracle High Availability Services autostart is enabled.

[root@node2 ~]# crsctl start crs

Delete the original private network:

node2-> oifcfg delif -global em3/10.10.10.0:cluster_interconnect

Check and verify that the configuration succeeded:

node2-> oifcfg getif

em1  192.168.10.0  global  public

bond0  10.10.10.0  global  cluster_interconnect

node2->

 

Now let's run a test to verify the bonding behavior.

Bring down em2 with ifdown em2; /var/log/messages then logs:

Oct 25 22:00:32 node1 kernel: bonding: bond0: Removing slave em2

Oct 25 22:00:32 node1 kernel: bonding: bond0: Warning: the permanent HWaddr of em2 - c8:1f:66:fb:6f:cb - is still in use by bond0. Set the HWaddr of em2 to a different address to avoid conflicts.

Oct 25 22:00:32 node1 kernel: bonding: bond0: releasing active interface em2

Oct 25 22:00:32 node1 kernel: bonding: bond0: making interface em4 the new active one.

At this point bond0 automatically switches over to em4, so the private IP can still be pinged without any problem.

Looking at /proc/net/bonding/bond0, the currently active slave has changed to em4:

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: em4

MII Status: up

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

Slave Interface: em4

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 3

Permanent HW addr: c8:1f:66:fb:6f:cd

Slave queue ID: 0

[root@node1 ~]#

 

There are no errors in crsd.log or ocssd.log, and CSS still sends network heartbeats every 5 seconds:

2014-10-25 22:00:32.975: [    CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes

2014-10-25 22:00:37.977: [    CSSD][656893696]clssnmSendingThread: sending status msg to all nodes

2014-10-25 22:00:37.977: [    CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes

2014-10-25 22:00:42.978: [    CSSD][656893696]clssnmSendingThread: sending status msg to all nodes

This shows that bonding really does protect the private network from a single point of failure.

Now bring down em4 as well:

[root@node1 ~]# ifdown em4

After em4 goes down, the private IP can no longer be pinged from node2, and bond0 itself goes down:

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: None

MII Status: down

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

/var/log/messages at this point:

Oct 25 22:02:23 node1 kernel: bonding: bond0: Removing slave em4

Oct 25 22:02:23 node1 kernel: bonding: bond0: releasing active interface em4

ocssd.log shows that CSS detected the private network failure 2 seconds later:

2014-10-25 22:02:25.573: [GIPCHGEN][1744828160] gipchaInterfaceFail: marking interface failing 0x7f025c00c0a0 { host '', haName 'c617-7010-b72d-6c39', local (nil), ip '10.10.10.105:46469', subnet '10.10.10.0', mask '255.255.255.0', mac 'c8-1f-66-fb-6f-cb', ifname 'bond0', numRef 1, numFail 0, idxBoot 0, flags 0xd }

2014-10-25 22:02:25.661: [GIPCHGEN][1951459072] gipchaInterfaceFail: marking interface failing 0x7f025c023d90 { host 'node2', haName 'ba2c-9227-ca29-8a21', local 0x7f025c00c0a0, ip '10.10.10.106:32369', subnet '10.10.10.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 2, flags 0x6 }

2014-10-25 22:02:27.663: [GIPCHTHR][1951459072] gipchaWorkerCreateInterface: created remote interface for node 'node2', haName 'ba2c-9227-ca29-8a21', inf 'udp://10.10.10.106:32369'

It also finds that this server's private network is no longer reachable:

2014-10-25 22:02:27.572: [GIPCHDEM][536868608] gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0xc99250 [0000000000000010] { gipchaContext : host 'node1', name 'CSS_node-cluster', luid 'e2a491a6-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd

2014-10-25 22:02:31.012: [    CSSD][656893696]clssnmSendingThread: sending status msg to all nodes

2014-10-25 22:02:31.012: [GIPCHALO][669509376] gipchaLowerProcessNode: no valid interfaces found to node for 5530 ms, node 0x7f0c18065e40 { host 'node2', haName 'CSS_node-cluster', srcLuid e2a491a6-d8e24a48, dstLuid ebce4a7f-2b4e4348 numInf 0, contigSeq 317864, lastAck 317717, lastValidAck 317864, sendSeq [317718 : 317729], createTime 4294073210, sentRegister 1, localMonitor 1, flags 0x2408 }

2014-10-25 22:02:31.012: [    CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes

2014-10-25 22:02:33.573: [GIPCHDEM][536868608] gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0xc99250 [0000000000000010] { gipchaContext : host 'node1', name 'CSS_node-cluster', luid 'e2a491a6-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd

2014-10-25 22:02:36.013: [    CSSD][656893696]clssnmSendingThread: sending status msg to all nodes

2014-10-25 22:02:36.013: [    CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes

2014-10-25 22:02:37.014: [GIPCHALO][669509376] gipchaLowerProcessNode: no valid interfaces found to node for 11530 ms, node 0x7f0c18065e40 { host 'node2', haName 'CSS_node-cluster', srcLuid e2a491a6-d8e24a48, dstLuid ebce4a7f-2b4e4348 numInf 0, contigSeq 317864, lastAck 317717, lastValidAck 317864, sendSeq [317718 : 317741], createTime 4294073210, sentRegister 1, localMonitor 1, flags 0x2408 }

It shows that the network heartbeat to node 2 has failed and that node 2 will be removed from the cluster in 14.340 seconds:

2014-10-25 22:02:39.010: [    CSSD][658470656]clssnmPollingThread: node node2 (2) at 50% heartbeat fatal, removal in 14.340 seconds

2014-10-25 22:02:39.010: [    CSSD][658470656]clssnmPollingThread: node node2 (2) is impending reconfig, flag 2228230, misstime 15660

How long a network heartbeat failure is tolerated is determined by the CSS misscount, which defaults to 30 seconds for RAC; the 14.340-second countdown above is simply what remains of that 30-second window after roughly 15.66 seconds of missed heartbeats (misstime 15660 ms, about 50%) already logged:

node2-> crsctl get  css  misscount

CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

 

CSS now starts reporting "has a disk HB, but no network HB":

2014-10-25 22:02:53.012: [    CSSD][664778496]clssnmvDHBValidateNcopy: node 2, node2, has a disk HB, but no network HB, DHB has rcfg 309527290, wrtcnt, 3048511, LATS 212362014, lastSeqNo 3048510, uniqueness 1414032467, timestamp 1414245773/212369384

2014-10-25 22:02:53.192: [    CSSD][667932416]clssnmvDiskPing: Writing with status 0x3, timestamp 1414245773/212362194

2014-10-25 22:02:53.352: [    CSSD][658470656]clssnmPollingThread: Removal started for node node2 (2), flags 0x220006, state 3, wt4c 0

2014-10-25 22:02:53.352: [    CSSD][658470656]clssnmMarkNodeForRemoval: node 2, node2 marked for removal

2014-10-25 22:02:53.352: [    CSSD][658470656]clssnmDiscHelper: node2, node(2) connection failed, endp (0x6b7), probe(0x7f0c00000000), ninf->endp 0x6b7

2014-10-25 22:02:53.352: [    CSSD][658470656]clssnmDiscHelper: node 2 clean up, endp (0x6b7), init state 5, cur state 5

2014-10-25 22:02:53.352: [GIPCXCPT][658470656] gipcInternalDissociate: obj 0x7f0bf00084c0 [00000000000006b7] { gipcEndpoint : localAddr 'gipcha://node1:c5bc-f486-c390-b48', remoteAddr 'gipcha://node2:nm2_node-cluster/b370-8934-efb4-3f2', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 1, wobj 0x7f0bf000a3f0, sendp (nil)flags 0x38606, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)


 

After that, node2 is evicted:

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmNeedConfReq: No configuration to change

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmDoSyncUpdate: Terminating node 2, node2, misstime(30000) state(5)

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmDoSyncUpdate: Wait for 0 vote ack(s)

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmCheckDskInfo: Checking disk info...

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmCheckSplit: Node 2, node2, is alive, DHB (1414245773, 212369384) more than disk timeout of 27000 after the last NHB (1414245743, 212339734)

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmCheckDskInfo: My cohort: 1

2014-10-25 22:02:53.353: [    CSSD][891287296]clssgmQueueGrockEvent: groupName(crs_version) count(3) master(0) event(2), incarn 3, mbrc 3, to member 2, events 0x0, state 0x0

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmRemove: Start

2014-10-25 22:02:53.353: [    CSSD][655316736](:CSSNM00007:)clssnmrRemoveNode: Evicting node 2, node2, from the cluster in incarnation 309527290, node birth incarnation 309527289, death incarnation 309527290, stateflags 0x224000 uniqueness value 1414032467

2014-10-25 22:02:53.353: [    CSSD][891287296]clssgmQueueGrockEvent: groupName(IGSSZGDBsszgdb) count(2) master(1) event(2), incarn 2, mbrc 2, to member 1, events 0x0, state 0x0

2014-10-25 22:02:53.353: [ default][655316736]kgzf_gen_node_reid2: generated reid cid=6d207e372096ef48ff1031c3298552d5,icin=309527289,nmn=2,lnid=309527289,gid=0,gin=0,gmn=0,umemid=0,opid=0,opsn=0,lvl=node hdr=0xfece0100

Node2 is fenced:

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmrFenceSage: Fenced node node2, number 2, with EXADATA, handle 0

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmSendShutdown: req to node 2, kill time 212362354

2014-10-25 22:02:53.353: [    CSSD][891287296]clssgmQueueGrockEvent: groupName(CRF-) count(4) master(0) event(2), incarn 4, mbrc 4, to member 2, events 0x38, state 0x0

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmsendmsg: not connected to node 2

 

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmSendShutdown: Send to node 2 failed

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmWaitOnEvictions: Start

And node2 is prevented from writing to the data files:

2014-10-25 22:02:53.354: [    CSSD][664778496]clssnmvDiskEvict: Kill block write, file /dev/asm_datafile flags 0x00010004, kill block unique 1414032467, stamp 212362354/212362354

 

crsd.log shows that the resources that were running on node2 are failed over to node1:

2014-10-25 22:03:00.946: [   CRSPE][870311680]{1:61167:7502} CRS-2676: Start of 'ora.node2.vip' on 'node1' succeeded

2014-10-25 22:03:00.946: [   CRSPE][870311680]{1:61167:7502} CRS-2676: Start of 'ora.node2.vip' on 'node1' succeeded

2014-10-25 22:03:02.358: [   CRSPE][870311680]{1:61167:7502} CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'node1' succeeded

 

And at this point CRS on node2 can no longer be contacted:

node2-> crs_stat -t

CRS-0184: Cannot communicate with the CRS daemon.

 

Bring em4 back up:

ifup em4

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: None

MII Status: down

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

Slave Interface: em4

MII Status: down

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cd

Slave queue ID: 0

Check /var/log/messages:

Oct 25 22:03:43 node1 kernel: bonding: bond0: setting mode to active-backup (1).

Oct 25 22:03:43 node1 kernel: bonding: bond0: Setting MII monitoring interval to 100.

Oct 25 22:03:43 node1 kernel: bonding: bond0: Adding slave em4.

Oct 25 22:03:44 node1 kernel: bonding: bond0: enslaving em4 as a backup interface with a down link.

Oct 25 22:03:47 node1 kernel: tg3 0000:02:00.1: em4: Link is up at 1000 Mbps, full duplex

Oct 25 22:03:47 node1 kernel: tg3 0000:02:00.1: em4: Flow control is on for TX and on for RX

Oct 25 22:03:47 node1 kernel: tg3 0000:02:00.1: em4: EEE is disabled

After em4 is brought up manually, it is automatically added back as a slave interface; but because bond0 itself went down earlier, bond0 also has to be brought up manually with ifup bond0:

ifup bond0

 

[root@node1 ~]# ifup bond0

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: em4

MII Status: up

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

Slave Interface: em4

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cd

Slave queue ID: 0

 

Slave Interface: em2

MII Status: down

Speed: Unknown

Duplex: Unknown

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cb

Slave queue ID: 0

At this point /var/log/messages shows that bond0 is ready:

Oct 25 22:05:25 node1 kernel: bonding: bond0: Adding slave em2.

Oct 25 22:05:26 node1 kernel: bonding: bond0: enslaving em2 as a backup interface with a down link.

Oct 25 22:05:26 node1 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready

Oct 25 22:05:26 node1 kernel: 8021q: adding VLAN 0 to HW filter on device bond0

Oct 25 22:05:26 node1 kernel: bond0: link status definitely up for interface em4, 1000 Mbps full duplex.

Oct 25 22:05:26 node1 kernel: bonding: bond0: making interface em4 the new active one.

Oct 25 22:05:26 node1 kernel: bonding: bond0: first active interface up!

Oct 25 22:05:26 node1 kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

ocssd.log also shows that the connection to node2 has been re-established:

2014-10-25 22:05:28.026: [GIPCHGEN][536868608] gipchaNodeAddInterface: adding interface information for inf 0x7f0c14222be0 { host '', haName 'CSS_node-cluster', local (nil), ip '10.10.10.105', subnet '10.10.10.0', mask '255.255.255.0', mac 'c8-1f-66-fb-6f-cd', ifname 'bond0', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }

2014-10-25 22:05:28.254: [GIPCHTHR][669509376] gipchaWorkerCreateInterface: created local interface for node 'node1', haName 'CSS_node-cluster', inf 'udp://10.10.10.105:60625'

2014-10-25 22:05:28.255: [GIPCHTHR][669509376] gipchaWorkerCreateInterface: created local bootstrap multicast interface for node 'node1', haName 'CSS_node-cluster', inf 'mcast://224.0.0.251:42424/10.10.10.105'

2014-10-25 22:05:30.560: [    CSSD][653739776]clssnmSendConnAck: connected to node 2, node2, con (0xa111d5), state 0

2014-10-25 22:05:30.560: [    CSSD][653739776]clssnmCompleteConnProtocol: node node2, 2, uniqueness 1414245779, msg uniqueness 1414245779, endp 0xa111d5 probendp 0x7f0c00000000 endp 0xa111d5

2014-10-25 22:05:31.465: [    CSSD][653739776]clssnmHandleJoin: node 2 JOINING, state 0->1 ninfendp 0x7f0c00a111d5

2014-10-25 22:05:31.940: [    CSSD][664778496]clssnmvReadDskHeartbeat: Reading DHBs to get the latest info for node(2/node2), LATSvalid(0), nodeInfoDHB uniqueness(1414032467)

2014-10-25 22:05:31.940: [    CSSD][664778496]clssnmvDHBValidateNcopy: Setting LATS valid due to uniqueness change for node(node2) number(2), nodeInfoDHB(1414032467), readInfo(1414245779)

2014-10-25 22:05:31.940: [    CSSD][664778496]clssnmvDHBValidateNcopy: Saving DHB uniqueness for node node2, number 2 latestInfo(1414245779), readInfo(1414245779), nodeInfoDHB(1414032467)

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmDoSyncUpdate: Initiating sync 309527291

2014-10-25 22:05:32.366: [    CSSD][655316736]clssscCompareSwapEventValue: changed NMReconfigInProgress  val 1, from -1, changes 7

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmDoSyncUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 309527291

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmSetupAckWait: Ack message type (11)

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmSetupAckWait: node(1) is ALIVE

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmSetupAckWait: node(2) is ALIVE

2014-10-25 22:05:32.369: [    CSSD][655316736]clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state

2014-10-25 22:05:32.370: [    CSSD][653739776]clssnmHandleUpdate: NODE 1 (node1) IS ACTIVE MEMBER OF CLUSTER

2014-10-25 22:05:32.370: [    CSSD][653739776]clssnmHandleUpdate: NODE 2 (node2) IS ACTIVE MEMBER OF CLUSTER

2014-10-25 22:05:32.370: [    CSSD][891287296]clssgmSuspendAllGrocks: done

 

crsd.log shows node2's resources being stopped on node1 and restarted on node2:

2014-10-25 22:06:05.097: [   CRSPE][870311680]{2:25913:2} CRS-2677: Stop of 'ora.node2.vip' on 'node1' succeeded

2014-10-25 22:06:05.101: [   CRSPE][870311680]{2:25913:2} CRS-2672: Attempting to start 'ora.node2.vip' on 'node2'

The resources have now failed back to node2:

node2-> crs_stat -t

Name           Type           Target    State     Host       

------------------------------------------------------------

ora....ER.lsnr ora....er.type ONLINE    ONLINE    node1      

ora....N1.lsnr ora....er.type ONLINE    ONLINE    node2      

ora....N2.lsnr ora....er.type ONLINE    ONLINE    node1      

ora....N3.lsnr ora....er.type ONLINE    ONLINE    node1      

ora.OCR.dg     ora....up.type ONLINE    ONLINE    node1      

ora.TEMP.dg    ora....up.type ONLINE    ONLINE    node1      

ora.UNDO.dg    ora....up.type ONLINE    ONLINE    node1      

ora.asm        ora.asm.type   ONLINE    ONLINE    node1      

ora.cvu        ora.cvu.type   ONLINE    ONLINE    node1      

ora.gsd        ora.gsd.type   OFFLINE   OFFLINE              

ora....network ora....rk.type ONLINE    ONLINE    node1      

ora....SM1.asm application    ONLINE    ONLINE    node1      

ora....E1.lsnr application    ONLINE    ONLINE    node1      

ora.node1.gsd  application    OFFLINE   OFFLINE              

ora.node1.ons  application    ONLINE    ONLINE    node1      

ora.node1.vip  ora....t1.type ONLINE    ONLINE    node1      

ora....SM2.asm application    ONLINE    ONLINE    node2      

ora....E2.lsnr application    ONLINE    ONLINE    node2      

ora.node2.gsd  application    OFFLINE   OFFLINE              

ora.node2.ons  application    ONLINE    ONLINE    node2      

ora.node2.vip  ora....t1.type ONLINE    ONLINE    node2      

ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    node1      

ora.ons        ora.ons.type   ONLINE    ONLINE    node1      

ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node2      

ora.scan2.vip  ora....ip.type ONLINE    ONLINE    node1      

ora.scan3.vip  ora....ip.type ONLINE    ONLINE    node1      

ora.sszgdb.db  ora....se.type ONLINE    ONLINE    node1

 

We can now bring the other slave interface of bond0 back up as well:

[root@node1 ~]# ifup em2

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: em4

MII Status: up

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

Slave Interface: em4

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cd

Slave queue ID: 0

 

Slave Interface: em2

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cb

Slave queue ID: 0

 

/var/log/messages shows:

Oct 25 22:05:29 node1 kernel: tg3 0000:01:00.1: em2: Link is up at 1000 Mbps, full duplex

Oct 25 22:05:29 node1 kernel: tg3 0000:01:00.1: em2: Flow control is on for TX and on for RX

Oct 25 22:05:29 node1 kernel: tg3 0000:01:00.1: em2: EEE is disabled

Oct 25 22:05:29 node1 kernel: bond0: link status definitely up for interface em2, 1000 Mbps full duplex.

As you can see, em2 is ready again, and the private network is back to being protected by two NICs.
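As an optional final check (a sketch, assuming SQL*Plus access as sysdba on one of the nodes), you can also confirm from the database side that the instances are using bond0 as the interconnect:

sqlplus -S / as sysdba <<EOF
select inst_id, name, ip_address, is_public from gv\$cluster_interconnects;
EOF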

 
