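Why Use a VIP
Why must VIP addresses be used in an Oracle RAC environment? In short, because of TCP timeouts. The details follow.
Believe it or not, TCP timeouts have a large effect on the availability of real-world applications. When a node in an Oracle RAC environment goes down, clients may have no way of knowing. If a client connects through a TNS alias, or the service it connects to is available on multiple nodes, the client may unknowingly try the failed node first. That in itself is not a problem, because the address list contains several addresses, as in Table 1: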
racdb_cl =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.10.22)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.10.23)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
    )
  )
Table 1
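To make the address list concrete, here is a minimal Python sketch that pulls the (host, port) pairs out of the Table 1 descriptor. The helper name `extract_addresses` and the regular expression are illustrative assumptions; this is not a general TNS parser and not how the Oracle client reads tnsnames.ora.

```python
import re

# Descriptor from Table 1, copied from the text above.
RACDB_CL = """
(DESCRIPTION =
  (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.10.22)(PORT = 1521))
  (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.10.23)(PORT = 1521))
  (LOAD_BALANCE = yes)
  (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = racdb)))
"""

# Hypothetical pattern: matches "(HOST = x)(PORT = n)" pairs in a flat
# descriptor like the one above. It ignores every other parameter.
ADDRESS_RE = re.compile(
    r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)\s*\(\s*PORT\s*=\s*(\d+)\s*\)",
    re.IGNORECASE,
)


def extract_addresses(descriptor):
    """Return the (host, port) pairs in the order they appear."""
    return [(host, int(port)) for host, port in ADDRESS_RE.findall(descriptor)]


print(extract_addresses(RACDB_CL))
# prints [('192.168.10.22', 1521), ('192.168.10.23', 1521)]
```

These two pairs are the candidates a client works through when one of the nodes does not answer.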
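When the client receives no response from the first address in the list, it tries the next one, and so on until it succeeds. The problem is how long it waits before moving on to the next address.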
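The "try the next address" behaviour can be sketched in a few lines of Python. This is only a model of the idea, not the Oracle Net client: the function name `connect_first_available` and the `timeout` parameter are assumptions made for illustration. It records how long each attempt takes, which is exactly the quantity this article is concerned with.

```python
import socket
import time


def connect_first_available(addresses, timeout=5.0):
    """Try each (host, port) in order and stop at the first that connects.

    Returns (address, log). address is None if every attempt failed; log
    holds one (address, outcome, seconds_spent) tuple per attempt.
    """
    log = []
    for address in addresses:
        started = time.monotonic()
        try:
            # Open and immediately close a TCP connection to the address.
            with socket.create_connection(address, timeout=timeout):
                outcome = "connected"
        except OSError as exc:
            # Covers both a refused connection and a timeout; the class
            # name tells the two apart in the log.
            outcome = type(exc).__name__
        log.append((address, outcome, time.monotonic() - started))
        if outcome == "connected":
            return address, log
    return None, log
```

Calling `connect_first_available([("192.168.10.22", 1521), ("192.168.10.23", 1521)])` against the test cluster would show in the log how much time the first address costs before the second one is tried.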
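For example, the address list in Table 1 contains two IP addresses, 192.168.10.22 and 192.168.10.23. Suppose a client using the TNS alias racdb_cl (the connect descriptor in Table 1) tries 192.168.10.22 first; with LOAD_BALANCE = yes the first address is picked at random. If the server at 192.168.10.22 does not respond, the client moves on to 192.168.10.23. How long does the client wait for 192.168.10.22? The answer varies by environment; the test below uses Linux servers and a Windows client.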
| | Client | Server | Node |
| Operating system | Windows 7 | RHEL5u8 | |
| IP address | 192.168.10.254 | 192.168.10.22 | rac1 |
| | | 192.168.10.23 | rac2 |
| Database name | - | racdb | |
| TNS alias | racdb_cl | - | |
1 Check the resources on the server
$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
ora.racdb.db application ONLINE ONLINE rac1
ora.....rac.cs application ONLINE ONLINE rac1
ora....db1.srv application ONLINE ONLINE rac1
ora....b1.inst application ONLINE ONLINE rac1
ora....b2.inst application ONLINE ONLINE rac2
2 Check the listener on node 1
$ lsnrctl status
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 17-JUN-2014 06:26:40
Copyright (c) 1991, 2005, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER2_RAC1
Version TNSLSNR for Linux: Version 10.2.0.1.0 - Production
Start Date 17-JUN-2014 06:19:23
Uptime 0 days 0 hr. 7 min. 18 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/10gR2.1/network/admin/listener.ora
Listener Log File /u01/app/oracle/product/10gR2.1/network/log/listener2_rac1.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.10.22)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.10.20)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
Service "+ASM_XPT" has 1 instance(s).
Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "rac" has 1 instance(s).
Instance "racdb1", status READY, has 2 handler(s) for this service...
Service "racdb" has 2 instance(s).
Instance "racdb1", status READY, has 2 handler(s) for this service...
Instance "racdb2", status READY, has 1 handler(s) for this service...
Service "racdbXDB" has 2 instance(s).
Instance "racdb1", status READY, has 1 handler(s) for this service...
Instance "racdb2", status READY, has 1 handler(s) for this service...
Service "racdb_XPT" has 2 instance(s).
Instance "racdb1", status READY, has 2 handler(s) for this service...
Instance "racdb2", status READY, has 1 handler(s) for this service...
The command completed successfully
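3 Test logging in to the database
The database is running normally, so a client connecting through racdb_cl may be sent to either node, and the connection completes almost instantly.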
sqlplus scott/tiger@racdb_cl
SQL*Plus: Release 11.2.0.1.0 Production on 星期三 6月 18 23:20:46 2014
Copyright (c) 1982, 2010, Oracle. All rights reserved.
連接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
23:20:47 SCOTT@RACDB>
4 Shut down node rac1
The goal is to make rac1's VIP fail over to the other node.
# crsctl stop crs
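5 Check the cluster status on node rac2
Wait a while after the previous step, then check the status. rac1's VIP (ora.rac1.vip) and the database-level resources are now running on rac2, while rac1's node-local resources (ASM, listeners, GSD, ONS, and its instance) are OFFLINE.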
# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE OFFLINE
ora....C1.lsnr application OFFLINE OFFLINE
ora....C1.lsnr application OFFLINE OFFLINE
ora.rac1.gsd application ONLINE OFFLINE
ora.rac1.ons application ONLINE OFFLINE
ora.rac1.vip application ONLINE ONLINE rac2
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
ora.racdb.db application ONLINE ONLINE rac2
ora.....rac.cs application ONLINE ONLINE rac2
ora....db1.srv application ONLINE ONLINE rac2
ora....b1.inst application ONLINE OFFLINE
ora....b2.inst application ONLINE ONLINE rac2
6 Check the listener on rac2
su - oracle
$ lsnrctl status
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 17-JUN-2014 06:39:50
Copyright (c) 1991, 2005, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER2_RAC2
Version TNSLSNR for Linux: Version 10.2.0.1.0 - Production
Start Date 17-JUN-2014 06:19:38
Uptime 0 days 0 hr. 20 min. 12 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/10gR2.1/network/admin/listener.ora
Listener Log File /u01/app/oracle/product/10gR2.1/network/log/listener2_rac2.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.10.23)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.10.21)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM2", status BLOCKED, has 1 handler(s) for this service...
Service "+ASM_XPT" has 1 instance(s).
Instance "+ASM2", status BLOCKED, has 1 handler(s) for this service...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "rac" has 1 instance(s).
Instance "racdb2", status READY, has 2 handler(s) for this service...
Service "racdb" has 1 instance(s).
Instance "racdb2", status READY, has 2 handler(s) for this service...
Service "racdbXDB" has 1 instance(s).
Instance "racdb2", status READY, has 1 handler(s) for this service...
Service "racdb_XPT" has 1 instance(s).
Instance "racdb2", status READY, has 2 handler(s) for this service...
The command completed successfully
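The listener output above shows that the listener on rac2 listens on only two addresses, 192.168.10.23 and 192.168.10.21; 192.168.10.22 (rac1's VIP) is not among them. When a host owns an IP address such as 192.168.10.22 but has nothing listening on port 1521 of that address, it refuses any client request to 192.168.10.22:1521 immediately, replying in effect: "error, nothing is listening on this socket" (IP + port = socket).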
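The immediate refusal described above can be reproduced without a cluster. The sketch below is a stand-in for the experiment, not part of it: it connects to a loopback port that is assigned to the local machine but has no listener, and times the attempt. The helper names `probe` and `closed_loopback_port` are made up for illustration.

```python
import socket
import time


def probe(host, port, timeout=30.0):
    """Return (outcome, seconds) for a single TCP connect attempt."""
    started = time.monotonic()
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        outcome = "connected"
    except ConnectionRefusedError:
        outcome = "refused"      # the peer answered the SYN with an RST
    except socket.timeout:
        outcome = "timed out"    # nothing answered before the deadline
    except OSError as exc:
        outcome = "error: " + type(exc).__name__
    return outcome, time.monotonic() - started


def closed_loopback_port():
    """Reserve a port on 127.0.0.1 and release it, so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


# The address exists on this host, but no process listens on the port,
# which mirrors rac2 holding 192.168.10.22 without a listener on 1521.
outcome, seconds = probe("127.0.0.1", closed_loopback_port(), timeout=5.0)
print(outcome, f"{seconds:.3f}s")
```

The elapsed time depends on the operating system, so no expected figure is given here; the point is that the attempt ends with a refusal rather than running into the timeout.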
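7 Connect using rac1's VIP address
I connect to this address from the client and capture the packets to see what happens.
The following command connects to the database: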
sqlplus scott/tiger@192.168.10.22/racdb
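The captured packets are shown in Figure 1.
Figure 1
Figure 1 shows six packets in total. The first is the client's connection request to port 1521 on 192.168.10.22. As noted above, a server that is not listening on port 1521 refuses the connection outright, so the second packet is an RST that resets the connection.
The client, 192.168.10.254, then retries twice more, is refused both times, and gives up.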
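8 Connect using a nonexistent IP address
This simulates a cluster without VIPs. Suppose 192.168.10.29 is the only IP address of some node. When that node goes down, no VIP is handed over to another node, and the following happens.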
sqlplus scott/tiger@192.168.10.29/racdb
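My test environment has no host at 192.168.10.29, so the client should wait until it times out.
On Ethernet, the network card will not send the packet without a destination MAC address, so a static ARP entry is needed.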
C:\Users\hani>arp -s 192.168.10.29 aa-bb-cc-11-22-33
Connect to this address:
C:\Users\hani>sqlplus scott/tiger@192.168.10.29:1521/racdb
Figure 2
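Figure 2 shows a 3-second gap between the first and second packets and a 6-second gap between the second and third, so the gap between the third and fourth packets would presumably be 12 seconds. However, the client sends no fourth packet: after three attempts it declares a timeout.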
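The doubling pattern in Figure 2 can be written out as simple arithmetic. The sketch below takes the observed 3-second and 6-second gaps as given. That the client waits one more doubled interval (12 seconds) after the third packet before reporting the timeout is an assumption that extends the guess in the text; it was not measured in this experiment. The function name `syn_schedule` and its parameters are hypothetical.

```python
def syn_schedule(initial_rto=3.0, retransmissions=2):
    """Return (send_times, give_up_time) in seconds from the first SYN.

    Each wait is double the previous one. ASSUMPTION: after the last SYN
    the client waits one further doubled interval before giving up.
    """
    send_times = [0.0]
    wait = initial_rto
    for _ in range(retransmissions):
        send_times.append(send_times[-1] + wait)
        wait *= 2
    give_up_time = send_times[-1] + wait
    return send_times, give_up_time


times, give_up = syn_schedule()
print(times, give_up)
# prints [0.0, 3.0, 9.0] 21.0
```

Under that assumption a client without a VIP to fall back on would spend about 21 seconds on the dead address before moving to the next one, compared with less than a second when the connection is refused.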
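This experiment leads to the conclusion: the VIP fails over to a surviving node so that clients are refused quickly. In Figure 1, less than 1 second passes between the first and the sixth packet, whereas in Figure 2 the client spends 9 seconds on retransmissions alone.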
References
Oracle Database 11g Release 2 High Availability: Maximize Your Availability with Grid Infrastructure, RAC and Data Guard, Second Edition