An earlier inspection found an uneven load-distribution problem on the M5000 server database cluster. After investigation and analysis, the problem has now been resolved: node 2 can again receive new connections through the SCAN IP.
The resolution process is reported below:
Problem symptoms:
1. During routine inspection, the session counts assigned to the two RAC nodes were found to differ greatly: most sessions were created on node 1, and node 2 could not be assigned new sessions through the SCAN IP.
Before the fix, the session counts per instance were as follows:
   INST_ID COUNT(USERNAME)
---------- ---------------
         2              85
         1            1463
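The column heading COUNT(USERNAME) suggests these figures came from a gv$session aggregation; a sketch of such a query (runnable from either node):

```sql
-- Count connected user sessions per RAC instance.
-- count(username) skips background processes, whose USERNAME is NULL.
select inst_id, count(username)
  from gv$session
 group by inst_id;
```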
2. The cluster service processes all showed normal status, the local and remote listener services were running, and the parameter files were configured correctly, but the services registered under the SCAN IP did not include node 2's listener information.
The listener registration information was as follows:
-bash-3.00$ lsnrctl status listener_scan1
LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:08:10
Copyright (c) 1991, 2010, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 13 min. 28 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/ecsyhdb2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
The command completed successfully
---- The query output shows that only the orcl1 service is registered; the orcl2 service is not.
Resolution process
1. Analysis showed that node 2's listener information had not been registered with the SCAN listener; this is a known Oracle RAC bug, Bug 13066936.
2. To work around the bug, the remote listener registration was manually re-registered; after the change, node 2's listener registered successfully:
1. show parameter remote_listener
2. alter system set remote_listener='';
3. alter system register;
4. alter system set remote_listener='db-cluster-scan:1525';
5. alter system register;
Note: executing the second command removes every node's registration under the SCAN listener, so at that point the services listed under the SCAN will appear empty. It is therefore recommended to prepare the script in advance and complete the change as quickly as possible, to avoid exceptions on the application side. After the fifth command has been executed, checking the SCAN services again shows that both nodes' listeners are registered under the SCAN IP.
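Since the SCAN registrations disappear between steps 2 and 5, the whole sequence can be staged as a single SQL*Plus script and run in one pass (a sketch; the SCAN address db-cluster-scan:1525 is taken from the steps above and must match your own remote_listener value):

```sql
-- Record the current setting before touching it.
show parameter remote_listener

-- Steps 2-3: clear the remote listener; the stale SCAN registrations are dropped here.
alter system set remote_listener='';
alter system register;

-- Steps 4-5: point back at the SCAN and force a fresh registration.
alter system set remote_listener='db-cluster-scan:1525';
alter system register;
```

Running it as one script keeps the window during which the SCAN listener reports no services as short as possible.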
SCAN status after executing the second command:
-bash-3.00$ lsnrctl status listener_scan1
LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:08:56
Copyright (c) 1991, 2010, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 14 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/db2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
The listener supports no services
The command completed successfully
After the change, the session counts on the two nodes:
   INST_ID COUNT(USERNAME)
---------- ---------------
         2             343
         1            1201
---------- Node 2's session count has now risen markedly.
Checking the SCAN IP status:
-bash-3.00$ lsnrctl status listener_scan1
LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:09:09
Copyright (c) 1991, 2010, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 14 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/db2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
Services Summary...
Service "orcl" has 2 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully
------------- After re-registering the listeners, the services registered under the SCAN now show both nodes' information.
Because node 2's listener has only just been re-registered, node 2 still holds fewer sessions; as run time increases, the session counts on the two nodes will gradually converge.
Root cause:
The SCAN listener likely failed over at some earlier point; after the instance restarted, the Oracle bug prevented it from re-registering.
The following is Oracle's official documentation describing this issue:
Description
On migration of remote listener from one node to another, for example during
node eviction (failover), the database does not re-register with the listener
as it does not receive any EOF.
As a result the database keeps listening on the same socket when ideally
it should re-register.
Rediscovery Notes:
Instance does not register services when scan fails over
Workaround
alter system set remote_listener ='';
alter system set remote_listener ='<remote server>:<listener port>';
Note:
This fix is just one piece of a larger solution needing other fixes
so that the clusterware / node monitor can trigger a re-register as
required.
References
Bug:13066936 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article
Source document: <https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=89966933432507&id=13066936.8&_afrWindowMode=0&_adf.ctrl-state=v9ps5dfil_4>