Resolving a Load-Imbalance Problem under RAC

An earlier inspection found that the database cluster on the M5000 servers was distributing load unevenly. After investigation and analysis, the problem has now been resolved: node 2 can once again receive new connections through the SCAN IP.

The resolution process is reported below.

Problem symptoms:

1. During routine inspection, the session counts on the two RAC nodes were found to differ greatly: most sessions were created on node 1, and node 2 was not being assigned new sessions through the SCAN IP.

Before the fix, the session counts per instance were:

   INST_ID COUNT(USERNAME)
---------- ---------------
         2              85
         1            1463
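The per-instance counts above can be gathered with a query of this shape (a sketch using the standard RAC-wide `gv$session` view; filtering on `username` excludes background sessions):

```sql
-- Count user sessions per RAC instance
select inst_id, count(username)
  from gv$session
 where username is not null
 group by inst_id;
```

On a balanced cluster the two counts should be of the same order of magnitude.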


2. All cluster service processes showed normal; the local and remote listeners were running and the parameter files were configured correctly. However, the services registered with the SCAN IP did not include node 2's listener.

The listener registration information was as follows:


-bash-3.00$ lsnrctl status listener_scan1

LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:08:10

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 13 min. 28 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/ecsyhdb2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
The command completed successfully

---- The output shows that only the orcl1 service was registered; orcl2 was not.
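The cluster-service and listener checks described in step 2 are typically done with the standard 11.2 clusterware tools (a sketch; run as the Grid Infrastructure owner):

```shell
crsctl status resource -t      # overall clusterware resource status
srvctl status scan_listener    # which node each SCAN listener runs on
srvctl status listener         # node listeners on each node
lsnrctl status                 # local listener on the current node
```

If all of these report normal but a node is still missing from the SCAN listener's services, the problem lies in dynamic registration, as was the case here.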

Resolution process

1. Analysis showed that node 2's listener information was not registered with the SCAN listener. This is an Oracle RAC bug, tracked as bug 13066936.

2. To work around the bug, the remote listener registration was manually redone. After the change, node 2's listener registered successfully:


1. show parameter remote_listener
2. alter system set remote_listener='';
3. alter system register;
4. alter system set remote_listener='db-cluster-scan:1525';
5. alter system register;


Note: running command 2 cancels every node's registration with the SCAN listener, so while it is in effect the SCAN listener will report no services. It is therefore advisable to prepare the script in advance and complete the change quickly, to avoid errors on the application side. After command 5 completes, querying the SCAN listener again shows that both nodes' listeners have registered with the SCAN IP.
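As the note suggests, the reset can be kept ready as a single SQL*Plus script so the window with no SCAN registrations stays short (a sketch; the SCAN name `db-cluster-scan` and port 1525 are specific to this environment and should be adjusted to match yours):

```sql
-- reset_remote_listener.sql: run as sysdba; re-registers instances with the SCAN
-- Steps 2-3: clear remote_listener, dropping all SCAN registrations
alter system set remote_listener='';
alter system register;
-- Steps 4-5: point back at the SCAN and re-register immediately
alter system set remote_listener='db-cluster-scan:1525';
alter system register;
```

Running the whole file with `@reset_remote_listener.sql` keeps the gap between de-registration and re-registration to a few seconds.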


SCAN listener status after running command 2:


-bash-3.00$ lsnrctl status listener_scan1

LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:08:56

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 14 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/db2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
The listener supports no services
The command completed successfully


Session counts on the two nodes after the fix:

   INST_ID COUNT(USERNAME)
---------- ---------------
         2             343
         1            1201

---------- Node 2's session count has clearly risen.


SCAN IP status check:

-bash-3.00$ lsnrctl status listener_scan1

LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:09:09

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 14 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/db2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
Services Summary...
Service "orcl" has 2 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully

------------- After re-registering the listeners, the services registered with the SCAN include both nodes.

Because node 2's listener was only just re-registered, it still holds fewer sessions; as running time increases, the session counts on the two nodes will gradually converge.

Root cause:

The SCAN listener likely failed over at some point; after the instance restarted, an Oracle bug prevented the instance from being registered again.


The following is Oracle's official documentation of this issue:

Description

On migration of remote listener from one node to another, for example during
node eviction (failover), the database does not re-register with the listener
as it does not receive any EOF.
As a result the database keeps listening on the same socket when ideally
it should re-register.

Rediscovery Notes:
Instance does not register services when scan fails over

Workaround
alter system set remote_listener = '';
alter system set remote_listener = '<remote server>:<listener port>';

Note:
This fix is just one piece of a larger solution needing other fixes
so that the clusterware / node monitor can trigger a re-register as
required.

References

Bug:13066936 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article



Source document: <https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=89966933432507&id=13066936.8&_afrWindowMode=0&_adf.ctrl-state=v9ps5dfil_4>

