Oracle 11g RAC services cannot be stopped

1. Problem:
On node 2, crsctl stop crs -f failed to stop the RAC services: the nine d.bin-related processes were all still running.
Version: Oracle 11.2.0.4 for Solaris

2. Analysis:
The alert.log file under /abcapp/oragrid/11.2.0/log/abc208 shows the following:
[/abcapp/oragrid/11.2.0/bin/scriptagent.bin(10605)]CRS-5818:Aborted command 'clean' for resource 'ora.oc4j'. Details at (:CRSAGF00113:) {2:26009:18659} in /abcapp/oragrid/11.2.0/log/abc208/agent/crsd/scriptagent_oragrid/scriptagent_oragrid.log.
2017-08-30 23:28:10.192:
[crsd(62374)]CRS-2757:Command 'Clean' timed out waiting for response from the resource 'ora.oc4j'. Details at (:CRSPE00111:) {2:26009:18659} in /abcapp/oragrid/11.2.0/log/abc208/crsd/crsd.log.
/abcapp/oragrid/11.2.0/log/abc208/crsd/crsd.log reports the following errors:
2017-08-30 23:48:10.228: [UiServer][47]{2:26009:18672} Container [ Name: ORDER
        MESSAGE: 
        TextMessage[CRS-2680: Clean of 'ora.oc4j' on 'abc208' failed]
        MSGTYPE: 
        TextMessage[1]
        OBJID: 
        TextMessage[ora.oc4j]
        WAIT: 
        TextMessage[0]
]
2017-08-30 23:48:10.228: [   CRSPE][46]{2:26009:18672} Sequencer for [ora.oc4j 1 1] has completed with error: CRS-0216: Could not stop resource 'ora.oc4j'.
2017-08-30 23:48:10.230: [UiServer][47]{2:26009:18673} Container [ Name: ORDER
        MESSAGE: 
        TextMessage[CRS-2503: Resource 'ora.oc4j' is in UNKNOWN state and must be stopped first]
        MSGTYPE: 
        TextMessage[1]
        OBJID: 
        TextMessage[ora.oc4j]
        WAIT: 
        TextMessage[0]
]

/abcapp/oragrid/11.2.0/log/abc208/agent/crsd/scriptagent_oragrid/scriptagent_oragrid.log shows the following:

2017-08-30 22:37:10.040: [ora.oc4j][46]{1:63945:12686} [check] Executing action script: /abcapp/oragrid/11.2.0/bin/oc4jctl[check]
2017-08-30 22:37:49.597: [    AGFW][9]{1:63945:12686} Agent received the message: AGENT_HB[Engine] ID 12293:21601515
2017-08-30 22:38:10.044: [   AGENT][58]{1:63945:12686} {1:63945:12686} Created alert : (:CRSAGF00113:) :  Aborting the command: check for resource: ora.oc4j 1 1
2017-08-30 22:38:10.044: [ora.oc4j][58]{1:63945:12686} [check] Killing action script: check
2017-08-30 22:38:10.044: [    AGFW][58]{1:63945:12686} Command: check for resource: ora.oc4j 1 1 completed with status: TIMEDOUT
2017-08-30 22:38:10.072: [    AGFW][46]{1:63945:12686} Received unknown resource status code: 255
2017-08-30 22:38:49.600: [    AGFW][9]{1:63945:12686} Agent received the message: AGENT_HB[Engine] ID 12293:21601539
2017-08-30 22:39:10.047: [ora.oc4j][46]{1:63945:12686} [check] Executing action script: /abcapp/oragrid/11.2.0/bin/oc4jctl[check]
2017-08-30 22:39:49.603: [    AGFW][9]{1:63945:12686} Agent received the message: AGENT_HB[Engine] ID 12293:21601561
2017-08-30 22:40:10.049: [   AGENT][58]{1:63945:12686} {1:63945:12686} Created alert : (:CRSAGF00113:) :  Aborting the command: check for resource: ora.oc4j 1 1
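
The agent log above repeats one pattern: a check action starts, then is aborted a minute later with status TIMEDOUT. A rough way to see how often this happened is to count those lines. The sketch below is only an illustration: the function name count_check_timeouts is hypothetical, and the match strings are copied from the log lines quoted in this post, so other Grid Infrastructure versions may word them differently.

```shell
# Hypothetical helper: count agent commands for one resource that ended
# with status TIMEDOUT in a scriptagent log.
#   $1 = path to the log file
#   $2 = resource name, e.g. ora.oc4j
count_check_timeouts() {
    log_file=$1
    resource=$2
    # The dot in the resource name is treated as a regex wildcard here,
    # which is slightly loose but harmless for this purpose.
    grep "for resource: $resource " "$log_file" |
        grep -c "completed with status: TIMEDOUT"
}
```

It could be called as count_check_timeouts /abcapp/oragrid/11.2.0/log/abc208/agent/crsd/scriptagent_oragrid/scriptagent_oragrid.log ora.oc4j; a steadily growing count suggests the resource's action script never returns.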

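The logs above clearly show that the oc4j service could not be stopped and was blocking the services queued behind it. oc4j runs as a JVM process, so in theory killing the java process owned by the grid user should be enough.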
-bash-4.1$ kill -9 10789
-bash-4.1$ ps -ef |grep 10789
 oragrid 10789     1   0   May 29 ?         847:17 /abcapp/oragrid/11.2.0/jdk/bin/sparcv9/java -server -Xcheck:jni -Xms128M -Xmx
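The kill was repeated many times with no effect.
This indicates that the problem was caused by a hung java process. A check showed that instance 1 was not running the oc4j service and had no corresponding java process under the grid user, so it would not hit this problem.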
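
When kill -9 appears to do nothing, looking at the process state can help confirm that the process really is stuck rather than being restarted under a new PID. A process blocked inside a kernel call does not act on SIGKILL until that call returns, which is one common reason for this behaviour; the post does not record which state PID 10789 was in. The sketch below is a minimal illustration: proc_state is a hypothetical name, and it assumes a ps that accepts -o s= and -p, as the Solaris and Linux versions do.

```shell
# Hypothetical helper: print the one-letter state of a PID as reported by
# ps, or "gone" if ps reports no such process.
#   $1 = process ID
proc_state() {
    # "-o s=" asks for the state column only, with an empty header.
    state=$(ps -o s= -p "$1" 2>/dev/null | tr -d ' ')
    if [ -n "$state" ]; then
        echo "$state"
    else
        echo "gone"
    fi
}
```

Running proc_state 10789 a few seconds after the kill would print either a state letter, meaning the process still exists, or gone.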

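3. Solution:
Reboot the OS on node 2 with init 6. If nothing happens after running it, kill the crsd process and the OS will then be able to reboot.
After the OS came back up, the CRS services and the database instance started normally. The oc4j service was then started with crsctl start res ora.oc4j, and finally the CRS services on node 1 were restarted without any trouble.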
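
After starting ora.oc4j, the result can be verified instead of assumed. The following is a small sketch that reads the output of crsctl stat res ora.oc4j on standard input and succeeds only if it contains a STATE=ONLINE line. The helper name resource_is_online is hypothetical, and the expected NAME=/TYPE=/TARGET=/STATE= output layout is an assumption based on typical 11.2 output, not something shown in this post.

```shell
# Hypothetical helper: succeed if the "crsctl stat res <name>" output read
# from stdin contains a line beginning with STATE=ONLINE.
resource_is_online() {
    # Redirect instead of using grep -q, which older Solaris greps lack.
    grep '^STATE=ONLINE' >/dev/null
}

# Usage on the cluster node, as the grid owner:
#   crsctl stat res ora.oc4j | resource_is_online && echo "ora.oc4j is ONLINE"
```

Reading from standard input keeps the check independent of crsctl itself, so it can be tried against saved output before being used on a live node.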