Voting disk goes offline, causing the cssd process to fail and leaving the node's cluster resources unavailable

System environment:

Virtualization platform: Huawei FusionSphere

Operating system: RedHat

Storage: EMC Unity 400

Symptom: the application team reported that the database could not be reached, with an error pointing to a listener failure.

Fault analysis:

1. Log in to the database server and check the cluster resource status

crsctl status res -t

The output shows that the cluster resources are abnormal and the CRS resources are offline.

2. lsblk hangs and produces no output
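A hanging lsblk often means that some process is stuck waiting on block I/O. The original notes do not record process states, so the following is only a sketch of one way to check, assuming a Linux host: it scans /proc for processes in the uninterruptible sleep state (D). The function names are illustrative and not part of any Oracle tool.

```python
import os


def parse_stat_state(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the LAST ')' instead of splitting on spaces.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for processes currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_state(fh.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

If lsblk or ocssd.bin shows up in this list for a long time, that would point to an I/O path problem below the cluster stack rather than to the clusterware itself.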

3. Check crsd.log; the errors are as follows:

2020-06-30 02:17:18.112: [    CRSD][2161772320] Logging level for Module: OCRASM  1

2020-06-30 02:17:18.112: [ CRSMAIN][2161772320] Checking the OCR device

2020-06-30 02:17:18.112: [ CRSMAIN][2161772320] Sync-up with OCR

2020-06-30 02:17:18.112: [ CRSMAIN][2161772320] Connecting to the CSS Daemon

2020-06-30 02:17:18.123: [ CSSCLNT][2155321088]clssnsquerymode: not connected to CSSD

2020-06-30 02:17:48.185: [ CSSCLNT][2155321088]clssnsquerymode: not connected to CSSD

2020-06-30 02:18:18.190: [ CSSCLNT][2155321088]clssnsquerymode: not connected to CSSD

2020-06-30 02:18:48.194: [ CSSCLNT][2155321088]clssnsquerymode: not connected to CSSD

2020-06-30 02:19:18.197: [ CSSCLNT][2155321088]clssnsquerymode: not connected to CSSD

2020-06-30 02:19:18.214: [ CSSCLNT][2161772320]clssscConnect: gipcWait failed with 16 (0x1a)

2020-06-30 02:19:18.214: [ CSSCLNT][2161772320]clsssInitNative: connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_oracle-rac01_)) failed, rc 16

2020-06-30 02:19:18.218: [  CRSRTI][2161772320] CSS is not ready. Received status 3

2020-06-30 02:19:18.218: [    CRSD][2161772320] Created alert : (:CRSD00109:) :  Could not init the CSS context, error: 3

2020-06-30 02:19:18.218: [    CRSD][2161772320][PANIC] CRSD exiting: Could not init the CSS context, error: 3

2020-06-30 02:19:18.218: [    CRSD][2161772320] Done.

2020-06-30 02:21:18.504: [ CSSCLNT][387819296]clssscConnect: gipcWait failed with 16 (0x1a)

2020-06-30 02:21:18.504: [ CSSCLNT][387819296]clsssInitNative: connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_oracle-rac01_)) failed, rc 16

2020-06-30 02:21:18.508: [  CRSRTI][387819296] CSS is not ready. Received status 3

2020-06-30 02:21:18.508: [ CRSMAIN][387819296] First attempt: init CSS context failed. Error = 3

[  clsdmt][381368064]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=oracle-rac01DBG_CRSD))

2020-06-30 02:21:18.511: [  clsdmt][381368064]PID for the Process [23043], connkey 1

2020-06-30 02:21:18.512: [  clsdmt][381368064]Creating PID [23043] file for home /u01/11.2.0/grid host oracle-rac01 bin crs to /u01/11.2.0/grid/crs/init/

2020-06-30 02:21:18.512: [  clsdmt][381368064]Writing PID [23043] to the file [/u01/11.2.0/grid/crs/init/oracle-rac01.pid]

crsd.log shows that CRSD cannot connect to the CSS daemon, which is in an unavailable state.
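The pattern in this log (repeated "not connected to CSSD" messages followed by a PANIC exit) can be pulled out mechanically. The following is a minimal sketch that assumes the line layout shown in the excerpt above; the regular expression and the function names are illustrative and were written only against these sample lines.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2020-06-30 02:17:18.123: [ CSSCLNT][2155321088]clssnsquerymode: not connected to CSSD
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<module>\w+)\]\[(?P<tid>\d+)\](?P<rest>.*)$"
)


def summarize_crsd(lines):
    """Collect CSS connection retries and PANIC messages from crsd.log lines."""
    summary = {"not_connected": [], "panic": []}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines or lines without a timestamp
        rest = m.group("rest")
        if "not connected to CSSD" in rest:
            summary["not_connected"].append(m.group("ts"))
        if "[PANIC]" in rest:
            message = rest.split("[PANIC]", 1)[1].strip()
            summary["panic"].append((m.group("ts"), message))
    return summary


def retry_gaps_seconds(timestamps):
    """Seconds between consecutive timestamps (same format as the log)."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    with open("crsd.log") as fh:  # path is an assumption; adjust as needed
        result = summarize_crsd(fh)
    print(len(result["not_connected"]), "failed CSS connection checks")
    print(retry_gaps_seconds(result["not_connected"]))
    for ts, msg in result["panic"]:
        print(ts, msg)
```

Run against the excerpt above, this shows the retries arriving roughly every 30 seconds before CRSD gives up, which indicates that CRSD is a victim of the CSS failure rather than its cause.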

4. Check the CSS log; the errors are as follows:

2020-06-30 02:04:04.814: [    CSSD][2121881344]clssscMonitorThreads clssnmvWorkerThread not scheduled for 196870 msecs

2020-06-30 02:04:04.814: [    CSSD][2121881344]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 811354450 msecs

2020-06-30 02:04:04.814: [    CSSD][2121881344]clssscMonitorThreads clssnmvWorkerThread not scheduled for 811354540 msecs

2020-06-30 02:04:04.942: [    CSSD][2095437568]clssnmSendingThread: sending status msg to all nodes

2020-06-30 02:04:04.942: [    CSSD][2095437568]clssnmSendingThread: sent 5 status msgs to all nodes

2020-06-30 02:04:07.813: [    CSSD][2101761792](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 200460 ms for voting file /dev/asm-diskf)

2020-06-30 02:04:07.813: [    CSSD][2101761792]clssnmCompleteGMReq: Completed request type 17 with status 1

2020-06-30 02:04:07.813: [    CSSD][2101761792]clssgmDoneQEle: re-queueing req 0x7fba7653d510 status 1

2020-06-30 02:04:07.813: [    CSSD][2101761792]clssnmvDiskAvailabilityChange: voting file /dev/asm-diskf now offline

2020-06-30 02:04:07.813: [    CSSD][2101761792](:CSSNM00018:)clssnmvDiskCheck: Aborting, 1 of 3 configured voting disks available, need 2

2020-06-30 02:04:07.813: [    CSSD][2101761792]###################################

2020-06-30 02:04:07.813: [    CSSD][2101761792]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread

2020-06-30 02:04:07.813: [    CSSD][2101761792]###################################

2020-06-30 02:04:07.813: [    CSSD][2101761792](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally

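The cssd log shows that I/O to /dev/asm-diskf timed out and CSS took that voting file offline. With only 1 of the 3 configured voting disks still available (2 are required), the CSS daemon aborted.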
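The abort decision in the log follows from two checks: a voting file is taken offline once it has had no I/O completions for longer than the disk timeout, and CSS needs a strict majority of the configured voting files to keep running. The sketch below reproduces that arithmetic from the log lines. It assumes the default CSS disktimeout of 200 seconds (this system's actual setting was not recorded), and the regular expressions are illustrative, written only against the messages shown above.

```python
import re

# Assumption: default CSS disktimeout (200 s); the real value on a system
# can be read with "crsctl get css disktimeout".
DISK_TIMEOUT_MS = 200_000

NO_IO_RE = re.compile(
    r"No I/O completions for (\d+) ms for voting file (\S+?)\)?$"
)
ABORT_RE = re.compile(
    r"Aborting, (\d+) of (\d+) configured voting disks available, need (\d+)"
)


def required_voting_disks(configured):
    """Strict majority: more than half of the configured voting files."""
    return configured // 2 + 1


def analyze_cssd(lines):
    """Extract stalled voting files and the quorum check from cssd log lines."""
    stalled = {}   # voting file path -> milliseconds without I/O completion
    quorum = None  # (available, configured, needed) from the abort message
    for line in lines:
        line = line.strip()
        m = NO_IO_RE.search(line)
        if m:
            stalled[m.group(2)] = int(m.group(1))
        m = ABORT_RE.search(line)
        if m:
            quorum = tuple(int(g) for g in m.groups())
    return stalled, quorum


if __name__ == "__main__":
    with open("ocssd.log") as fh:  # path is an assumption; adjust as needed
        stalled, quorum = analyze_cssd(fh)
    for path, ms in stalled.items():
        flag = "over" if ms > DISK_TIMEOUT_MS else "under"
        print(f"{path}: {ms} ms without I/O ({flag} the assumed timeout)")
    if quorum:
        available, configured, needed = quorum
        print(f"{available}/{configured} available, need {needed}")
```

In this incident the stall on /dev/asm-diskf was 200460 ms, just over the assumed 200000 ms limit, and losing that file left 1 of 3, below the majority of 2. The message also implies that another voting file was already unavailable before this one dropped out.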

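5. Troubleshooting steps

5.1 Checked the relevant disks on the virtualization and storage side. No error logs were found, so the voting disk appears to be healthy.

5.2 Tried to restart the cluster resources. The ocssd process could not be stopped and the CRS restart failed. We suspect that the system management processes were unable to communicate and that operating system processes were in an abnormal state.

5.3 Rebooted the operating system. After the reboot the cluster resources returned to normal and service was restored.

Open question: what caused the voting disk to go offline still needs to be analyzed.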
