ASM Diskgroup Dismount On AIX: "Waited 15 secs for write IO to PST"

Yesterday a colleague ran into an ASM OCR access-timeout problem on a two-node cluster running AIX 7.1: Node 2 could not access the OCR normally. Checking Node 2's alert_asm.log showed the following:

Reference: ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1)

Thu Aug 21 17:24:06 2014
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 3 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 4 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 3 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 4 in group 3.
Thu Aug 21 17:24:06 2014
NOTE: process _b000_+asm1 (24903780) initiating offline of disk 0.2095165706 (GRID_0000) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (24903780) initiating offline of disk 1.2095165707 (GRID_0001) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (24903780) initiating offline of disk 2.2095165708 (GRID_0002) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (24903780) initiating offline of disk 3.2095165709 (GRID_0003) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (24903780) initiating offline of disk 4.2095165710 (GRID_0004) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 10 for pid 35, osid 24903780
ERROR: no read quorum in group: required 3, found 0 disks
NOTE: checking PST for grp 3 done.

....

NOTE: initiating PST update: grp = 3, dsk = 0/0x7ce1b10a, mask = 0x6a, op = clear

crs.log

2014-08-21 20:36:04.495: [  OCRRAW][9264]proprior: Retrying buffer read from another mirror for disk group [+GRID] for block at offset [6909952]
2014-08-21 20:36:04.495: [  OCRASM][9264]proprasmres: Total 0 mirrors detected
2014-08-21 20:36:04.495: [  OCRASM][9264]proprasmres: Only 1 mirror found in this disk group.
2014-08-21 20:36:04.495: [  OCRASM][9264]proprasmres: Need to invoke checkdg. Mirror #0 has an invalid buffer.
2014-08-21 20:36:04.595: [  OCRASM][9264]proprasmres: kgfoControl returned error [8]
[OCRASM][9264]SLOS : SLOS: cat=8, opn=kgfoCkDG01, dep=15032, loc=kgfokge

2014-08-21 20:36:04.595: [  OCRASM][9264]ASM Error Stack : ORA-27091: unable to queue I/O
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-06512: at line 4

2014-08-21 20:36:04.595: [  OCRRAW][9264]proprior: ASM re silver returned [22]
2014-08-21 20:36:04.597: [  OCRRAW][9264]fkce:2: problem [22] reading the tnode 6909952


Based on these two logs, we can see that the diskgroup holding the OCR timed out on access. The MOS note explains that this is caused by a mismatch between the AIX rw_timeout parameter (the OS storage I/O request timeout) and ASM's own I/O timeout for storage access:
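When triaging, it helps to tally the warnings by disk group to confirm which group is affected. A minimal sketch (count_pst_warnings is a hypothetical helper, not part of any Oracle tooling):

```python
import re
from collections import Counter

# Count "Waited N secs for write IO to PST" warnings per disk group
# in an excerpt of an ASM alert log.
PST_WARN = re.compile(
    r"WARNING: Waited \d+ secs for write IO to PST disk \d+ in group (\d+)\."
)

def count_pst_warnings(lines):
    """Return a Counter mapping disk group number -> warning count."""
    counts = Counter()
    for line in lines:
        m = PST_WARN.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

# A few lines taken from the alert log above:
sample = [
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.",
    "WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.",
]
print(count_pst_warnings(sample))
```

In the logs above, group 3 (the +GRID diskgroup holding the OCR) dominates the warnings.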

Symptoms
Normal or high redundancy diskgroup is dismounted with these WARNING messages.

//ASM alert.log

Mon Jul 01 09:10:47 2013
WARNING: Waited 15 secs for write IO to PST disk 1 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 4 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 4 in group 6.
....
GMON dismounting group 6 at 72 for pid 44, osid 8782162

Cause
Generally, this kind of message appears in the ASM alert log in the following situation:
ASM PST heartbeats to the disks of a normal or high redundancy diskgroup are delayed,
so the ASM instance dismounts the diskgroup. By default, the timeout is 15 seconds.
Note that heartbeat delays are essentially ignored for an external redundancy diskgroup:
the ASM instance stops issuing further PST heartbeats until PST revalidation succeeds,
but the heartbeat delays do not cause an external redundancy diskgroup to be dismounted directly.
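The behaviour described above can be sketched roughly as follows (an illustrative simplification of the MOS description, not Oracle's actual logic; pst_action is a hypothetical name):

```python
# Illustrative sketch: how the PST heartbeat timeout interacts with
# diskgroup redundancy, per the MOS note quoted above.
DEFAULT_HBEATIOWAIT = 15  # seconds; the hidden parameter _asm_hbeatiowait

def pst_action(redundancy, io_wait_secs, hbeatiowait=DEFAULT_HBEATIOWAIT):
    if io_wait_secs <= hbeatiowait:
        return "ok"
    # Heartbeat delays are effectively ignored for external redundancy:
    # the diskgroup is not dismounted directly.
    if redundancy == "external":
        return "warn-only"
    # Normal/high redundancy: the instance offlines the disk and may
    # dismount the group once read quorum is lost.
    return "dismount"

assert pst_action("normal", 16) == "dismount"
assert pst_action("external", 16) == "warn-only"
assert pst_action("normal", 31, hbeatiowait=200) == "ok"  # after raising the parameter
```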

The ASM disk could go into unresponsiveness, normally in the following scenarios:

+    Some of the physical paths of the multipath device are offline or lost
+    During path 'failover' in a multipath setup
+    Server load, or any sort of storage/multipath/OS maintenance

Doc ID 10109915.8 describes Bug 10109915 (the fix for this bug introduced the underscore parameter). The original issue was the lack of any OS/storage tunable timeout mechanism in the case of a hung NFS server/filer; _asm_hbeatiowait then allows the timeout to be set.

Solution
1]    Check with the OS and storage admins whether there is disk unresponsiveness.

2]    If possible, keep the disk unresponsiveness below 15 seconds.

This will depend on various factors such as:
+    Operating system
+    Presence of multipathing (and the multipath type)
+    Kernel parameters

So you need to find out the 'maximum' possible disk unresponsiveness for your setup.
For example, on AIX the rw_timeout setting affects this and defaults to 30 seconds.
Another example is Linux with native multipathing: there, the number of physical paths and the polling_interval value in multipath.conf dictate the maximum disk unresponsiveness.
So for your setup (the combination of OS / multipath / storage), you need to determine this value.
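For the Linux native-multipathing case, the note's rule of thumb (number of paths times polling_interval) can be sanity-checked in a few lines (needs_tuning is a hypothetical helper; the 15-second default comes from _asm_hbeatiowait):

```python
# Back-of-the-envelope check, following the MOS note: with Linux native
# multipathing, worst-case disk unresponsiveness is roughly the number
# of physical paths times polling_interval from multipath.conf.
def max_unresponsiveness(n_paths, polling_interval):
    return n_paths * polling_interval

def needs_tuning(n_paths, polling_interval, hbeatiowait=15):
    """True if the worst case exceeds the ASM PST heartbeat timeout."""
    return max_unresponsiveness(n_paths, polling_interval) > hbeatiowait

# Example: 4 paths polled every 5 seconds -> up to ~20s unresponsive,
# which exceeds the default 15s PST heartbeat timeout.
print(needs_tuning(4, 5))
```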

3]    If you cannot keep the disk unresponsiveness below 15 seconds, then the following parameter can be set in the ASM instance (on all nodes of the RAC):

 _asm_hbeatiowait
Set it to 200.

Run the following in the ASM instance to set the desired value for _asm_hbeatiowait:

alter system set "_asm_hbeatiowait"=<value> scope=spfile sid='*';

On AIX, the storage I/O request timeout (rw_timeout) defaults to 30 seconds.


By default, ASM's delayed PST (Partner and Status Table) heartbeat timeout is 15 seconds; once it is exceeded, the diskgroup is dismounted (as seen in the CRS log). An external redundancy diskgroup avoids the dismount, while normal/high redundancy diskgroups are affected, which unfortunately is exactly what we hit here.

In the end, the problem was resolved by setting _asm_hbeatiowait on all nodes and restarting the cluster services on every node for the change to take effect:

alter system set "_asm_hbeatiowait"=<value greater than 30> scope=spfile sid='*';
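As a quick sanity check on the numbers involved (the 15s ASM default, the 30s AIX rw_timeout, and the 200s value suggested by the MOS note):

```python
# Why the default times out on AIX, and why a value like 200 works.
AIX_RW_TIMEOUT = 30       # AIX storage I/O timeout (default)
DEFAULT_HBEATIOWAIT = 15  # ASM PST heartbeat timeout (default)
MOS_SUGGESTED = 200       # value suggested in Doc ID 1581684.1

# The default heartbeat timeout fires before AIX has given up on the I/O:
assert DEFAULT_HBEATIOWAIT < AIX_RW_TIMEOUT
# Any replacement must exceed rw_timeout; 200 does so comfortably:
assert MOS_SUGGESTED > AIX_RW_TIMEOUT
print("set _asm_hbeatiowait to a value > %d, e.g. %d"
      % (AIX_RW_TIMEOUT, MOS_SUGGESTED))
```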