An In-Depth Look at Configuring multipath on Linux

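First, what is multipathing (multi-path)? Some background: before multipathing existed, a host's disks were attached directly to a single bus (PCI), and the relationship between path and device was one-to-one: one path led to one disk or storage device. That one-to-one model is simple for the operating system to handle, but it offers no redundancy. With the arrival of Fibre Channel networks (what is usually called a SAN), or IP SANs built on iSCSI, hosts and storage are connected through Fibre Channel switches or through multiple NICs and IP addresses. This creates many-to-many I/O channels, which means several paths can exist between one host and one storage device. When these paths are all active at once, questions arise: how should I/O traffic be distributed and scheduled, how should it be load-balanced, and how should active/standby roles be handled? Multipath software was created to answer them.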

Working together with the storage device, multipath software mainly provides:
1. Failover and failback
2. I/O load balancing
3. Disk virtualization

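On Linux, the 2.6 kernels shipped by Red Hat and SUSE come with a free multipath package, and ESX also includes free multipath support. On Windows, by contrast, using multipathing typically meant purchasing a license for software called MPIO. Multipathing on Windows and ESX is configured through a fairly simple graphical interface, so it is not covered here; this article covers how to configure multipath on Linux.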

I. multipath-related tools and parameters on Linux:

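1. device-mapper-multipath: this is multipath-tools. It provides tools such as multipathd and multipath, and configuration files such as multipath.conf. These tools create and configure multipath devices through the device mapper ioctl interface; the resulting multipath device mappings appear under /dev/mapper.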

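2. device-mapper: it has two main parts, a kernel part and a user-space part. The kernel part consists of the device mapper core (dm.ko) and a number of target drivers (such as dm-multipath.ko). The core performs the device mapping, while each target handles the I/O coming down from the mapped device according to the mapping and the target's own characteristics. The core also exposes an interface through which user space talks to the kernel via ioctl to direct the driver's behavior, for example how to create a mapped device and which attributes the device has. The user-space part is mainly the device-mapper package, which includes the dmsetup tool and libraries that help create and configure mapped devices. These libraries abstract and wrap the ioctl communication so that mapped devices are easier to create and configure; the multipath-tools programs call into them.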

3. dm-multipath.ko and dm.ko: dm.ko is the device mapper driver and the foundation on which multipath is built. dm-multipath is in fact a target driver of dm.

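4. scsi_id: part of the udev package. multipath.conf can be configured to call this program to obtain the serial identifier of a SCSI device. The identifier makes it possible to tell that several paths lead to the same device, which is the key to how multipath works. scsi_id uses the sg driver to send an INQUIRY command for EVPD page 0x80 or 0x83 to the device and query its identifier. Some devices do not support the EVPD INQUIRY command, so they cannot be used to build multipath devices. It is possible, however, to modify scsi_id so that it fabricates an identifier for such devices and prints it to standard output. When the multipath program creates a multipath device, it calls scsi_id and reads the device's SCSI ID from its standard output. A modified scsi_id must return 0, because multipath checks the return value to decide whether the SCSI ID was obtained successfully.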

II. Basic multipath configuration on Red Hat 6.2:

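1. Run lsmod | grep dm_multipath to check whether the multipath module is installed and loaded. If there is no output, the module is not loaded and the packages may be missing; install them with yum: yum -y install device-mapper device-mapper-multipath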

Then run multipath -ll to view the multipath status and check whether the module has loaded:

[root@liujing ~]#  multipath -ll    ---- view the multipath status

Mar 10 19:18:28 | /etc/multipath.conf does not exist, blacklisting all devices.

Mar 10 19:18:28 | A sample multipath.conf file is located at

Mar 10 19:18:28 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf

Mar 10 19:18:28 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf

Mar 10 19:18:28 | DM multipath kernel driver not loaded    ---- the DM module is not loaded

If the module has not loaded, use the following commands to initialize and start DM for the first time, or reboot the system:
# modprobe dm-multipath
# modprobe dm-round-robin
# service multipathd start
# multipath -v2
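The module check can also be scripted. The sketch below tests lsmod-style text for the dm_multipath module; note that lsmod prints the name with an underscore even though modprobe accepts dm-multipath. The function name dm_multipath_loaded and the sample listing (including its sizes and counts) are assumptions for illustration; on a real host you would pipe lsmod into the function.

```shell
# Succeeds when lsmod-style text on stdin lists the dm_multipath module.
dm_multipath_loaded() {
    grep -q '^dm_multipath[[:space:]]'
}

# Illustrative lsmod excerpt (sizes and use counts are made up for the example).
sample='Module                  Size  Used by
dm_round_robin          2525  2
dm_multipath           17724  2 dm_round_robin
dm_mod                 84209  9 dm_multipath'

if printf '%s\n' "$sample" | dm_multipath_loaded; then
    echo "dm_multipath is loaded"
else
    echo "dm_multipath is not loaded; run: modprobe dm-multipath"
fi
```

Anchoring the pattern at the start of the line avoids a false positive from dm_mod, whose "Used by" column also mentions dm_multipath.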

After initialization, run multipath -ll again to check whether the module has loaded:

[root@liujing ~]#  multipath -ll

Mar 10 19:21:14 | /etc/multipath.conf does not exist, blacklisting all devices.

Mar 10 19:21:14 | A sample multipath.conf file is located at

Mar 10 19:21:14 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf

Mar 10 19:21:14 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf

(The line "DM multipath kernel driver not loaded" no longer appears, which shows that the DM module has loaded successfully.)

The output above shows that the DM module loaded successfully, but there is no multipath.conf configuration file under /etc. The next step covers how to write multipath.conf.

2. Configuring multipath:

Use vi to create the multipath configuration file at /etc/multipath.conf, and add the minimal configuration that multipath needs to work properly:

vi /etc/multipath.conf

blacklist {

devnode "^sda"

}

defaults {

user_friendly_names yes

path_grouping_policy multibus

failback immediate

no_path_retry fail

}
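For scripted setups, the same minimal file can be written without an editor. This is a sketch only: the function name write_minimal_conf is hypothetical, and the demonstration writes to a scratch file rather than /etc/multipath.conf so that it can be run safely anywhere.

```shell
# Write the minimal configuration shown above to the path given as $1.
write_minimal_conf() {
    cat > "$1" <<'EOF'
blacklist {
    devnode "^sda"
}
defaults {
    user_friendly_names yes
    path_grouping_policy multibus
    failback immediate
    no_path_retry fail
}
EOF
}

# Demonstration against a scratch file; on a real host, pass /etc/multipath.conf as root.
conf=$(mktemp)
write_minimal_conf "$conf"
cat "$conf"
rm -f "$conf"
```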

After editing, save the file and start the service:

# /etc/init.d/multipathd start    # start the multipathd service

If the service fails to start, no OK status is printed, as here:

[root@liujing mapper]# service multipathd start

Starting multipathd daemon:     ---- no OK is printed

Stopping and starting the service again resolves this:

[root@liujing mapper]# /etc/init.d/multipathd stop

Stopping multipathd daemon:                                [  OK  ]

[root@localhost mapper]# /etc/init.d/multipathd start

Starting multipathd daemon:                                [  OK  ]  ----- OK is printed; the service started normally

Check the result:

[root@liujing mapper]# multipath -ll

mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN    ---- a multipath device has been created for the LUN

size=3.5G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=4 status=active

|- 1:0:0:0 sdb 8:16 active ready  running   ---- the two paths under the multipath device: sdb and sde

`- 2:0:0:0 sde 8:64 active ready  running
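Output in this shape is easy to check from a script, for example to alert when a map has lost a path. The sketch below counts the paths reported as "active ready" for each map; the function name count_ready_paths is an assumption, and the parsing relies only on the layout visible in the transcript above (a header line containing a dm-N field, followed by path lines).

```shell
# Count "active ready" paths per multipath map from `multipath -ll` text on stdin.
count_ready_paths() {
    awk '/ dm-[0-9]+ / { if (name != "") print name, n; name = $1; n = 0; next }
         /active ready/ { n++ }
         END { if (name != "") print name, n }'
}

# Sample input copied from the transcript above; prints: mpatha 2
count_ready_paths <<'EOF'
mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN
size=3.5G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=4 status=active
  |- 1:0:0:0 sdb 8:16 active ready  running
  `- 2:0:0:0 sde 8:64 active ready  running
EOF
```

On a live system the input would be `multipath -ll | count_ready_paths`, run as root.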

Two new entries, mpatha and mpathap1, have appeared under /dev/mapper/ (they are device nodes, not directories):

[root@liujing mapper]# cd /dev/mapper/

[root@liujing mapper]# ls

control  mpatha  mpathap1

fdisk -l also lists two additional devices.

Before multipath is configured:

[root@liujing~]# fdisk -l

Disk /dev/sda: 146.8 GB, 146815733760 bytes

255 heads, 63 sectors/track, 17849 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x000a6cdd

Device Boot      Start         End      Blocks   Id  System

/dev/sda1   *           1          26      204800   83  Linux

Partition 1 does not end on cylinder boundary.

/dev/sda2              26         287     2097152   82  Linux swap / Solaris

Partition 2 does not end on cylinder boundary.

/dev/sda3             287       17850   141071360   83  Linux

Disk /dev/sdb: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/sdb1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

Disk /dev/sde: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/sde1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

The two CNA ports each present the same LUN as a separate disk: /dev/sde and /dev/sdb.

After configuration, /dev/mapper/mpatha and /dev/mapper/mpathap1 also appear:

[root@localhost mapper]# fdisk -l

Disk /dev/sda: 146.8 GB, 146815733760 bytes

255 heads, 63 sectors/track, 17849 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x000a6cdd

Device Boot      Start         End      Blocks   Id  System

/dev/sda1   *           1          26      204800   83  Linux

Partition 1 does not end on cylinder boundary.

/dev/sda2              26         287     2097152   82  Linux swap / Solaris

Partition 2 does not end on cylinder boundary.

/dev/sda3             287       17850   141071360   83  Linux

Disk /dev/sdb: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/sdb1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

Disk /dev/sde: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/sde1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

Disk /dev/mapper/mpatha: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/mapper/mpathap1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

Disk /dev/mapper/mpathap1: 3773 MB, 3773441024 bytes

255 heads, 63 sectors/track, 458 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Alignment offset: 1024 bytes

Disk identifier: 0x00000000

Disk /dev/mapper/mpathap1 doesn't contain a valid partition table

# multipath -F     # flush the existing multipath maps; the two new devices disappear
# multipath -v2    # recreate the maps (verbosity level 2); the devices reappear

3. Basic operations on a multipath disk

To work with a disk created by the multipath software, operate directly on the device under /dev/mapper/.

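The original procedure runs pvcreate on the multipath disk before partitioning it. Note that pvcreate initializes an LVM physical volume, so it is only needed if the disk will be used with LVM: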

# pvcreate /dev/mapper/mpatha

# fdisk /dev/mapper/mpatha    # partition using the path /dev/mapper/mpatha

When fdisk saves the partition table of a multipath disk it reports an error; this error can be ignored.

# ls -l /dev/mapper/

[root@liujing mnt]#  ls -l /dev/mapper/

total 0

crw-rw----. 1 root root 10, 58 Mar 10 19:10 control

lrwxrwxrwx. 1 root root      7 Mar 10 20:28 mpatha -> ../dm-0

lrwxrwxrwx. 1 root root      7 Mar 10 20:33 mpathap1 -> ../dm-1

mpathap1 is the partition we created on the multipath disk.

# mkfs.ext4 /dev/mapper/mpathap1    # format the mpathap1 partition as ext4

# mount /dev/mapper/mpathap1 /mnt/    # mount the mpathap1 partition

Use /dev/mapper/mpathap1 for both formatting and mounting.

4. Partitioning the disk:

As noted above, use the path /dev/mapper/mpatha when partitioning:

[root@liujing~]# fdisk /dev/mapper/mpatha

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel

Building a new DOS disklabel with disk identifier 0xac956c3a.

Changes will remain in memory only, until you decide to write them.

After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to

switch off the mode (command 'c') and change display units to

sectors (command 'u').

Command (m for help): n------------------------ create a new partition

Command action

e   extended

p   primary partition (1-4)

p----------------------------- primary partition

Partition number (1-4): 1

First cylinder (1-1016, default 1):

Using default value 1

Last cylinder, +cylinders or +size{K,M,G} (1-1016, default 1016):

Using default value 1016

Command (m for help): w --------------------- write the partition table (that is, save)

The partition table has been altered!

Calling ioctl() to re-read partition table.

Syncing disks.

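Note: if two nodes of the same system are attached to the same LUN, the other node only needs to write the table again with w; there is no need to repeat n.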

5. Formatting:

[root@liujing ~]# mkfs.ext4 /dev/mapper/mpathap1

mke2fs 1.41.12 (17-May-2010)

/dev/sdd1 alignment is offset by 1024 bytes.

This may result in very poor performance, (re)-partitioning suggested.

Filesystem label=

OS type: Linux

Block size=4096 (log=2)

Fragment size=4096 (log=2)

Stride=1 blocks, Stripe width=16 blocks

230608 inodes, 921250 blocks

46062 blocks (5.00%) reserved for the super user

First data block=0

Maximum filesystem blocks=943718400

29 block groups

32768 blocks per group, 32768 fragments per group

7952 inodes per group

Superblock backups stored on blocks:

32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done

Creating journal (16384 blocks): done

Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 33 mounts or

180 days, whichever comes first.  Use tune2fs -c or -i to override.

6. Mounting /dev/mapper/mpathap1 on /mnt

[root@liujing ~]# mount  /dev/mapper/mpathap1  /mnt

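III. Advanced multipath configuration

Everything so far used multipath's default settings, such as the names of the mapped devices and the load-balancing method. Can multipath be configured the way we define it ourselves? Yes.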

1. Configuring the multipath.conf file

The next task is to edit the /etc/multipath.conf configuration file.

multipath.conf consists mainly of three sections: blacklist, multipaths, and devices.

blacklist section

blacklist {

devnode "^sda"

}
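The devnode value is a regular expression, not a literal device name, so "^sda" matches every device name that begins with sda. The sketch below illustrates this with grep -E; the function name is_blacklisted is hypothetical, and using grep -E assumes that multipath interprets the pattern as an extended regular expression.

```shell
# Succeeds when a device node name ($1) matches a blacklist devnode regex ($2).
is_blacklisted() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

for dev in sda sda1 sdaa sdb; do
    if is_blacklisted "$dev" '^sda'; then
        echo "$dev: blacklisted"
    else
        echo "$dev: eligible for multipath"
    fi
done
```

On a host with more than 26 disks, names such as sdaa would therefore be excluded as well. Under the same assumption about regex handling, a tighter pattern such as "^sda[0-9]*$" would restrict the match to sda and its partitions.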

multipaths section

multipaths {

multipath {

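wwid **************** # this value can be seen with multipath -v3

alias iscsi-dm0 # alias for the mapped device; any name will do

path_grouping_policy multibus # path grouping policy

path_checker tur # method used to determine the path state

path_selector "round-robin 0" # method for choosing which path gets the next I/O operation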

}

}
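When many LUNs need aliases, the entries can be generated rather than typed by hand. The following sketch prints one multipath entry for a given WWID and alias; the function name multipath_stanza is hypothetical, the fixed policy values simply mirror the example above, and the output still has to be placed inside the multipaths { } block of /etc/multipath.conf.

```shell
# Print a multipath entry for one WWID ($1) and alias ($2).
multipath_stanza() {
    wwid=$1
    alias_name=$2
    cat <<EOF
multipath {
    wwid $wwid
    alias $alias_name
    path_grouping_policy multibus
    path_selector "round-robin 0"
}
EOF
}

# Example call using the WWID and alias from the complete configuration in this article.
multipath_stanza 360a98000646650724434697454546156 mpathb_fcoe
```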

devices section

devices {

device {

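vendor "iSCSI-Enterprise" # vendor name

product "Virtual disk" # product model

path_grouping_policy multibus # default path grouping policy

getuid_callout "/sbin/scsi_id -g -u -s /block/%n" # default program used to obtain the unique device ID

prio_callout "/sbin/acs_prio_alua %d" # default program used to obtain the priority value

path_checker readsector0 # method used to determine the path state

path_selector "round-robin 0" # method for choosing which path gets the next I/O operation

failback immediate # failback mode

no_path_retry queue # how often to retry a failed path before disabling queueing; queue means keep queueing until a path is restored

rr_min_io 100 # number of I/O requests sent to a path before switching to the next path in the current path group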

}

}

The standard documentation describes the relevant parameters as follows:

Attribute: Description

wwid: Specifies the WWID of the multipath device to which the multipath attributes apply. This parameter is mandatory for this section of the multipath.conf file.

alias: Specifies the symbolic name for the multipath device to which the multipath attributes apply. If you are using user_friendly_names, do not set this value to mpathn; this may conflict with an automatically assigned user friendly name and give you incorrect device node names.

path_grouping_policy: Specifies the default path grouping policy to apply to unspecified multipaths. Possible values include:
    failover = 1 path per priority group
    multibus = all valid paths in 1 priority group
    group_by_serial = 1 priority group per detected serial number
    group_by_prio = 1 priority group per path priority value
    group_by_node_name = 1 priority group per target node name

path_selector: Specifies the default algorithm to use in determining what path to use for the next I/O operation. Possible values include:
    round-robin 0: Loop through every path in the path group, sending the same amount of I/O to each.
    queue-length 0: Send the next bunch of I/O down the path with the least number of outstanding I/O requests.
    service-time 0: Send the next bunch of I/O down the path with the shortest estimated service time, which is determined by dividing the total size of the outstanding I/O to each path by its relative throughput.

failback: Manages path group failback.
    A value of immediate specifies immediate failback to the highest priority path group that contains active paths.
    A value of manual specifies that there should not be immediate failback but that failback can happen only with operator intervention.
    A value of followover specifies that automatic failback should be performed when the first path of a path group becomes active. This keeps a node from automatically failing back when another node requested the failover.
    A numeric value greater than zero specifies deferred failback, expressed in seconds.

prio: Specifies the default function to call to obtain a path priority value. For example, the ALUA bits in SPC-3 provide an exploitable prio value. Possible values include:
    const: Set a priority of 1 to all paths.
    emc: Generate the path priority for EMC arrays.
    alua: Generate the path priority based on the SCSI-3 ALUA settings.
    tpg_pref: Generate the path priority based on the SCSI-3 ALUA settings, using the preferred port bit.
    ontap: Generate the path priority for NetApp arrays.
    rdac: Generate the path priority for LSI/Engenio RDAC controller.
    hp_sw: Generate the path priority for Compaq/HP controller in active/standby mode.
    hds: Generate the path priority for Hitachi HDS Modular storage arrays.

no_path_retry:
    A numeric value for this attribute specifies the number of times the system should attempt to use a failed path before disabling queueing.
    A value of fail indicates immediate failure, without queueing.
    A value of queue indicates that queueing should not stop until the path is fixed.

rr_min_io: Specifies the number of I/O requests to route to a path before switching to the next path in the current path group. This setting is only for systems running kernels older than 2.6.31. Newer systems should use rr_min_io_rq. The default value is 1000.

rr_min_io_rq: Specifies the number of I/O requests to route to a path before switching to the next path in the current path group, using request-based device-mapper-multipath. This setting should be used on systems running current kernels. On systems running kernels older than 2.6.31, use rr_min_io. The default value is 1.

rr_weight: If set to priorities, then instead of sending rr_min_io requests to a path before calling path_selector to choose the next path, the number of requests to send is determined by rr_min_io times the path's priority, as determined by the prio function. If set to uniform, all path weights are equal.

flush_on_last_del: If set to yes, then multipath will disable queueing when the last path to a device has been deleted.
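The rr_weight description above amounts to a small piece of arithmetic, sketched here. The function name requests_before_switch is hypothetical, and the calculation follows only the wording of the documentation quoted above (rr_min_io times the path priority when rr_weight is priorities, plain rr_min_io when it is uniform).

```shell
# Requests sent to a path before the selector picks the next one.
# Usage: requests_before_switch <rr_min_io> <path_priority> <rr_weight>
requests_before_switch() {
    case $3 in
        priorities) echo $(( $1 * $2 )) ;;
        uniform)    echo "$1" ;;
        *)          echo "unknown rr_weight: $3" >&2; return 1 ;;
    esac
}

requests_before_switch 100 4 priorities   # 100 * 4 = 400
requests_before_switch 100 4 uniform      # 100
```

With rr_min_io set to 100 and a path priority of 4, as in the multipath -ll output earlier, the priorities setting would send 400 requests down the path before moving on.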

A complete advanced configuration from my local system:

[root@liujing ~]# vi /etc/multipath.conf

blacklist {

devnode "^sda"

}

multipaths {

multipath {

wwid       360a98000646650724434697454546156

alias      mpathb_fcoe

path_grouping_policy    multibus

#path_checker            "directio"

prio                    "random"

path_selector           "round-robin 0"

}

}

devices {

device {

vendor     "NETAPP"

product    "LUN"

getuid_callout       "/lib/udev/scsi_id --whitelisted --device=/dev/%n"

#path_checker    "directio"

#path_selector             "round-robin 0"

failback             immediate

no_path_retry fail

}

}

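The wwid, vendor, product, and getuid_callout values can be obtained with the multipath -v3 command. If an alias is set for a WWID in /etc/multipath.conf, the alias takes precedence over the automatically assigned name.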
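For maps that already exist, the WWID can also be read from the header line of multipath -ll, where it appears in parentheses after the map name, as in the transcript earlier in this article. The sketch below extracts it; the function name list_map_wwids is an assumption, and the pattern handles only header lines of the "name (wwid) dm-N vendor,product" form shown in that transcript.

```shell
# Print "<name> <wwid>" for each map header in `multipath -ll` text on stdin.
list_map_wwids() {
    sed -n 's/^\([^ ]*\) (\([0-9a-fA-F]*\)) dm-[0-9]* .*$/\1 \2/p'
}

# Header line copied from the transcript above.
echo 'mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN' | list_map_wwids
```

On a live system the input would be `multipath -ll | list_map_wwids`, run as root.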

