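First, what is multipathing (multi-path)? Start with the background. Before multipathing existed, a host's disks were attached directly to a single bus (such as PCI), and paths were one-to-one: each path led to exactly one disk or storage device. That one-to-one relationship is relatively simple for the operating system to handle, but it offers no redundancy. With Fibre Channel networks (what is usually called a SAN), or IP SAN environments built on iSCSI, hosts and storage are connected through Fibre Channel switches or through multiple network cards and IP addresses, which creates many-to-many I/O channels. In other words, several paths exist between one host and one storage device. When these paths are active at the same time, how should I/O traffic be distributed and scheduled, how should the load be balanced, and how should active/standby roles be handled? Multipath software was created to answer these questions.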
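The main job of multipath software is to work together with the storage device to provide the following:
1. Failover and failback
2. I/O load balancing
3. Disk virtualization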
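On Linux, the 2.6 kernels shipped by RedHat and SUSE come with a free multipath package, and ESX also has free built-in multipathing. On Windows, multipathing is provided by MPIO software, which may require a purchased license depending on the Windows version and storage vendor. Multipathing on Windows and ESX is configured through a graphical interface and is fairly simple, so it is not covered here. This article describes how to configure multipath on Linux.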
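I. Linux multipath tools and parameters:
1. device-mapper-multipath: also known as multipath-tools. It provides tools such as multipathd and multipath, plus configuration files such as multipath.conf. These tools create and configure multipath devices through the device mapper ioctl interface; the resulting multipath device mappings appear under /dev/mapper.
2. device-mapper: consists of two parts, a kernel part and a user-space part. The kernel part is made up of the device mapper core (dm.ko) and a number of target drivers (such as dm-multipath.ko). The core handles device mapping, while each target processes the I/O coming down from the mapped device according to the mapping and its own characteristics. The core also exposes an interface through which user space talks to the kernel part via ioctl to direct the driver's behavior, for example how to create a mapped device and what attributes it has. The user-space part is mainly the device-mapper package, which includes the dmsetup tool and libraries that help create and configure mapped devices. These libraries abstract and wrap the ioctl communication interface so that mapped devices are easier to create and configure. The multipath-tools programs call these libraries.
3. dm-multipath.ko and dm.ko: dm.ko is the device mapper driver and the foundation on which multipath is built. dm-multipath is a target driver of dm.
4. scsi_id: shipped in the udev package. It can be configured in multipath.conf to obtain the serial number of a SCSI device. The serial number makes it possible to tell that several paths lead to the same device, which is the key to implementing multipath. scsi_id uses the sg driver to send the device an INQUIRY command for EVPD page 0x80 or 0x83 to query its identifier. Some devices do not support the EVPD INQUIRY command, so they cannot be used to build multipath devices. It is possible, however, to modify scsi_id so that it produces a synthetic identifier for such devices and writes it to standard output. When the multipath program creates a multipath device, it calls scsi_id and reads the device's SCSI ID from its standard output. A modified scsi_id must return 0, because multipath checks the return value to decide whether the SCSI ID was obtained successfully.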
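The idea that several paths are recognized as one device because they report the same identifier can be sketched in a few lines of shell. This is only an illustration, not how multipath itself is implemented: the helper name group_paths_by_wwid is hypothetical, and the sample pairs reuse the two WWIDs that appear later in this article, with sdc as an assumed device name.

```shell
#!/bin/sh
# Hypothetical helper: reads "device wwid" pairs on stdin and prints every
# WWID that is reported by more than one device, followed by those devices.
group_paths_by_wwid() {
    awk '
        { devs[$2] = (devs[$2] == "" ? $1 : devs[$2] " " $1); count[$2]++ }
        END { for (w in count) if (count[w] > 1) print w ": " devs[w] }
    ' | sort
}

# Sample input. On a real host each WWID could come from a call such as
#   /lib/udev/scsi_id --whitelisted --device=/dev/sdb
# sdc is an assumed device name used only for this illustration.
printf '%s\n' \
    'sdb 360a9800064665072443469563477396c' \
    'sdc 360a98000646650724434697454546156' \
    'sde 360a9800064665072443469563477396c' | group_paths_by_wwid
```

Only the WWID shared by sdb and sde is printed, which is the pair multipath would merge into a single mapped device.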
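II. Basic multipath configuration on RedHat 6.2:
1. Run lsmod | grep dm_multipath to check whether the module is loaded. No output means the module is not loaded; if the packages are not installed yet, install them with yum: yum -y install device-mapper device-mapper-multipath
Then run multipath -ll to view the multipath status and confirm whether the module loaded successfully:
[root@liujing ~]# multipath -ll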
Mar 10 19:18:28 | /etc/multipath.conf does not exist, blacklisting all devices.
Mar 10 19:18:28 | A sample multipath.conf file is located at
Mar 10 19:18:28 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
Mar 10 19:18:28 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
Mar 10 19:18:28 | DM multipath kernel driver not loaded ----the DM module is not loaded
If the module is not loaded, initialize DM with the following commands, or reboot the system:
---Use the following commands to initialize and start DM for the first time:
# modprobe dm-multipath
# modprobe dm-round-robin
# service multipathd start
# multipath -v2
After initialization, run multipath -ll again to check whether the module is loaded:
[root@liujing ~]# multipath -ll
Mar 10 19:21:14 | /etc/multipath.conf does not exist, blacklisting all devices.
Mar 10 19:21:14 | A sample multipath.conf file is located at
Mar 10 19:21:14 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
Mar 10 19:21:14 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
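(The "DM multipath kernel driver not loaded" message no longer appears, which means the DM module is now loaded.)
As the output shows, the DM module loaded successfully, but /etc/multipath.conf does not exist yet. The next step is to create that file.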
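The module check above can also be scripted. The following is a minimal sketch under the assumption that lsmod output is piped in on stdin; the function name dm_multipath_loaded is hypothetical, and the module sizes in the sample text are illustrative values, not taken from the host used in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) only if lsmod-style output on
# stdin contains a line whose first column is exactly dm_multipath.
dm_multipath_loaded() {
    awk '$1 == "dm_multipath" { found = 1 } END { exit !found }'
}

# Usage on a real host (not executed here):
#   lsmod | dm_multipath_loaded || modprobe dm-multipath

# Illustrative sample; the sizes are made up.
sample='Module                  Size  Used by
dm_round_robin          2717  2
dm_multipath           17649  2 dm_round_robin
dm_mod                 75539  3 dm_multipath'

if printf '%s\n' "$sample" | dm_multipath_loaded; then
    echo "dm_multipath is loaded"
else
    echo "dm_multipath is not loaded"
fi
```

Matching on the first column rather than grepping the whole line avoids a false positive from modules that merely list dm_multipath in their "Used by" column.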
2. Configuring multipath:
Use vi to create the multipath configuration file at /etc/multipath.conf and add the minimal configuration that multipath needs to work:
vi /etc/multipath.conf
blacklist {
devnode "^sda"
}
defaults {
user_friendly_names yes
path_grouping_policy multibus
failback immediate
no_path_retry fail
}
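After editing, save the file and start the multipath service:
# /etc/init.d/multipathd start #start the multipath service
If the service fails to start, no OK is printed, as shown here:
[root@liujing mapper]# service multipathd start
Starting multipathd daemon: ----no OK is shown
Stopping and then starting the service again resolves this: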
[root@liujing mapper]# /etc/init.d/multipathd stop
Stopping multipathd daemon: [ OK ]
[root@localhost mapper]# /etc/init.d/multipathd start
Starting multipathd daemon: [ OK ] -----OK is shown, the service started normally
Check the result:
[root@liujing mapper]# multipath -ll
mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN ----a multipath device was created for the LUN
size=3.5G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=4 status=active
|- 1:0:0:0 sdb 8:16 active ready running ----the two paths behind the multipath device are sdb and sde
`- 2:0:0:0 sde 8:64 active ready running
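When this status needs to be checked from a script, the number of healthy paths can be counted from the text above. A minimal sketch, assuming that a usable path line contains the words "active ready" as in this output; the function name count_active_paths is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: counts the lines in `multipath -ll` output (read from
# stdin) that report a path as "active ready".
count_active_paths() {
    awk '/active ready/ { n++ } END { print n + 0 }'
}

# Usage on a real host (not executed here):
#   multipath -ll mpatha | count_active_paths

# The sample below is the output shown in this article, without annotations.
count_active_paths <<'EOF'
mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN
size=3.5G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=4 status=active
  |- 1:0:0:0 sdb 8:16 active ready running
  `- 2:0:0:0 sde 8:64 active ready running
EOF
```

For the sample this prints 2; the path group line is not counted because it contains status=active but not "active ready".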
Two new device nodes, mpatha and mpathap1, now appear under /dev/mapper/:
[root@liujing mapper]# cd /dev/mapper/
[root@liujing mapper]# ls
control mpatha mpathap1
The output of fdisk -l also lists two additional devices.
Before multipath is configured:
[root@liujing~]# fdisk -l
Disk /dev/sda: 146.8 GB, 146815733760 bytes
255 heads, 63 sectors/track, 17849 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a6cdd
Device Boot Start End Blocks Id System
/dev/sda1 * 1 26 204800 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 26 287 2097152 82 Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3 287 17850 141071360 83 Linux
Disk /dev/sdb: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sdb1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/sde: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sde1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Both CNA adapters see the same LUN, which shows up as two devices:
/dev/sde and /dev/sdb.
After configuration, /dev/mapper/mpatha and /dev/mapper/mpathap1 are listed as well:
[root@localhost mapper]# fdisk -l
Disk /dev/sda: 146.8 GB, 146815733760 bytes
255 heads, 63 sectors/track, 17849 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a6cdd
Device Boot Start End Blocks Id System
/dev/sda1 * 1 26 204800 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 26 287 2097152 82 Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3 287 17850 141071360 83 Linux
Disk /dev/sdb: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sdb1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/sde: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sde1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/mapper/mpatha: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/mapper/mpathap1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/mapper/mpathap1: 3773 MB, 3773441024 bytes
255 heads, 63 sectors/track, 458 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Alignment offset: 1024 bytes
Disk identifier: 0x00000000
Disk /dev/mapper/mpathap1 doesn't contain a valid partition table
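# multipath -F #flush the existing multipath maps; the two new devices are removed
# multipath -v2 #rebuild the multipath maps; the devices appear again
3. Basic operations on multipath disks
To work with a disk created by the multipath software, operate directly on the device under /dev/mapper/.
Before partitioning a disk created by the multipath software, it is advisable to run pvcreate first (pvcreate initializes the device as an LVM physical volume, so this only applies if you intend to use LVM):
# pvcreate /dev/mapper/mpatha
# fdisk /dev/mapper/mpatha #use /dev/mapper/mpatha when partitioning
When fdisk saves the partition table of a multipath disk it reports an error; this error can be ignored.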
# ls -l /dev/mapper/
[root@liujing mnt]# ls -l /dev/mapper/
total 0
crw-rw----. 1 root root 10, 58 Mar 10 19:10 control
lrwxrwxrwx. 1 root root 7 Mar 10 20:28 mpatha -> ../dm-0
lrwxrwxrwx. 1 root root 7 Mar 10 20:33 mpathap1 -> ../dm-1
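mpathap1 is the partition we created on the multipath disk.
# mkfs.ext4 /dev/mapper/mpathap1 #format the mpathap1 partition as ext4
# mount /dev/mapper/mpathap1 /mnt/ #mount the mpathap1 partition
Use /dev/mapper/mpathap1 for formatting and mounting.
4. Partitioning the disk:
As mentioned above, use /dev/mapper/mpatha when partitioning: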
[root@liujing~]# fdisk /dev/mapper/mpatha
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xac956c3a.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): n------------------------create a new partition
Command action
e extended
p primary partition (1-4)
p-----------------------------primary partition
Partition number (1-4): 1
First cylinder (1-1016, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-1016, default 1016):
Using default value 1016
Command (m for help): w ---------------------write the partition table (this saves the changes)
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
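Note: if two nodes of the same system attach the same disk, the second node only needs to write the table again with w; there is no need to repeat n.
5. Formatting: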
[root@liujing ~]# mkfs.ext4 /dev/mapper/mpathap1
mke2fs 1.41.12 (17-May-2010)
/dev/sdd1 alignment is offset by 1024 bytes.
This may result in very poor performance, (re)-partitioning suggested.
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1 blocks, Stripe width=16 blocks
230608 inodes, 921250 blocks
46062 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=943718400
29 block groups
32768 blocks per group, 32768 fragments per group
7952 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
6. Mount /dev/mapper/mpathap1 on /mnt
[root@liujing ~]# mount /dev/mapper/mpathap1 /mnt
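III. Advanced multipath configuration
Everything so far relied on multipath defaults, such as the names of the mapped devices and the load-balancing method. Can multipath be configured the way we define it ourselves? The answer is yes.
1. Configuring multipath.conf
The next task is to edit the /etc/multipath.conf configuration file.
multipath.conf mainly consists of three sections: blacklist, multipaths, and devices.
blacklist section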
blacklist {
devnode "^sda"
}
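The devnode value in the blacklist is a regular expression matched against device node names, so it is worth checking which names a pattern really covers. The sketch below approximates that match with grep -E; the helper name is_blacklisted is hypothetical, and the device names other than sda and sdb are assumed examples.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if device name $1 matches the regular
# expression $2, approximating how a blacklist devnode pattern is applied.
is_blacklisted() {
    printf '%s\n' "$1" | grep -Eq "$2"
}

# sda1 and sdaa are assumed names used to show the reach of the pattern.
for dev in sda sda1 sdb sdaa; do
    if is_blacklisted "$dev" '^sda'; then
        echo "$dev: blacklisted"
    else
        echo "$dev: not blacklisted"
    fi
done
```

Note that "^sda" also matches a name such as sdaa. On hosts with more than 26 disks, a stricter pattern such as "^sda[0-9]*$" may be the safer choice.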
multipaths section (the multipaths and devices sections are configured next)
multipaths {
multipath {
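wwid **************** #this value is shown by multipath -v3
alias iscsi-dm0 #alias for the mapped device; any name will do
path_grouping_policy multibus #path grouping policy
path_checker tur #method used to determine path state
path_selector "round-robin 0" #method used to choose the path for the next I/O operation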
}
}
devices section
devices {
device {
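vendor "iSCSI-Enterprise" #vendor name
product "Virtual disk" #product model
path_grouping_policy multibus #default path grouping policy
getuid_callout "/sbin/scsi_id -g -u -s /block/%n" #default program used to obtain the unique device ID
prio_callout "/sbin/acs_prio_alua %d" #default program used to obtain the priority value
path_checker readsector0 #method used to determine path state
path_selector "round-robin 0" #method used to choose the path for the next I/O operation
failback immediate #failback mode
no_path_retry queue #number of times the system retries a failed path before disabling queueing; queue means keep queueing
rr_min_io 100 #number of I/O requests sent to a path before switching to the next path in the current path group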
}
}
The standard documentation describes the relevant parameters as follows:
Attribute | Description
wwid | Specifies the WWID of the multipath device to which the multipath attributes apply. This parameter is mandatory for this section of the multipath.conf file.
alias | Specifies the symbolic name for the multipath device to which the multipath attributes apply. If you are using user_friendly_names, do not set this value to mpathn; this may conflict with an automatically assigned user friendly name and give you incorrect device node names.
path_grouping_policy | The path grouping policy, for example multibus.
path_selector | The method used to choose the path for the next I/O operation, for example "round-robin 0".
failback | The failback mode, for example immediate.
prio | The method used to obtain a path priority value.
no_path_retry | The number of times the system retries a failed path before disabling queueing; the values fail and queue are also used in this article.
rr_min_io | Specifies the number of I/O requests to route to a path before switching to the next path in the current path group. This setting is only for systems running kernels older than 2.6.31. Newer systems should use rr_min_io_rq. The default value is 1000.
rr_min_io_rq | Specifies the number of I/O requests to route to a path before switching to the next path in the current path group, using request-based device-mapper-multipath. This setting should be used on systems running current kernels. On systems running kernels older than 2.6.31, use rr_min_io. The default value is 1.
rr_weight | If set to priorities, then instead of sending rr_min_io requests to a path before calling path_selector to choose the next path, the number of requests to send is determined by rr_min_io times the path's priority, as determined by the prio function. If set to uniform, all path weights are equal.
flush_on_last_del | If set to yes, then multipath will disable queueing when the last path to a device has been deleted.
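The rr_weight rule in the table is simple arithmetic, and a small worked example may make it concrete. This sketch only restates the rule from the description above; the function name requests_before_switch is hypothetical, and the values 100 and 4 are taken from the rr_min_io example and the prio=4 status output earlier in this article.

```shell
#!/bin/sh
# Hypothetical helper: prints how many requests go to a path before the
# selector moves on, given rr_min_io ($1), the path priority ($2) and the
# rr_weight setting ($3).
requests_before_switch() {
    rr_min_io=$1
    priority=$2
    rr_weight=$3
    if [ "$rr_weight" = "priorities" ]; then
        # priorities: rr_min_io multiplied by the path priority
        echo $((rr_min_io * priority))
    else
        # uniform: every path gets the same rr_min_io requests
        echo "$rr_min_io"
    fi
}

requests_before_switch 100 4 priorities
requests_before_switch 100 4 uniform
```

With rr_min_io 100 and a path priority of 4, the two calls print 400 and 100 respectively.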
A complete advanced configuration from my local system:
[root@liujing ~]# vi /etc/multipath.conf
blacklist {
devnode "^sda"
}
multipaths {
multipath {
wwid 360a98000646650724434697454546156
alias mpathb_fcoe
path_grouping_policy multibus
#path_checker "directio"
prio "random"
path_selector "round-robin 0"
}
}
devices {
device {
vendor "NETAPP"
product "LUN"
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
#path_checker "directio"
#path_selector "round-robin 0"
failback immediate
no_path_retry fail
}
}
The values for wwid, vendor, product, and getuid_callout can be obtained with the multipath -v3 command. If /etc/multipath.conf defines an alias for a WWID, the alias overrides the automatically assigned name.
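Once the WWID is known, the matching multipaths entry can be generated rather than typed by hand. The following is a minimal sketch, not a complete configuration tool: the function name make_multipath_stanza is hypothetical, and it emits only the wwid and alias attributes.

```shell
#!/bin/sh
# Hypothetical helper: prints a multipaths section for one WWID ($1) and
# alias ($2). Other attributes would have to be added by hand.
make_multipath_stanza() {
    wwid=$1
    alias_name=$2
    cat <<EOF
multipaths {
    multipath {
        wwid  $wwid
        alias $alias_name
    }
}
EOF
}

# WWID and alias taken from the local configuration shown in this article.
make_multipath_stanza 360a98000646650724434697454546156 mpathb_fcoe
```

The output can be reviewed and then merged into /etc/multipath.conf manually; appending it blindly is not recommended if the file already contains a multipaths section.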