Disk Performance Monitoring

I. Disk Performance Metrics

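  • IOPS: Input/Output Operations per Second, the number of I/O operations handled per second. It measures how well block storage handles reads and writes (input/output).
  • Throughput: the amount of data successfully transferred per unit of time.
  • Alibaba Cloud block storage performance:

Parameter              ESSD cloud disk  SSD cloud disk  Ultra cloud disk  Basic cloud disk  SSD shared block storage  Ultra shared block storage
Max capacity per disk  32768 GiB        32768 GiB       32768 GiB         2000 GiB          32768 GiB                 32768 GiB
Max IOPS               1000000          25000*          5000              a few hundred     30000                     5000
Max throughput         4000 MBps        300 MBps*       140 MBps          30-40 MBps        512 MBps                  160 MBps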
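
IOPS and throughput are linked by the block size: throughput is roughly IOPS times the size of each I/O, so for a given block size only one of the two limits in the table is actually reached. The sketch below illustrates this with the SSD cloud disk figures. The function name `achievable` is made up for this example, and it assumes that MBps means 10^6 bytes per second, which the table does not state.

```python
def achievable(max_iops, max_mbps, block_kib):
    """Return (IOPS, MB/s) reachable at a given block size under both caps.

    Assumption: 1 MBps = 1,000,000 bytes per second.
    """
    block_bytes = block_kib * 1024
    cap_bytes_per_s = max_mbps * 1_000_000
    # Whichever cap is hit first limits the result.
    iops = min(max_iops, cap_bytes_per_s / block_bytes)
    return iops, iops * block_bytes / 1_000_000


# SSD cloud disk from the table: 25000 IOPS, 300 MBps.
for block_kib in (4, 1024):
    iops, mbps = achievable(25000, 300, block_kib)
    print(f"bs={block_kib:>4} KiB -> {iops:8.1f} IOPS, {mbps:6.1f} MB/s")
```

With small blocks the IOPS cap is reached long before the throughput cap, and with large blocks it is the other way round. This is why the tests later in this post use 4k blocks for IOPS and 1024k blocks for throughput.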

II. Performance Testing

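  • dd can give a quick estimate of block storage performance. Without oflag=direct or conv=fdatasync, the figure may partly reflect the page cache rather than the disk itself.

Example: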

[root@jenkins tmp]# dd if=/dev/zero of=/tmp/testfile bs=1M count=2048

2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 9.90132 s, 217 MB/s
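
The rate dd prints is just bytes copied divided by elapsed time, in units of 10^6 bytes. A minimal sketch that re-derives it from the summary line above (the helper name `dd_throughput` is invented for this example, and the regular expression only targets the GNU dd summary format shown here):

```python
import re


def dd_throughput(summary):
    """Parse a GNU dd summary line and return (bytes, seconds, MB/s)."""
    match = re.search(r"(\d+) bytes .*copied, ([\d.]+) s", summary)
    if match is None:
        raise ValueError("not a dd summary line")
    nbytes = int(match.group(1))
    seconds = float(match.group(2))
    # dd reports MB as 10^6 bytes, not MiB.
    return nbytes, seconds, nbytes / seconds / 1e6


line = "2147483648 bytes (2.1 GB) copied, 9.90132 s, 217 MB/s"
nbytes, seconds, mbps = dd_throughput(line)
print(f"{nbytes} bytes in {seconds} s = {mbps:.0f} MB/s")
```

The byte count also matches the command line: bs=1M times count=2048 is 2048 × 1048576 = 2147483648 bytes.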
  • To test random write IOPS, run:
fio -direct=1 -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=iotest -name=Rand_Write_Testing
  • To test random read IOPS, run:
fio -direct=1 -iodepth=128 -rw=randread -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=iotest -name=Rand_Read_Testing
  • To test sequential write throughput, run:
fio -direct=1 -iodepth=64 -rw=write -ioengine=libaio -bs=1024k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=iotest -name=Write_PPS_Testing
  • To test sequential read throughput, run:
fio -direct=1 -iodepth=64 -rw=read -ioengine=libaio -bs=1024k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=iotest -name=Read_PPS_Testing

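Meaning of the parameters:

-direct=1       Bypass the I/O cache during the test; data is written directly.
-iodepth=128    With AIO, at most 128 I/O requests are in flight at the same time.
-rw=randwrite   The read/write pattern is random writes. For other tests it can be set to:
                    randread (random reads)
                    read (sequential reads)
                    write (sequential writes)
                    randrw (mixed random reads and writes)
-ioengine=libaio    Use libaio (Linux AIO, asynchronous I/O) as the I/O engine. Applications usually issue I/O in one of two ways:
                        Synchronous
                        Synchronous I/O issues one request at a time and returns only after the kernel has completed it, so the iodepth of a single thread never exceeds 1. This can be worked around by running several threads concurrently; typically 16 to 32 threads work at once to fill the iodepth.

                        Asynchronous
                        Asynchronous I/O, typically through libaio, submits a batch of requests and then waits for the batch to complete. Fewer round trips make it more efficient.

-bs=4k          The block size of a single I/O is 4 KB. This is also the default when the parameter is not specified.
                    When testing IOPS, set bs to a small value, such as 4k in this example.
                    When testing throughput, set bs to a large value, such as 1024k in this example.
-size=1G        The test file size is 1 GiB.
-numjobs=1      The number of concurrent test jobs is 1.
-runtime=1000       The test runs for at most 1000 seconds. It ends earlier once the whole file has been processed, as in the example below, which finishes in about 123 seconds. If not set, fio keeps going until the file of the given -size has been written in blocks of -bs.
-group_reporting    Report statistics aggregated over the whole group instead of separately for each job.
-filename=iotest    The name of the test file, for example iotest. Testing a raw disk gives the real disk performance, but it destroys the file system structure on that disk, so back up your data before testing.
-name=Rand_Write_Testing    The name of the test job, here Rand_Write_Testing. Any name can be used.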
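
Since the four commands differ only in -rw, -bs, -iodepth and -name, they can be generated from one template. The sketch below only builds the argument list and does not run fio. The helper name `fio_command` and its defaults are assumptions for this example, chosen to mirror the commands in this post.

```python
def fio_command(rw, bs, iodepth, name, filename="iotest", size="1G", runtime=1000):
    """Build the fio argument list used in this post (nothing is executed)."""
    options = {
        "direct": 1,
        "iodepth": iodepth,
        "rw": rw,
        "ioengine": "libaio",
        "bs": bs,
        "size": size,
        "numjobs": 1,
        "runtime": runtime,
    }
    args = ["fio"] + [f"-{key}={value}" for key, value in options.items()]
    args += ["-group_reporting", f"-filename={filename}", f"-name={name}"]
    return args


tests = [
    ("randwrite", "4k", 128, "Rand_Write_Testing"),
    ("randread", "4k", 128, "Rand_Read_Testing"),
    ("write", "1024k", 64, "Write_PPS_Testing"),
    ("read", "1024k", 64, "Read_PPS_Testing"),
]
for rw, bs, iodepth, name in tests:
    print(" ".join(fio_command(rw, bs, iodepth, name)))
```

The returned list can be passed to subprocess.run on a machine where fio is installed. Point -filename at a scratch file, never at a disk that holds data.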

Example: testing random write IOPS

[root@lxk tmp]# fio -direct=1 -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=iotest -name=Rand_Write_Testing
Rand_Write_Testing: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
fio-3.1
Starting 1 process
Rand_Write_Testing: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=9696KiB/s][r=0,w=2424 IOPS][eta 00m:00s]
Rand_Write_Testing: (groupid=0, jobs=1): err= 0: pid=29264: Tue Oct  9 09:18:01 2018
  write: IOPS=2131, BW=8525KiB/s (8730kB/s)(1024MiB/122993msec)
    slat (usec): min=3, max=111464, avg=20.62, stdev=881.32
    clat (usec): min=495, max=635660, avg=60031.26, stdev=45220.47
     lat (usec): min=510, max=635664, avg=60052.28, stdev=45221.82
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    4], 10.00th=[    5], 20.00th=[    6],
     | 30.00th=[    8], 40.00th=[   34], 50.00th=[   91], 60.00th=[   94],
     | 70.00th=[   96], 80.00th=[   97], 90.00th=[  100], 95.00th=[  102],
     | 99.00th=[  115], 99.50th=[  125], 99.90th=[  209], 99.95th=[  313],
     | 99.99th=[  634]
   bw (  KiB/s): min= 4104, max=10544, per=100.00%, avg=8525.44, stdev=423.40, samples=245
   iops        : min= 1026, max= 2636, avg=2131.36, stdev=105.86, samples=245
  lat (usec)   : 500=0.01%, 750=0.03%, 1000=0.04%
  lat (msec)   : 2=0.18%, 4=5.16%, 10=28.87%, 20=3.94%, 50=2.20%
  lat (msec)   : 100=52.60%, 250=6.92%, 500=0.02%, 750=0.04%
  cpu          : usr=0.66%, sys=2.54%, ctx=30595, majf=0, minf=26
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,262144,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=8525KiB/s (8730kB/s), 8525KiB/s-8525KiB/s (8730kB/s-8730kB/s), io=1024MiB (1074MB), run=122993-122993msec

Disk stats (read/write):
  vda: ios=1/262860, merge=0/13134, ticks=15/15500516, in_queue=15511938, util=99.95%
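
The headline numbers in this report can be cross-checked against each other, which is a handy sanity test when reading fio output. The sketch below uses only figures copied from the report above and derives IOPS in two ways: from the I/O count and run time, and from Little's law (requests in flight = IOPS × average latency), assuming the queue stayed at the configured depth of 128.

```python
# Figures copied from the fio report above.
total_ios = 262144        # "issued rwt: total=0,262144,0"
runtime_s = 122.993       # "run=122993-122993msec"
block_kib = 4             # -bs=4k
iodepth = 128             # -iodepth=128
avg_lat_s = 60052.28e-6   # "lat (usec): ... avg=60052.28"

# IOPS is completed I/Os divided by elapsed time.
iops = total_ios / runtime_s
# Bandwidth is IOPS times the block size.
bw_kib_s = iops * block_kib
# Little's law: in-flight requests = IOPS * average latency.
iops_littles_law = iodepth / avg_lat_s

print(f"IOPS from I/O count   : {iops:.0f}")
print(f"Bandwidth in KiB/s    : {bw_kib_s:.0f}")
print(f"IOPS from Little's law: {iops_littles_law:.0f}")
```

Both estimates land on about 2131 IOPS and about 8525 KiB/s, matching the "write: IOPS=2131, BW=8525KiB/s" line. The 60 ms average latency is therefore mostly time spent queued behind the other 127 requests, not the service time of a single write.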

III. System-Level Disk I/O Monitoring

1. top

top - 16:59:14 up 15:40,  2 users,  load average: 0.00, 0.00, 0.00
Tasks: 100 total,   1 running,  99 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2037260k total,  1342560k used,   694700k free,    69060k buffers
Swap:  4095996k total,        0k used,  4095996k free,  1018516k cached

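In the Cpu(s) line above, 0.0%wa is the percentage of time the CPU spent waiting for disk I/O. If this value stays high, disk I/O performance is likely the system bottleneck.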
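
For scripted checks, the wa value can be pulled out of that line with a small parser. This is a sketch for the older top format shown above ("0.0%wa"); newer procps versions print the line differently (for example "0.0 wa"), so the pattern would need adjusting there. The function name `parse_cpu_line` is made up for this example.

```python
import re


def parse_cpu_line(line):
    """Map each field of an old-style top 'Cpu(s):' line to its percentage."""
    return {name: float(value) for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


line = "Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st"
cpu = parse_cpu_line(line)
print(f"iowait: {cpu['wa']}%  idle: {cpu['id']}%")
```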

2. iostat

  • Options used for disk I/O monitoring
OPTIONS:
    -d     Display the device utilization report.
    -m     Display statistics in megabytes per second instead of blocks or kilobytes per second.  Data displayed are valid only with kernels 2.4 and later.
    -x     Display extended statistics.

CPU Utilization Report
    %iowait:Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

Device Utilization Report
    rrqm/s
        The number of read requests merged per second that were queued to the device.
    wrqm/s
        The number of write requests merged per second that were queued to the device.
    r/s
        The number of read requests that were issued to the device per second.
    w/s
        The number of write requests that were issued to the device per second.
    rsec/s
        The number of sectors read from the device per second.
    wsec/s
        The number of sectors written to the device per second.
    avgrq-sz
        The average size (in sectors) of the requests that were issued to the device.
    avgqu-sz
        The average queue length of the requests that were issued to the device.
    await
        The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
        This is the I/O response time. As a common rule of thumb, it should generally stay below about 5 ms.
    svctm
        The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.
        For a single mechanical disk under fully random reads, the service time of one request is around 7 ms, that is, seek time plus rotational latency.
    %util
        Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.
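
The sector-based columns are easier to read once converted to bytes. The sketch below assumes 512-byte sectors, which is the unit the Linux kernel uses for its disk statistics, and takes its input values from the sample iostat -x output in this section (rsec/s=7615.36, wsec/s=102.80, r/s=108.92, w/s=2.50). It also shows that avgrq-sz is simply sectors per second divided by requests per second.

```python
SECTOR_BYTES = 512  # kernel disk statistics count 512-byte sectors


def sectors_to_mib(sectors_per_s):
    """Convert a sectors-per-second rate to MiB per second."""
    return sectors_per_s * SECTOR_BYTES / 2**20


rsec, wsec = 7615.36, 102.80   # rsec/s, wsec/s
reads, writes = 108.92, 2.50   # r/s, w/s

read_mib = sectors_to_mib(rsec)
write_mib = sectors_to_mib(wsec)
# Average request size: sectors transferred per request.
avgrq_sz = (rsec + wsec) / (reads + writes)
avgrq_kib = avgrq_sz * SECTOR_BYTES / 1024

print(f"read  {read_mib:.2f} MiB/s, write {write_mib:.2f} MiB/s")
print(f"avgrq-sz {avgrq_sz:.2f} sectors = {avgrq_kib:.1f} KiB per request")
```

The computed avgrq-sz agrees with the 69.27 that iostat reports for the same sample.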

Example:

[root@gitlab ~]# iostat -d -x 1 1
Linux 2.6.32-696.16.1.el6.x86_64 (gitlab)   10/09/2018  _x86_64_    (2 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda              22.57    10.39  108.92    2.50  7615.36   102.80    69.27     0.69    6.23    6.28    4.30   0.27   3.04

[root@gitlab ~]# iostat -x 1 1
Linux 2.6.32-696.16.1.el6.x86_64 (gitlab)   10/09/2018  _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.37    0.00    1.75   16.51    0.00   79.36

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda              22.57    10.39  108.91    2.50  7614.57   102.80    69.27     0.69    6.23    6.28    4.30   0.27   3.04

[root@gitlab ~]# iostat
Linux 2.6.32-696.16.1.el6.x86_64 (gitlab)   10/09/2018  _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.37    0.00    1.75   16.51    0.00   79.36

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
vda             111.41      7613.95       102.79  504368410    6809368
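
The extended and the basic report describe the same activity, so their columns can be tied together. The sketch below parses the header and the vda row of the "iostat -x 1 1" output above into a dictionary, then checks two relationships: tps in the basic report is r/s + w/s, and avgqu-sz is approximately the request rate times await (Little's law). The helper name `parse_iostat_row` is made up for this example.

```python
def parse_iostat_row(header, row):
    """Pair the column names of an iostat -x header with one device row."""
    names = header.split()[1:]          # drop the "Device:" label
    fields = row.split()
    stats = dict(zip(names, map(float, fields[1:])))
    return fields[0], stats


header = ("Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s "
          "avgrq-sz avgqu-sz   await r_await w_await  svctm  %util")
row = ("vda              22.57    10.39  108.91    2.50  7614.57   102.80    "
       "69.27     0.69    6.23    6.28    4.30   0.27   3.04")

device, stats = parse_iostat_row(header, row)
tps = stats["r/s"] + stats["w/s"]
# Little's law: average queue length = request rate * time in system.
queue_estimate = tps * stats["await"] / 1000  # await is in milliseconds

print(f"{device}: tps={tps:.2f}, estimated avgqu-sz={queue_estimate:.2f}, "
      f"reported avgqu-sz={stats['avgqu-sz']}")
```

Here tps comes out at 111.41, the value shown in the plain iostat report, and the queue estimate rounds to the reported 0.69.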

IV. Process-Level Disk I/O Monitoring

1. iotop

Total DISK READ :   0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:   0.00 B/s | Actual DISK WRITE:       0.00 B/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                                                       
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd --switched-root --system --deserialize 22
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
     5 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
     7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
     8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
     9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
    10 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [lru-add-drain]
...以下省略

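Options:

-b  : batch mode, non-interactive; mainly used for logging to a file.
-o  : show only processes that are actually doing I/O
-n #    : number of iterations to display before exiting
-d #    : delay between iterations, in seconds
-u USERNAME     : show I/O only for processes owned by the given user
-p PID          : show I/O only for the given process

Keyboard shortcuts:

Left/right arrows: change the sort column; the default is to sort by IO.
r: reverse the sort order.
o: toggle showing only processes that are doing I/O.
p: toggle between process and thread display.
a: toggle display of accumulated I/O.
q: quit.

Example: show the I/O of process ID 2372 on the terminal, once (commonly used in performance monitoring).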

[root@lxk ~]# iotop -b -p 2372 -n 1
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
 2372 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:1]
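
Batch output like this is easy to post-process. A minimal sketch that splits one task line of "iotop -b" into named fields; the function name `parse_iotop_line` is invented for this example, and the parser assumes the column layout shown above, with the command, which may contain spaces, as the last field.

```python
def parse_iotop_line(line):
    """Split one task line of `iotop -b` output into named fields."""
    # 11 splits leave the command, which may contain spaces, in one piece.
    parts = line.split(None, 11)
    if len(parts) != 12:
        raise ValueError("unexpected iotop line layout")
    return {
        "tid": int(parts[0]),
        "prio": parts[1],
        "user": parts[2],
        "disk_read": (float(parts[3]), parts[4]),
        "disk_write": (float(parts[5]), parts[6]),
        "swapin_pct": float(parts[7]),
        "io_pct": float(parts[9]),
        "command": parts[11].rstrip(),
    }


line = " 2372 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:1]"
task = parse_iotop_line(line)
print(task["tid"], task["command"], task["disk_read"], task["io_pct"])
```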

2. pidstat

pidstat - Report statistics for Linux tasks.
OPTIONS
-C comm
        Display only tasks whose command name includes the string comm.  This string can be a regular expression.
-d      Report I/O statistics (kernels 2.6.20 and later only).  The following values may be displayed:
        UID
            The real user identification number of the task being monitored.
        USER
            The name of the real user owning the task being monitored.
        PID
            The identification number of the task being monitored.
        kB_rd/s
            Number of kilobytes the task has caused to be read from disk per second.
        kB_wr/s
            Number of kilobytes the task has caused, or shall cause to be written to disk per second.
        kB_ccwr/s
            Number of kilobytes whose writing to disk has been cancelled by the task. This may occur when the task truncates some dirty pagecache. In this case, some IO which another task has been accounted for will not be happening.
        Command
            The command name of the task.
-u      Report CPU utilization.  The following values are displayed:
        UID
            The real user identification number of the task being monitored.
        USER
            The name of the real user owning the task being monitored.
        PID
            The identification number of the task being monitored.
        %usr
            Percentage of CPU used by the task while executing at the user level (application), with or without nice priority. Note that this field does NOT include time spent running a virtual processor.
        %system
            Percentage of CPU used by the task while executing at the system level (kernel).
        %guest
            Percentage of CPU spent by the task in virtual machine (running a virtual processor).
        %CPU
            Total percentage of CPU time used by the task. In an SMP environment, the task's CPU usage will be divided by the total number of CPU's if option -I has been entered on the command line.
        CPU
            Processor number to which the task is attached.
        Command
            The command name of the task.

        When reporting global statistics for tasks and all their children, the following values may be displayed:
        UID             The real user identification number of the task being monitored.
        USER            The name of the real user owning the task that is monitored together with its children.
        PID             The identification number of the task that is monitored together with its children.
        usr-ms          Milliseconds of CPU time spent at the user level by the task and all its children.
        system-ms       Milliseconds of CPU time spent at the system (kernel) level by the task and all its children.
        guest-ms        Milliseconds of CPU time spent in a virtual machine by the task and all its children.
        Command         The command name of the task.
-l      Display the process command name and all its arguments.
-I      In an SMP environment, indicate that tasks CPU usage (as displayed by option -u ) should be divided by the total number of processors.
-p { pid [,...] | SELF | ALL }
        Select tasks (processes) for which statistics are to be reported.

-r      Report page faults and memory utilization.  The following values are displayed:
        minflt/s
            Total number of minor faults the task has made per second, those which have not required loading a memory page from disk.
        majflt/s
            Total number of major faults the task has made per second, those which have required loading a memory page from disk.
            The page has to be read back from disk (for example from swap) before the mapping can be completed; major faults typically appear when memory is tight.
        minflt-nr
            Total number of minor faults made by the task and all its children, and collected during the interval of time.
        majflt-nr
            Total number of major faults made by the task and all its children, and collected during the interval of time.
-t      Also display statistics for threads associated with selected tasks. This option adds the following values to the reports:
        TGID
            The identification number of the thread group leader.
        TID
            The identification number of the thread being monitored.
-s      Report stack utilization.
        StkSize
            The amount of memory in kilobytes reserved for the task as stack, but not necessarily used.
        StkRef
            The amount of memory in kilobytes used as stack, referenced by the task.

-w     Report task switching activity (kernels 2.6.23 and later only).  The following values may be displayed:
        cswch/s
            Total number of voluntary context switches the task made per second.  A voluntary context switch occurs when a task blocks because it requires a resource that is unavailable.
        nvcswch/s
            Total number of non voluntary context switches the task made per second.  An involuntary context switch takes place when a task executes for the duration of its time slice and then is forced to relinquish the processor.
            Typical causes are many processes competing for the CPU, or the task's time slice running out.
EXAMPLES:
 ~]# pidstat 2 5
    Display five reports of CPU statistics for every active task in the system at two second intervals.
 ~]# pidstat -r -p 1643 2 5
    Display five reports of page faults and memory statistics for PID 1643 at two second intervals.
 ~]# pidstat -C "fox|bird" -r -p ALL
    Display global page faults and memory statistics for all the processes whose command name includes the string "fox" or "bird".
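
To see where the per-process I/O columns of "pidstat -d" come from, it helps to look at /proc/<pid>/io, which exposes the counters read_bytes, write_bytes and cancelled_write_bytes. As far as I know these are the counters behind kB_rd/s, kB_wr/s and kB_ccwr/s, but treat that mapping as an assumption. The sketch below computes the rates from two snapshots. The snapshot texts are invented sample data, not real measurements, and the function names are made up for this example.

```python
def parse_proc_io(text):
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            counters[key.strip()] = int(value)
    return counters


def io_rates_kb(before, after, interval_s):
    """Per-second rates in kB between two snapshots taken interval_s apart."""
    columns = {
        "kB_rd/s": "read_bytes",
        "kB_wr/s": "write_bytes",
        "kB_ccwr/s": "cancelled_write_bytes",
    }
    return {
        column: (after[field] - before[field]) / 1024 / interval_s
        for column, field in columns.items()
    }


# Hypothetical snapshots taken 2 seconds apart (illustrative values only).
snapshot_1 = "read_bytes: 1048576\nwrite_bytes: 0\ncancelled_write_bytes: 0\n"
snapshot_2 = "read_bytes: 3145728\nwrite_bytes: 409600\ncancelled_write_bytes: 0\n"

rates = io_rates_kb(parse_proc_io(snapshot_1), parse_proc_io(snapshot_2), 2)
print(rates)
```

On a live system the snapshots would come from reading /proc/<pid>/io twice, which usually requires being the owner of the process or root.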