Notes on the Linux deadline I/O scheduling algorithm

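The core idea of the deadline algorithm is to add a request-expiry mechanism to the traditional elevator algorithm. The mechanism shows up in two places: (1) when a request has expired, which expired request gets selected; (2) when no request has expired and the scan has passed the last request in the elevator, which request the scan restarts from. Together these two points balance I/O throughput against response time. The algorithm also takes into account the starvation of writes caused by reads.
Core data structure of the algorithm: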
struct deadline_data {
struct rb_root sort_list[2];
struct list_head fifo_list[2];

/*
* next in sort order. read, write or both are NULL
*/
struct request *next_rq[2];
unsigned int batching;  /* number of sequential requests made */
sector_t last_sector;  /* head position */
unsigned int starved;  /* times reads have starved writes */

/*
* settings that change how the i/o scheduler behaves
*/
int fifo_expire[2];
int fifo_batch;
int writes_starved;
int front_merges;
};
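sort_list: each request is organized in a red-black tree rooted at sort_list, keyed by the request's sector number, which makes lookup fast. There are two trees, one for reads and one for writes.
fifo_list: requests are linked into fifo_list in order of expiry time. It is likewise split into a read queue and a write queue.
So, until it is dispatched to the device's request queue, every request sits in both structures at the same time.
next_rq: points to the next request in sort_list, with one pointer per direction.
batching: the scheduler may dispatch several requests in a row; batching records how many requests have been dispatched consecutively so far. As long as batching < fifo_batch, it may keep dispatching consecutively.
starved: the number of times reads were dispatched while writes were left waiting. Once starved reaches writes_starved, a write request must be dispatched to avoid write starvation.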
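
To make the double indexing concrete, here is a minimal userspace sketch, not kernel code. It assumes fixed-size arrays in place of the red-black tree and the list_head, and every name in it (toy_request, toy_queue, toy_add) is hypothetical. It only illustrates that one request is reachable both by sector order and by arrival order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQ 16

/* Hypothetical userspace stand-in for struct request. */
struct toy_request {
    unsigned long sector;   /* start sector: the sort key */
    unsigned long deadline; /* arrival time + expiry: the FIFO key */
};

/* One direction (read or write) of a toy scheduler queue. */
struct toy_queue {
    struct toy_request *sorted[MAX_REQ]; /* ascending sector; stands in for the rb-tree */
    struct toy_request *fifo[MAX_REQ];   /* arrival order; stands in for the list_head */
    size_t count;
};

/* Index one request in both structures. Returns 0 on success, -1 if full. */
static int toy_add(struct toy_queue *q, struct toy_request *rq)
{
    size_t i;

    assert(q && rq);
    if (q->count == MAX_REQ)
        return -1;

    /* Sorted view: shift requests with larger sectors one slot right. */
    i = q->count;
    while (i > 0 && q->sorted[i - 1]->sector > rq->sector) {
        q->sorted[i] = q->sorted[i - 1];
        i--;
    }
    q->sorted[i] = rq;

    /* FIFO view: always append at the tail. */
    q->fifo[q->count] = rq;
    q->count++;
    return 0;
}
```

Adding requests at sectors 300, 100, 200 in that order leaves the sorted view as 100, 200, 300 while the FIFO view keeps 300, 100, 200. Because every request in one direction gets the same expiry interval, arrival order and expiry order coincide, which is why a plain tail append is enough for the FIFO.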

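The two most important methods in the deadline I/O scheduler are deadline_add_request and deadline_dispatch_requests. The former adds a request to the scheduler's own queues (not the request queue); the latter dispatches requests from the scheduler's queues to the block device's request queue.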

static void
deadline_add_request(struct request_queue *q, struct request *rq)
{
struct deadline_data *dd = q->elevator->elevator_data;
const int data_dir = rq_data_dir(rq);

deadline_add_rq_rb(dd, rq);

/*
* set expire time (only used for reads) and add to fifo list
*/
rq_set_fifo_time(rq, jiffies + dd->fifo_expire[data_dir]);
list_add_tail(&rq->queuelist, &dd->fifo_list[data_dir]);
}
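deadline_add_request adds the request to both the red-black tree and the FIFO, and records the request's expiry time. It borrows the next field of the request's donelist to store that time, because a request sitting in the deadline scheduler's queues can never be on the request completion queue at the same time: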
#define rq_set_fifo_time(rq,exp) ((rq)->donelist.next = (void *) (exp))
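
The stored expiry time is later compared against the current tick count. The sketch below shows how such a comparison can stay correct when the tick counter wraps around. It is modelled on the kernel's time_after() idiom but is a userspace illustration; toy_deadline, toy_time_after and toy_expired are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Deadline = current tick + expiry interval. Unsigned addition wraps
 * by design, like a free-running tick counter. */
static unsigned long toy_deadline(unsigned long now, unsigned long expire)
{
    /* The signed-difference trick below only works if the interval
     * fits in half the counter range. */
    assert(expire <= LONG_MAX);
    return now + expire;
}

/* Wraparound-safe "a is later than b". The unsigned-to-signed cast
 * assumes the usual two's-complement behaviour. */
static int toy_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Returns 1 if the deadline has already passed at tick `now`. */
static int toy_expired(unsigned long now, unsigned long deadline)
{
    return toy_time_after(now, deadline);
}
```

A deadline computed shortly before the counter wraps ends up numerically small, yet toy_expired still reports it as not expired until the counter has actually moved past it.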

static int deadline_dispatch_requests(struct request_queue *q, int force)
{
struct deadline_data *dd = q->elevator->elevator_data;
const int reads = !list_empty(&dd->fifo_list[READ]);
const int writes = !list_empty(&dd->fifo_list[WRITE]);
struct request *rq;
int data_dir;

/*
* batches are currently reads XOR writes
*/
if (dd->next_rq[WRITE])
rq = dd->next_rq[WRITE];
else
rq = dd->next_rq[READ];

if (rq) {
  /* we have a "next request" */

  if (dd->last_sector != rq->sector)
   /* end the batch on a non sequential request */
   dd->batching += dd->fifo_batch;

  if (dd->batching < dd->fifo_batch)
   /* we are still entitled to batch */
   goto dispatch_request;
}

/*
* at this point we are not running a batch. select the appropriate
* data direction (read / write)
*/

if (reads) {
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));

if (writes && (dd->starved++ >= dd->writes_starved))
goto dispatch_writes;

data_dir = READ;

goto dispatch_find_request;
}

/*
* there are either no reads or writes have been starved
*/

if (writes) {
dispatch_writes:
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));

dd->starved = 0;

data_dir = WRITE;

goto dispatch_find_request;
}

return 0;

dispatch_find_request:
/*
* we are not running a batch, find best request for selected data_dir
*/
if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
  /*
   * A deadline has expired, the last request was in the other
   * direction, or we have run out of higher-sectored requests.
   * Start again from the request with the earliest expiry time.
   */
  rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
} else {
  /*
   * The last req was the same dir and we have a next request in
   * sort order. No expired requests so continue on from here.
   */
  rq = dd->next_rq[data_dir];
}

dd->batching = 0;

dispatch_request:
/*
* rq is the selected appropriate request.
*/
dd->batching++;
deadline_move_request(dd, rq);

return 1;
}

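deadline_dispatch_requests is the heart of the deadline algorithm. It has three parts: (1) Decide whether to dispatch a read or a write. The main factors are whether a batch is in progress and whether writes are being starved. (2) Once the direction is decided, find the next request in that direction. The main factor here is whether a request has expired. (3) Having found a suitable request, call deadline_move_request to dispatch it to the block device's request queue.
1. Determine the read/write direction.
a. First, the direction is determined by whether a batch is in progress. Being in a batch means the scheduler should keep handling requests in the same direction, which effectively increases throughput. So the direction of the current batch determines the direction of the request to handle. Read and write batches are mutually exclusive: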
/*
* batches are currently reads XOR writes
*/
if (dd->next_rq[WRITE])
rq = dd->next_rq[WRITE];
else
rq = dd->next_rq[READ];

This check ensures that batching proceeds in only one direction and never interleaves, because deadline_move_request contains:
dd->next_rq[READ] = NULL;
dd->next_rq[WRITE] = NULL;
dd->next_rq[data_dir] = deadline_latter_request(rq);
That is, only the next_rq of the batching direction can point to a next request.

if (rq) {
  /* we have a "next request" */

  if (dd->last_sector != rq->sector)
   /* end the batch on a non sequential request */
   dd->batching += dd->fifo_batch;

  if (dd->batching < dd->fifo_batch)
   /* we are still entitled to batch */
   goto dispatch_request;
}
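If there is a next request, that is, the scan has not reached the end of the elevator, the scheduler checks whether that request is contiguous with the previous one. If it is, and the number of batched requests is still below fifo_batch, this request is the one to dispatch, so the code jumps straight to the end and dispatches it to the device request queue. Write starvation and expiry are ignored in this case. If the request is not contiguous, the batch ends.
b. If no batch is in progress, the direction is chosen according to whether write starvation has gone over the limit.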
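
The batching check above can be isolated into a small function. This is a hedged sketch that mirrors the kernel snippet; the struct toy_batch_state and the function toy_continue_batch are hypothetical, and last_sector is assumed to hold the sector where the previously dispatched request ended.

```c
#include <assert.h>

/* Hypothetical flattened view of the fields the batching check reads. */
struct toy_batch_state {
    unsigned long last_sector; /* assumed: end sector of the last dispatched request */
    unsigned int batching;     /* requests dispatched in the current batch */
    unsigned int fifo_batch;   /* batch size limit */
};

/* Returns 1 if the request starting at next_sector may be dispatched
 * as part of the current batch, 0 if the batch has to end. */
static int toy_continue_batch(struct toy_batch_state *s, unsigned long next_sector)
{
    assert(s);

    /* A non-sequential request ends the batch: adding fifo_batch pushes
     * the counter to at least the limit, so the test below fails. */
    if (s->last_sector != next_sector)
        s->batching += s->fifo_batch;

    return s->batching < s->fifo_batch;
}
```

With fifo_batch = 16 and batching = 3, a request that starts exactly at last_sector keeps the batch going, while a request anywhere else raises batching to 19 and ends it. Adding fifo_batch instead of setting a flag lets a single comparison cover both "batch full" and "not sequential".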
if (reads) {
  BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));

if (writes && (dd->starved++ >= dd->writes_starved))
goto dispatch_writes;

data_dir = READ;

goto dispatch_find_request;
}

/*
* there are either no reads or writes have been starved
*/

if (writes) {
dispatch_writes:
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));

dd->starved = 0;

data_dir = WRITE;

goto dispatch_find_request;
}
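The scheduler handles reads first, that is, read requests have priority, but it accounts for write starvation along the way. If writes are also pending, the starvation counter is tested and then incremented (starved++ is a post-increment, so the value before the increment is what gets compared). If that value has reached writes_starved, write starvation is over the limit and the code jumps to dispatch_writes to handle write requests. Otherwise it goes on handling reads.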
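
The effect of the post-increment is easiest to see by running the decision a few times in a row. The following is a userspace sketch of the direction choice only, with hypothetical names (toy_dir, toy_starve_state, toy_pick_dir). It is an illustration of the logic quoted above, not the kernel function.

```c
#include <assert.h>

enum toy_dir { TOY_NONE = -1, TOY_READ = 0, TOY_WRITE = 1 };

struct toy_starve_state {
    int starved;        /* times reads were chosen while writes waited */
    int writes_starved; /* tunable limit */
};

/* Choose a direction given whether reads and/or writes are pending. */
static enum toy_dir toy_pick_dir(struct toy_starve_state *s, int reads, int writes)
{
    assert(s);

    if (reads) {
        /* Post-increment: compare the old value, then count this round. */
        if (writes && (s->starved++ >= s->writes_starved)) {
            s->starved = 0;
            return TOY_WRITE;
        }
        return TOY_READ;
    }

    if (writes) {
        s->starved = 0;
        return TOY_WRITE;
    }

    return TOY_NONE; /* nothing pending in either direction */
}
```

With writes_starved = 2 and both directions always pending, successive calls return read, read, write, read, read, write. The counter is not touched when no write is waiting, because the && short-circuits before the increment. In the scheduler this choice is only made when no batch is running, so each result stands for a whole batch rather than a single request.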

2. Find the request to handle in the chosen direction:
dispatch_find_request:
/*
* we are not running a batch, find best request for selected data_dir
*/
if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
  /*
   * A deadline has expired, the last request was in the other
   * direction, or we have run out of higher-sectored requests.
   * Start again from the request with the earliest expiry time.
   */
  rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
} else {
  /*
   * The last req was the same dir and we have a next request in
   * sort order. No expired requests so continue on from here.
   */
  rq = dd->next_rq[data_dir];
}

dd->batching = 0;
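When looking for a request in the chosen direction, the scheduler takes expiry times into account. This is the core of the deadline algorithm. It first calls deadline_check_fifo to check whether the request at the head of the queue, that is, the oldest one, has expired. If so, that expired request becomes the one to handle. If nothing has expired but the scan has reached the end of the elevator (!dd->next_rq[data_dir]), the scheduler has to go back to the start. Unlike the traditional elevator algorithm, though, the deadline scheduler does not resume from the request with the smallest sector. It goes back to the request that has waited longest and continues scanning from there in the direction of increasing sector numbers. In other words, on expiry or on reaching the end of the elevator, the scheduler returns to the longest-waiting request and resumes the elevator scan from it. If neither has happened, the next request in increasing sector order is the one to handle.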
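
The selection rule just described reduces to one condition. Below is a minimal userspace sketch of it, assuming the caller already knows the FIFO head and the next request in sector order. The names toy_rq and toy_select are hypothetical, and the expiry test is inlined rather than calling a helper such as deadline_check_fifo.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rq {
    unsigned long sector;   /* start sector */
    unsigned long deadline; /* tick at which the request expires */
};

/* fifo_head: oldest pending request in the chosen direction (must not be NULL).
 * next_rq:   next request in ascending sector order, or NULL when the
 *            sweep has reached the end or the direction just changed. */
static struct toy_rq *toy_select(struct toy_rq *fifo_head,
                                 struct toy_rq *next_rq,
                                 unsigned long now)
{
    int expired;

    assert(fifo_head != NULL);

    /* Wraparound-safe "now is later than deadline". */
    expired = (long)(fifo_head->deadline - now) < 0;

    if (expired || next_rq == NULL)
        return fifo_head; /* restart the sweep from the oldest request */

    return next_rq;       /* keep sweeping toward higher sectors */
}
```

For example, with an oldest request at sector 900 expiring at tick 50 and a next request at sector 120, the function returns the sector-120 request at tick 40 and the sector-900 request at tick 60. Restarting from the oldest request rather than the lowest sector is what bounds the waiting time of any single request.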

3. Once the request has been found, dispatch it to the block device's request queue.

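The whole deadline scheduler is quite compact, just over 400 lines in total. It takes full account of three factors: batching, write starvation, and request expiry. It preserves throughput while also taking response latency into account.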
