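The core idea of the deadline algorithm is to add a request-expiry mechanism to the traditional elevator algorithm. The mechanism shows up in two places: (1) when a request has expired, the expired request is selected; (2) when no request has expired and the sweep has passed the last request in the elevator, it determines which request the scheduler returns to first. Together these two points balance system I/O throughput against response time. In addition, the algorithm takes into account the starvation of writes caused by reads.
Core data structure of the algorithm: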
struct deadline_data {
    struct rb_root sort_list[2];
    struct list_head fifo_list[2];
    /*
     * next in sort order. read, write or both are NULL
     */
    struct request *next_rq[2];
    unsigned int batching; /* number of sequential requests made */
    sector_t last_sector; /* head position */
    unsigned int starved; /* times reads have starved writes */
    /*
     * settings that change how the i/o scheduler behaves
     */
    int fifo_expire[2];
    int fifo_batch;
    int writes_starved;
    int front_merges;
};
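sort_list: each request is organized, by the sector number in the request, into a red-black tree rooted at sort_list, which makes lookups fast. There are two trees, one for reads and one for writes.
fifo_list: requests are linked into fifo_list in order of expiry time, again with separate read and write queues.
Therefore any request that has not yet been handed to the device's request queue is present in both structures at the same time.
next_rq: points to the next request in sort_list order (one pointer per direction).
batching: the scheduler may dispatch several requests in a row; batching records the number of requests dispatched consecutively so far. As long as batching < fifo_batch, consecutive dispatching may continue.
starved: the number of times read requests were dispatched while writes were left waiting. Once starved reaches writes_starved, a write request must be dispatched, which prevents write starvation.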
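As a rough user-space illustration of why every queued request sits in two structures at once, the sketch below answers the two questions the scheduler asks: which request has the lowest sector (the job of the sorted tree) and which has waited longest (the job of the FIFO head). The names toy_request, toy_lowest_sector and toy_oldest are hypothetical, and a plain array scan stands in for the kernel's rb_root and list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request: only the two keys matter here. */
struct toy_request {
    unsigned long sector;   /* start sector: the sort_list key */
    unsigned long deadline; /* expiry time: the fifo_list key */
};

/* What the sorted tree answers: the request with the smallest sector. */
static const struct toy_request *
toy_lowest_sector(const struct toy_request *rqs, size_t n)
{
    const struct toy_request *best = NULL;

    assert(rqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!best || rqs[i].sector < best->sector)
            best = &rqs[i];
    return best;
}

/* What the FIFO head answers: the request with the earliest deadline. */
static const struct toy_request *
toy_oldest(const struct toy_request *rqs, size_t n)
{
    const struct toy_request *best = NULL;

    assert(rqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!best || rqs[i].deadline < best->deadline)
            best = &rqs[i];
    return best;
}
```

The two queries generally return different requests, which is why the scheduler keeps both orderings instead of deriving one from the other on every dispatch.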
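The two most important methods of the deadline I/O scheduler are deadline_add_request and deadline_dispatch_requests. The former adds a request to the scheduler's queues (not the request queue), and the latter dispatches requests from the scheduler's queues to the block device's request queue.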
static void
deadline_add_request(struct request_queue *q, struct request *rq)
{
    struct deadline_data *dd = q->elevator->elevator_data;
    const int data_dir = rq_data_dir(rq);

    deadline_add_rq_rb(dd, rq);
    /*
     * set expire time (only used for reads) and add to fifo list
     */
    rq_set_fifo_time(rq, jiffies + dd->fifo_expire[data_dir]);
    list_add_tail(&rq->queuelist, &dd->fifo_list[data_dir]);
}
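deadline_add_request adds the request to both the red-black tree and the FIFO, and records the request's expiry time. The expiry time is stored in the next field of the request's donelist, which is safe because a request sitting in the deadline scheduler's queues can never be on the completion list: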
#define rq_set_fifo_time(rq,exp) ((rq)->donelist.next = (void *) (exp))
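The stored expiry time is later compared with the current tick count when the FIFO head is checked. The following is a minimal user-space sketch of such a comparison, assuming a free-running unsigned tick counter that may wrap. toy_time_after follows the idea of the kernel's time_after() macro (a signed difference), while toy_request, toy_set_fifo_time and toy_fifo_expired are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * "a is later than b" for a wrapping unsigned counter. The unsigned
 * difference is reinterpreted as signed, which relies on the usual
 * two's-complement conversion (as the kernel does).
 */
static int toy_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical request carrying only its expiry time, in ticks. */
struct toy_request {
    unsigned long fifo_time;
};

/* Record the expiry time; unsigned overflow simply wraps around. */
static void toy_set_fifo_time(struct toy_request *rq,
                              unsigned long now, unsigned long expire)
{
    assert(rq != NULL);
    rq->fifo_time = now + expire;
}

/* A request has expired once the current time is past its expiry time. */
static int toy_fifo_expired(const struct toy_request *rq, unsigned long now)
{
    assert(rq != NULL);
    return toy_time_after(now, rq->fifo_time);
}
```

A plain `now > fifo_time` test would give the wrong answer for a request queued shortly before the counter wraps; the signed difference stays correct as long as the two times are less than half the counter range apart.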
static int deadline_dispatch_requests(struct request_queue *q, int force)
{
    struct deadline_data *dd = q->elevator->elevator_data;
    const int reads = !list_empty(&dd->fifo_list[READ]);
    const int writes = !list_empty(&dd->fifo_list[WRITE]);
    struct request *rq;
    int data_dir;

    /*
     * batches are currently reads XOR writes
     */
    if (dd->next_rq[WRITE])
        rq = dd->next_rq[WRITE];
    else
        rq = dd->next_rq[READ];

    if (rq) {
        /* we have a "next request" */
        if (dd->last_sector != rq->sector)
            /* end the batch on a non sequential request */
            dd->batching += dd->fifo_batch;
        if (dd->batching < dd->fifo_batch)
            /* we are still entitled to batch */
            goto dispatch_request;
    }

    /*
     * at this point we are not running a batch. select the appropriate
     * data direction (read / write)
     */
    if (reads) {
        BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
        if (writes && (dd->starved++ >= dd->writes_starved))
            goto dispatch_writes;
        data_dir = READ;
        goto dispatch_find_request;
    }

    /*
     * there are either no reads or writes have been starved
     */
    if (writes) {
dispatch_writes:
        BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));
        dd->starved = 0;
        data_dir = WRITE;
        goto dispatch_find_request;
    }

    return 0;

dispatch_find_request:
    /*
     * we are not running a batch, find best request for selected data_dir
     */
    if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
        /*
         * A deadline has expired, the last request was in the other
         * direction, or we have run out of higher-sectored requests.
         * Start again from the request with the earliest expiry time.
         */
        rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
    } else {
        /*
         * The last req was the same dir and we have a next request in
         * sort order. No expired requests so continue on from here.
         */
        rq = dd->next_rq[data_dir];
    }

    dd->batching = 0;

dispatch_request:
    /*
     * rq is the selected appropriate request.
     */
    dd->batching++;
    deadline_move_request(dd, rq);
    return 1;
}
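deadline_dispatch_requests is the heart of the deadline algorithm. It has three main parts: (1) Decide whether to dispatch a read or a write request. The factors here are whether a batch is in progress and whether writes are being starved. (2) Once the direction is chosen, find the next request in that direction to dispatch. The main factor here is whether a request has expired. (3) Once a suitable request is found, call deadline_move_request to dispatch it to the block device's request queue.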
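1. Determine the read/write direction.
a. First, whether a batch is in progress determines the direction currently being served. Being in a batch means the scheduler should keep serving requests in the same direction, which effectively increases system throughput. The direction of the batch therefore gives the direction of the current request. Read and write batches are mutually exclusive: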
/*
 * batches are currently reads XOR writes
 */
if (dd->next_rq[WRITE])
    rq = dd->next_rq[WRITE];
else
    rq = dd->next_rq[READ];
This check guarantees that batching proceeds in only one direction and never interleaves, because deadline_move_request contains:
dd->next_rq[READ] = NULL;
dd->next_rq[WRITE] = NULL;
dd->next_rq[data_dir] = deadline_latter_request(rq);
That is, only the next_rq of the batching direction can point to a next request.
if (rq) {
    /* we have a "next request" */
    if (dd->last_sector != rq->sector)
        /* end the batch on a non sequential request */
        dd->batching += dd->fifo_batch;
    if (dd->batching < dd->fifo_batch)
        /* we are still entitled to batch */
        goto dispatch_request;
}
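If a next request exists, meaning the sweep has not reached the end of the elevator, the scheduler checks whether that request is contiguous with the previous one. If it is contiguous and the number of batched requests is still below fifo_batch, this request is the one to dispatch, so the code jumps straight to the end and dispatches it to the device's request queue. Write starvation and expiry are ignored on this path. If the request is not contiguous, the batch ends.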
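The batch check can be isolated into a small function to make its two exit conditions visible. This is only a sketch: toy_continue_batch is a hypothetical name, and the counters are passed in explicitly instead of living in struct deadline_data.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if the next request in sort order may be dispatched as part
 * of the current batch, 0 if a new direction/request must be selected.
 * *batching is updated the same way the scheduler updates dd->batching.
 */
static int toy_continue_batch(unsigned long last_sector,
                              unsigned long next_sector,
                              unsigned int *batching,
                              unsigned int fifo_batch)
{
    assert(batching != NULL);

    /* A non-sequential request pushes the counter past the limit. */
    if (last_sector != next_sector)
        *batching += fifo_batch;

    /* Still under the limit: keep batching. */
    return *batching < fifo_batch;
}
```

Adding fifo_batch to the counter, instead of using a separate flag, lets a single comparison cover both "the batch is full" and "the next request is not contiguous".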
b. If no batch is in progress, the direction is chosen according to whether write starvation has exceeded its limit.
if (reads) {
    BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
    if (writes && (dd->starved++ >= dd->writes_starved))
        goto dispatch_writes;
    data_dir = READ;
    goto dispatch_find_request;
}
/*
 * there are either no reads or writes have been starved
 */
if (writes) {
dispatch_writes:
    BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));
    dd->starved = 0;
    data_dir = WRITE;
    goto dispatch_find_request;
}
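The scheduler serves reads first; read requests have priority. Write starvation is taken into account along the way, though. If write requests are also pending, the write-starvation count is compared with writes_starved and then incremented. If the count had already reached writes_starved, write starvation has exceeded its limit, and the code jumps straight to dispatch_writes to serve write requests. Otherwise it goes on serving read requests.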
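The effect of the starved counter is easiest to see by running the direction choice on its own. The sketch below copies the control flow of the snippet above into a user-space function; toy_state, toy_dir and toy_pick_direction are hypothetical names, and batching is left out, so each call corresponds to one decision made while no batch is running.

```c
#include <assert.h>
#include <stddef.h>

enum toy_dir { TOY_NONE = -1, TOY_READ = 0, TOY_WRITE = 1 };

/* Hypothetical subset of struct deadline_data. */
struct toy_state {
    unsigned int starved;        /* times reads were chosen over pending writes */
    unsigned int writes_starved; /* limit before a write must be chosen */
};

/* reads/writes say whether any request is pending in that direction. */
static enum toy_dir toy_pick_direction(struct toy_state *st, int reads, int writes)
{
    assert(st != NULL);

    if (reads) {
        /* starved is only incremented when a write is actually waiting */
        if (writes && st->starved++ >= st->writes_starved)
            goto dispatch_writes;
        return TOY_READ;
    }
    if (writes) {
dispatch_writes:
        st->starved = 0;
        return TOY_WRITE;
    }
    return TOY_NONE;
}
```

With writes_starved set to 2 and both directions always pending, successive calls yield read, read, write, and then the pattern repeats. In the real scheduler each decision starts a batch, so starved counts read batches chosen over waiting writes, not individual read requests.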
2. Find the request to serve in the chosen direction:
dispatch_find_request:
    /*
     * we are not running a batch, find best request for selected data_dir
     */
    if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
        /*
         * A deadline has expired, the last request was in the other
         * direction, or we have run out of higher-sectored requests.
         * Start again from the request with the earliest expiry time.
         */
        rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
    } else {
        /*
         * The last req was the same dir and we have a next request in
         * sort order. No expired requests so continue on from here.
         */
        rq = dd->next_rq[data_dir];
    }
    dd->batching = 0;
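Request expiry is taken into account when looking for a request in the chosen direction, and this is where the core of the deadline algorithm lies. The scheduler first calls deadline_check_fifo to check whether the request at the head of the FIFO, that is the oldest one, has expired. If it has, that expired request becomes the one to serve. If nothing has expired but the sweep has passed the end of the elevator (!dd->next_rq[data_dir]), the scheduler has to go back toward the start of the elevator. Unlike the traditional elevator algorithm, however, the deadline scheduler does not resume from the request with the smallest sector. It goes back to the request that has waited longest and continues sweeping from there in the direction of increasing sector numbers. In other words, on expiry or on reaching the end of the elevator, the scheduler returns to the longest-waiting request and resumes the elevator sweep from it. If neither has happened, the next request in the direction of increasing sectors is the one to serve.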
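The selection rule described above can be tried out on a small array. The sketch assumes the pending requests of one direction are stored sorted by ascending sector, uses a linear scan in place of the FIFO head, and compares times with a plain `>` (ignoring counter wraparound) to stay short. toy_request and toy_find_request are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: sort key and expiry time. */
struct toy_request {
    unsigned long sector;
    unsigned long deadline;
};

/*
 * rqs: pending requests of one direction, sorted by ascending sector.
 * next: index of the next request in sort order, or -1 if the sweep has
 *       passed the last request.
 * Returns the index of the request to dispatch.
 */
static int toy_find_request(const struct toy_request *rqs, int n,
                            int next, unsigned long now)
{
    int oldest = 0;

    assert(rqs != NULL && n > 0);

    /* Stand-in for the FIFO head: the request with the earliest deadline. */
    for (int i = 1; i < n; i++)
        if (rqs[i].deadline < rqs[oldest].deadline)
            oldest = i;

    /* Expired deadline or end of sweep: restart from the oldest request. */
    if (now > rqs[oldest].deadline || next < 0)
        return oldest;

    /* Otherwise continue the sweep in sector order. */
    return next;
}
```

For requests at sectors 100, 200 and 300 with deadlines 50, 20 and 40, a sweep that has run off the end restarts at sector 200 (the oldest request), not at sector 100 as a classic elevator would.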
3. Once the request to serve has been found, dispatch it to the block device's request queue.
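Overall the deadline scheduler is quite compact, a little over 400 lines in total. It takes full account of three factors: batching, write starvation, and request expiry. While preserving throughput, it also takes response latency into account.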