Using IOCP with WinSock: Recommendations

 


INFO: Design Issues When Using IOCP in a Winsock Server


This article was previously published under Q192800

SUMMARY

This article assumes you already understand the I/O model of the Windows NT I/O Completion Port (IOCP) and are familiar with the related APIs. If you want to learn IOCP, see Advanced Windows (3rd edition) by Jeffrey Richter, chapter 15, "Device I/O", for an excellent discussion of IOCP implementation and the APIs you need to use it.



An IOCP provides a model for developing very high performance and very scalable server programs. Direct IOCP support was added to Winsock2 and is fully implemented on the Windows NT platform. However, IOCP is the hardest to understand and implement among all Windows NT I/O models. To help you design a better socket server using IOCP, a number of tips are provided in this article.

MORE INFORMATION

TIP 1: Use Winsock2 IOCP-capable functions, such as WSASend and WSARecv, over Win32 file I/O functions, such as WriteFile and ReadFile.



Socket handles from Microsoft-based protocol providers are IFS handles, so you can use Win32 file I/O calls with the handle. However, the interactions between the provider and the file system involve many kernel/user mode transitions, thread context switches, and parameter marshals that result in a significant performance penalty. You should use only Winsock2 IOCP-capable functions with IOCP.



The additional parameter marshals and mode transitions in ReadFile and WriteFile occur only if the provider does not have the XP1_IFS_HANDLES bit set in dwServiceFlags1 of its WSAPROTOCOL_INFO structure.



NOTE: These providers have an unavoidable additional mode transition, even in the case of WSASend and WSARecv, although ReadFile and WriteFile will have more of them.



TIP 2: Choose the number of the concurrent worker threads allowed and the total number of the worker threads to spawn.



The number of worker threads and the number of concurrent threads that the IOCP uses are not the same thing. For example, you can allow a maximum of 2 concurrent threads on the IOCP and still keep a pool of 10 worker threads. Make the pool at least as large as the number of concurrent threads so that a worker thread handling a dequeued completion packet can call one of the Win32 "wait" functions without delaying the handling of other queued I/O packets.



If there are completion packets waiting to be dequeued, the system wakes up another worker thread. Eventually, the first thread satisfies its wait and can run again. When this happens, the number of runnable threads is higher than the concurrency allowed on the IOCP (the NumberOfConcurrentThreads value). However, when the next worker thread calls GetQueuedCompletionStatus and enters the wait state, the system does not wake it up. In other words, the system tries to maintain the number of concurrent worker threads you requested.



Typically, you only need one concurrent worker thread per CPU for IOCP. To do this, enter 0 for NumberOfConcurrentThreads in the CreateIoCompletionPort call when you first create the IOCP.
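
As a rough illustration of this tip, the sketch below creates a port with NumberOfConcurrentThreads set to 0 and then starts a separately sized pool of workers. The names WorkerLoop and CreatePortAndPool, the use of std::thread, and the convention that a NULL OVERLAPPED pointer means "stop" are assumptions made for this example and do not come from the original article.

```cpp
#ifdef _WIN32  // Windows-only sketch; compiles to nothing elsewhere
#include <windows.h>
#include <thread>
#include <vector>

// Hypothetical worker loop. Dequeuing a NULL OVERLAPPED pointer is treated as
// "stop": that is this sketch's own convention, not something IOCP defines.
void WorkerLoop(HANDLE iocp) {
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        LPOVERLAPPED ov = NULL;
        BOOL ok = GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE);
        if (ov == NULL) break;  // exit packet, or the wait itself failed
        (void)ok;               // real code would process the completed I/O here
    }
}

// Creates a port that allows one running thread per CPU (0 = use the CPU
// count) and starts poolSize workers on it. Returns NULL if the port cannot
// be created.
HANDLE CreatePortAndPool(unsigned poolSize, std::vector<std::thread>& pool) {
    HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    if (iocp == NULL) return NULL;
    for (unsigned i = 0; i < poolSize; ++i)
        pool.emplace_back(WorkerLoop, iocp);
    return iocp;
}
#endif
```

The pool size is independent of the concurrency value: a caller could ask for 10 workers here while the port still lets only as many of them run at once as there are CPUs.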



TIP 3: Associate a posted I/O operation with a dequeued completion packet.



GetQueuedCompletionStatus returns a completion key and an overlapped structure for the I/O when dequeuing a completion packet. Use these two values to carry per-handle and per-I/O-operation information, respectively. To provide per-handle information, you can use your socket handle as the completion key when you register the socket with the IOCP. To provide per-I/O-operation information, "extend" the overlapped structure to contain your application-specific I/O state. Also, make sure you provide a unique overlapped structure for each overlapped I/O. When an I/O completes, the same pointer to the overlapped structure is returned.
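
One common way to "extend" the overlapped structure is to embed it in a larger per-I/O context and recover that context from the pointer returned at completion. The following is a minimal sketch of that idea; PerIoContext, IoKind, NewRecvContext, and ContextFromOverlapped are hypothetical names, the 4096-byte buffer is an arbitrary choice, and the non-Windows branch contains stand-in types (not the real definitions) only so that the layout logic can be compiled anywhere. On Windows, the CONTAINING_RECORD macro serves the same purpose as the offsetof arithmetic shown here.

```cpp
#include <cstddef>
#ifdef _WIN32
#include <winsock2.h>
#else
// Stand-ins, NOT the real Windows definitions: illustration only.
struct OVERLAPPED { void* Internal; void* InternalHigh; void* Pointer; void* hEvent; };
struct WSABUF { unsigned long len; char* buf; };
#endif

enum class IoKind { Recv, Send };  // hypothetical tag for the posted operation

// Hypothetical per-I/O context. Each posted operation gets its own instance,
// so each operation also gets its own unique OVERLAPPED.
struct PerIoContext {
    OVERLAPPED overlapped;  // passed to WSARecv/WSASend
    IoKind kind;            // application-specific state
    WSABUF wsabuf;          // describes the buffer below
    char data[4096];
};

// Allocates a zero-initialized context prepared for a receive.
PerIoContext* NewRecvContext() {
    PerIoContext* ctx = new PerIoContext{};
    ctx->kind = IoKind::Recv;
    ctx->wsabuf.buf = ctx->data;
    ctx->wsabuf.len = static_cast<unsigned long>(sizeof(ctx->data));
    return ctx;
}

// Maps the OVERLAPPED pointer returned at completion back to its context.
PerIoContext* ContextFromOverlapped(OVERLAPPED* ov) {
    return reinterpret_cast<PerIoContext*>(
        reinterpret_cast<char*>(ov) - offsetof(PerIoContext, overlapped));
}
```

A worker would call ContextFromOverlapped on the pointer it dequeues, inspect kind to decide how to continue, and delete the context only after that operation's completion packet has been handled.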



TIP 4: I/O completion packet queuing behavior.



The order in which I/O completion packets are queued in the IOCP is not necessarily the same order the Winsock2 I/O calls were made. Additionally, if a Winsock2 I/O call returns SUCCESS or IO_PENDING, it is guaranteed that a completion packet will be queued to the IOCP when the I/O completes, regardless of whether the socket handle is closed. After you close a socket handle, future calls to WSASend, WSASendTo, WSARecv, or WSARecvFrom will fail with a return code other than SUCCESS or IO_PENDING, which will not generate a completion packet. The status of the completion packet retrieved by GetQueuedCompletionStatus for I/O previously posted could indicate a failure in this case.
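
In terms of the actual API, "SUCCESS" corresponds to a return value of 0 from calls such as WSARecv, and "IO_PENDING" corresponds to a return value of SOCKET_ERROR with WSAGetLastError reporting WSA_IO_PENDING. The small helper below sketches that check; the name CompletionPacketExpected is hypothetical, the sketch assumes the default completion notification mode, and the non-Windows branch only supplies stand-in constants so the logic compiles elsewhere.

```cpp
#ifdef _WIN32
#include <winsock2.h>
#else
// Stand-in values for compiling off Windows; illustration only.
constexpr int SOCKET_ERROR = -1;
constexpr int WSA_IO_PENDING = 997;
#endif

// rc:        value returned by WSASend, WSASendTo, WSARecv, or WSARecvFrom
// lastError: WSAGetLastError() read immediately after a failing call
// Returns true when a completion packet will later be queued to the IOCP,
// which means the overlapped structure must stay alive until it is dequeued.
bool CompletionPacketExpected(int rc, int lastError) {
    if (rc == 0) return true;  // completed immediately; a packet is still queued
    return rc == SOCKET_ERROR && lastError == WSA_IO_PENDING;
}
```

When this returns false, no completion packet will arrive for that call, so the caller is the only place left to release the per-I/O state it allocated.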



If you delete the IOCP itself, no more I/O can be posted to the IOCP because the IOCP handle itself is invalid. However, the system's underlying IOCP kernel structures do not go away until all successfully posted I/Os are completed.



TIP 5: IOCP cleanup.



The most important thing to remember when performing IOCP cleanup is the same as when using overlapped I/O: do not free an overlapped structure if the I/O for it has not yet completed. The HasOverlappedIoCompleted macro allows you to detect from its overlapped structure whether an I/O has completed.



There are typically two scenarios for shutting down a server. In the first scenario, you do not care about the completion status of outstanding I/Os and you just want to shut down as fast as you can. In the second scenario, you want to shut down the server, but you do need to know the completion status of each outstanding I/O.



In the first scenario, you can call PostQueuedCompletionStatus (N times, where N is the number of worker threads) to post a special completion packet that tells each worker thread to exit immediately. Then close all socket handles, free their associated overlapped structures, and close the completion port. Again, make sure you use HasOverlappedIoCompleted to check the completion status of an overlapped structure before you free it. If a socket is closed, all outstanding I/O on the socket completes quickly.
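
The fast shutdown sequence might look roughly like the sketch below. FastShutdown is a hypothetical helper; it assumes the workers are std::thread objects that leave their loop when they dequeue a NULL OVERLAPPED pointer (a convention chosen for this example), and it leaves freeing of the overlapped structures to the caller.

```cpp
#ifdef _WIN32  // Windows-only sketch; compiles to nothing elsewhere
#include <winsock2.h>
#include <windows.h>
#include <thread>
#include <vector>

// Fast shutdown in the order the text describes. Returns false, without
// closing anything, if an exit packet could not be posted.
bool FastShutdown(HANDLE iocp, std::vector<std::thread>& workers,
                  std::vector<SOCKET>& sockets) {
    // One exit packet per worker: key 0 and a NULL OVERLAPPED pointer.
    for (size_t i = 0; i < workers.size(); ++i) {
        if (!PostQueuedCompletionStatus(iocp, 0, 0, NULL)) return false;
    }
    // Closing the sockets makes their outstanding I/O complete with errors.
    for (SOCKET s : sockets) closesocket(s);
    sockets.clear();
    // Wait until every worker has dequeued its exit packet and returned.
    for (std::thread& t : workers) t.join();
    // Caller: free each overlapped structure only after
    // HasOverlappedIoCompleted(&overlapped) reports that it has completed.
    CloseHandle(iocp);
    return true;
}
#endif
```

Because it calls closesocket, a program using this sketch needs to link against Ws2_32.lib.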



In the second scenario, you can delay exiting worker threads so that all completion packets can be properly dequeued. You can start by closing all socket handles and the IOCP. However, you need to maintain a count of the number of outstanding I/Os so that your worker thread can know when it is safe to exit the thread. The performance penalty of having a global I/O counter protected with a critical section for an IOCP server is not as bad as might be expected because the active worker thread does not switch out if there are more completion packets waiting in the queue.
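
A minimal sketch of such a counter follows. OutstandingIoCounter and its method names are hypothetical, and std::mutex stands in here for the Win32 critical section mentioned above so that the example stays portable; a Windows server could equally protect the count with a CRITICAL_SECTION.

```cpp
#include <mutex>

// Hypothetical global counter of overlapped operations that have been posted
// but whose completion packets have not yet been dequeued.
class OutstandingIoCounter {
public:
    // Call just before posting an overlapped I/O.
    void OnPosted() {
        std::lock_guard<std::mutex> guard(lock_);
        ++count_;
    }

    // Call after dequeuing a completion packet (or when a post fails outright
    // and no packet will arrive). Returns true when shutdown has begun and
    // this was the last outstanding operation, so workers may exit safely.
    bool OnCompleted() {
        std::lock_guard<std::mutex> guard(lock_);
        --count_;
        return shuttingDown_ && count_ == 0;
    }

    // Call once when shutdown starts. Returns true if nothing is outstanding.
    bool BeginShutdown() {
        std::lock_guard<std::mutex> guard(lock_);
        shuttingDown_ = true;
        return count_ == 0;
    }

private:
    std::mutex lock_;
    long count_ = 0;
    bool shuttingDown_ = false;
};
```

With this shape, the thread that starts the shutdown calls BeginShutdown after closing the sockets, and each worker keeps dequeuing packets until OnCompleted returns true.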

FROM: http://vieri.blogdriver.com/vieri/
