[Original article] https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
Writing your own downloader middleware
Each middleware component is a Python class that defines one or more of the following methods:
class scrapy.downloadermiddlewares.DownloaderMiddleware
Note
Any of the downloader middleware methods may also return a deferred.
process_request(request, spider)
This method is called for each request that goes through the downloader middleware.
process_request() should either: return None, return a Response object, return a Request object, or raise IgnoreRequest.
If it returns None, Scrapy will continue processing this request, executing all other middlewares, until finally the appropriate downloader handler is called and the request performed (and its response downloaded).
If it returns a Response object, Scrapy won't call any other process_request() or process_exception() methods, or the appropriate download function; it will return that response. The process_response() methods of installed middleware are still called on every response.
If it returns a Request object, Scrapy will stop calling process_request() methods and reschedule the returned request. Once the newly returned request is performed, the appropriate middleware chain will be called on the downloaded response.
If it raises an IgnoreRequest exception, the process_exception() methods of installed downloader middleware will be called. If none of them handle the exception, the errback function of the request (Request.errback) is called. If no code handles the raised exception, it is ignored and not logged (unlike other exceptions).
process_response(request, response, spider) — omitted here.
process_exception(request, exception, spider)
Scrapy calls process_exception() when a download handler or a process_request() (from a downloader middleware) raises an exception (including an IgnoreRequest exception).
process_exception() should return: either None, a Response object, or a Request object.
If it returns None, Scrapy will continue processing this exception, executing any other process_exception() methods of installed middleware, until no middleware is left and the default exception handling kicks in.
If it returns a Response object, the process_response() method chain of installed middleware is started, and Scrapy won't bother calling any other process_exception() methods of middleware.
If it returns a Request object, the returned request is rescheduled to be downloaded in the future. This stops the execution of the process_exception() methods of the middleware the same as returning a response would.
from_crawler(cls, crawler)
If present, this classmethod is called to create a middleware instance from a Crawler. It must return a new instance of the middleware. The Crawler object provides access to all Scrapy core components, such as settings and signals; it is a way for middleware to access them and hook its functionality into Scrapy.
Parameters: crawler (Crawler object) – the crawler that uses this middleware