The error output is as follows:
2019-09-27 13:32:17 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://movie.douban.com/robots.txt> (referer: None)
2019-09-27 13:32:17 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://movie.douban.com/top250> (referer: None)
2019-09-27 13:32:18 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://movie.douban.com/top250>: HTTP status code is not handled or not allowed
A 403 means access was denied: the site is rejecting the request because of our USER_AGENT (Scrapy's default one identifies the request as coming from a crawler).
Solution:
Open the site you want to crawl in a browser, open the developer console, and inspect any request to find its User-Agent header:
Copy that user-agent string, open settings.py in the project root (the USER_AGENT setting belongs in settings.py, not items.py), and paste it in:
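A minimal sketch of what the entry in settings.py looks like. The User-Agent string below is only an example taken from a desktop Chrome; substitute whatever your own browser reports:

```python
# settings.py -- Scrapy reads USER_AGENT from here and attaches it to every request.
# Example desktop-Chrome UA string (an assumption); replace it with the one
# copied from your own browser's developer tools.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/76.0.3809.132 Safari/537.36"
)
```

With this set, requests to movie.douban.com identify as a regular browser instead of Scrapy's default agent, and the 403 responses stop.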
Re-run the spider (no compilation step is needed; Python picks up the new setting on the next run):
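The same idea can be verified outside Scrapy with just the standard library: attach a browser User-Agent to a request so the server does not see Python's default agent. This is a sketch only; the UA string is an example, and no network call is made here:

```python
from urllib.request import Request

# Example browser User-Agent (an assumption -- use the one from your own browser).
UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
      "AppleWebKit/537.36 (KHTML, like Gecko) "
      "Chrome/76.0.3809.132 Safari/537.36")

# Build a request with the header attached; urlopen(req) would then send it
# with this User-Agent instead of the default "Python-urllib/3.x".
req = Request("https://movie.douban.com/top250", headers={"User-Agent": UA})

# urllib normalizes header names, so the key is looked up as "User-agent".
print(req.get_header("User-agent"))
```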
Problem solved!