Twisted version mismatch
Install the Python environment:
sudo apt-get install python-dev
Install Scrapy:
sudo pip install scrapy
Install Twisted (pinned to a version that works with this setup):
pip install Twisted==16.4.1
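To confirm that the pinned version is the one actually installed, a small check such as the following may help. This is only a sketch: it assumes Python 3.8 or newer (for `importlib.metadata`), and the helper names `installed_version` and `check_pin` are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED_TWISTED = "16.4.1"  # the version pinned in the pip command above


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package, pinned):
    """Describe whether the installed version of `package` matches `pinned`."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed"
    if found != pinned:
        return f"{package} {found} is installed, expected {pinned}"
    return f"{package} {pinned} is installed"


if __name__ == "__main__":
    print(check_pin("Twisted", PINNED_TWISTED))
```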
Create a new project:
scrapy startproject name
Run the spider:
scrapy crawl spidername
Install MySQL-Python:
sudo apt-get install libmysqlclient-dev
sudo pip install MySQL-Python
Install scrapy-random-useragent:
sudo pip install scrapy-random-useragent
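Configure settings.py:
- DOWNLOADER_MIDDLEWARES = {
-     # Disable the built-in user-agent middleware.
-     # The scrapy.contrib.* paths have been deprecated since Scrapy 1.0.
-     'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
-     'random_useragent.RandomUserAgentMiddleware': 400,
- }
- USER_AGENT_LIST = "/path/to/useragents.txt"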
Create the useragents.txt file, with one user agent per line:
- Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1
- Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
- Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6
- Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6
- Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1
- Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5
- Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5
- Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3
- Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3
- Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3
- Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3
- Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3
- Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3
- Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3
- Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3
- Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3
- Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24
- Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24
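To make the role of the user-agent file clearer, here is a rough sketch of what a random user-agent downloader middleware does with it. It is not the code of scrapy-random-useragent itself: the class name `SimpleRandomUserAgentMiddleware` and the helper `load_user_agents` are hypothetical, and the sketch only assumes the usual Scrapy middleware hooks (`from_crawler`, `process_request`) and a request object with a dict-like `headers` attribute.

```python
import random


def load_user_agents(path):
    """Return the non-empty lines of the file at `path`, one user agent per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


class SimpleRandomUserAgentMiddleware:
    """Hypothetical stand-in that sets a random User-Agent on every request."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("user agent list is empty")
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy passes a crawler whose .settings exposes the values from settings.py.
        return cls(load_user_agents(crawler.settings.get("USER_AGENT_LIST")))

    def process_request(self, request, spider):
        # Overwrite the header; returning None lets Scrapy keep processing the request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None
```

Because the user agent is chosen per request, consecutive requests from the same spider may carry different headers, which is the point of the rotation.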
Scrapy with Polipo
middlewares.py:
- # base64 is needed only if the proxy requires authentication
- import base64
- # Downloader middleware that routes every request through the local proxy
- class ProxyMiddleware(object):
-     # Override process_request
-     def process_request(self, request, spider):
-         # Set the proxy address (use the port your local proxy listens on)
-         request.meta['proxy'] = "http://127.0.0.1:8118"
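settings.py:
- DOWNLOADER_MIDDLEWARES = {
-     # The scrapy.contrib.* paths have been deprecated since Scrapy 1.0.
-     'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
-     # 'amazon' is the project package name; the lower number (100) makes
-     # the custom middleware handle the request before HttpProxyMiddleware.
-     'amazon.middlewares.ProxyMiddleware': 100,
- }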
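The middlewares.py comment above mentions that base64 is needed only when the proxy requires authentication. A minimal sketch of that case might look like the following, assuming HTTP Basic authentication. The class name `AuthProxyMiddleware`, the helper `basic_auth_header`, and the credentials are all placeholders, not values from the original setup.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of a Proxy-Authorization header for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


class AuthProxyMiddleware:
    """Hypothetical variant of ProxyMiddleware for a proxy that needs credentials."""

    # Placeholder values: replace with your own proxy address and credentials.
    proxy_url = "http://127.0.0.1:8118"
    proxy_user = "user"
    proxy_password = "secret"

    def process_request(self, request, spider):
        # Route the request through the proxy, as in the unauthenticated version.
        request.meta["proxy"] = self.proxy_url
        # Attach the encoded credentials so the proxy accepts the request.
        request.headers["Proxy-Authorization"] = basic_auth_header(
            self.proxy_user, self.proxy_password
        )
        return None
```

To use it, the entry in DOWNLOADER_MIDDLEWARES would point at this class instead of ProxyMiddleware. A local Polipo instance without authentication does not need the header at all.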
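Scrapy HTTP status codes
Add meta={'handle_httpstatus_list': range(400, 600)} to the request so that 4xx and 5xx responses reach the callback instead of being filtered out.
Original articles:
http://blog.csdn.net/u013596119/article/details/71245802
http://blog.csdn.net/u013596119/article/details/71246334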
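As an illustration of the handle_httpstatus_list note above, the sketch below builds the meta dict and shows how a callback might branch on the status code. The function names `error_aware_meta` and `describe_response` are invented for this example, and the response is treated as any object with a `status` attribute, which is how Scrapy responses expose the code.

```python
# Status codes that should be passed to the callback instead of being dropped.
ERROR_STATUSES = list(range(400, 600))


def error_aware_meta():
    """Return the meta dict to attach to a request."""
    return {"handle_httpstatus_list": ERROR_STATUSES}


def describe_response(response):
    """Classify a response by its .status attribute."""
    if 400 <= response.status < 500:
        return "client error"
    if 500 <= response.status < 600:
        return "server error"
    return "ok"
```

In a spider, the dict would be passed as `scrapy.Request(url, callback=self.parse, meta=error_aware_meta())`, and the callback could call `describe_response(response)` to decide whether to retry, log, or parse the page.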