Routine operations
1. Create a project: scrapy startproject pac  (pac = project name)
2. Create a spider: scrapy genspider qsbk "qiushibaike.com"  (spider name, domain to crawl)
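The genspider command above generates roughly the following spider skeleton (a sketch of the template Scrapy fills in; the parse body is left empty for you to write):

```python
import scrapy


class QsbkSpider(scrapy.Spider):
    name = 'qsbk'                            # used with: scrapy crawl qsbk
    allowed_domains = ['qiushibaike.com']    # requests outside this domain are filtered
    start_urls = ['http://qiushibaike.com/']

    def parse(self, response):
        # Default callback for each response from start_urls.
        pass
```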
3. Configure settings.py:
ROBOTSTXT_OBEY = False
DOWNLOAD_DELAY = 3
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
}
# item storage (enable the pipeline)
ITEM_PIPELINES = {
'pac.pipelines.PacPipeline': 300,
}
DOWNLOAD_DELAY = 1  # download delay (seconds)
------ Middleware (write an anti-crawler middleware, class A) ------
DOWNLOADER_MIDDLEWARES = {
'fangz.middlewares.A': 543,
}
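The anti-crawler class A registered above could be, for example, a random User-Agent rotator. A minimal sketch (the class name follows the notes; the agent list is illustrative):

```python
import random


class A:
    """Downloader middleware that sets a random User-Agent on each request."""

    USER_AGENTS = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36',
    ]

    def process_request(self, request, spider):
        # Scrapy calls this for every outgoing request; returning None
        # lets the request continue through the middleware chain.
        request.headers['User-Agent'] = random.choice(self.USER_AGENTS)
        return None
```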
4. In the spider file:
(1) contains() ---- XPath predicate: matches when an attribute contains a substring
lis = response.xpath("//div[contains(@class,'nl_con')]/ul/li")
(2) yield scrapy.Request(url, callback, meta) ---- pass data between callbacks
def parse_page1(self, response):
    a = "你好"
    b = "不好"
    url = "www.example....."
    yield scrapy.Request(url, callback=self.parse_page2, meta={'item': (a, b)})

def parse_page2(self, response):
    item = response.meta.get('item')
# crawl the content
lis = response.xpath("//div[contains(@class,'nl_con')]/ul/li")
yield scrapy.Request(url, callback=self.parse_esf, meta={"info": (province, city)})  # second-hand housing

def parse_esf(self, response):
5. Files
》pipelines.py --> data storage
------ Deploying to a server ------
1. cmd: pip freeze > requirements.txt
   (the generated txt file lists the packages that need to be installed)
2. Send it to the server: rz  -- select the txt file
3. pip install -r requirements.txt
------ Create a virtual environment: pip install virtualenvwrapper ------
1. mkvirtualenv -p /usr/bin/python3 minzi  : creates a Python 3 virtualenv named "minzi"
2. pip install -r requirements.txt
------ Distributed crawling with Redis ------
1. Install: pip install scrapy-redis
》In pac.py:
from scrapy_redis.spiders import RedisSpider
Details: https://www.cnblogs.com/zhangyangcheng/articles/8150483.html
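Putting the import above to use, a minimal scrapy-redis spider plus the settings it needs might look like this (a sketch assuming a local Redis instance; the spider name and redis_key are illustrative):

```python
# pac.py -- the spider pulls its start URLs from a Redis list instead
# of hard-coding start_urls, so many workers can share one queue.
from scrapy_redis.spiders import RedisSpider


class PacSpider(RedisSpider):
    name = 'pac'
    redis_key = 'pac:start_urls'  # feed it with: lpush pac:start_urls http://...

    def parse(self, response):
        yield {'url': response.url}

# settings.py additions for scrapy-redis (shared scheduler and dedup filter):
# SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
# REDIS_URL = "redis://127.0.0.1:6379"
```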