1 403 error
Projects created with scrapy startproject obey robots.txt by default: the generated settings.py contains ROBOTSTXT_OBEY = True. Change it to False.
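As a minimal sketch, the change above is a single line in the project's settings.py. The comment is added here only for illustration:

```python
# settings.py (project-level settings generated by "scrapy startproject")

# Stop Scrapy from fetching robots.txt and filtering requests against it.
ROBOTSTXT_OBEY = False
```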
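2 Anti-scraping measures: the crawler needs to masquerade as a browser
Find the installation directory of the scrapy library, e.g. D:\Python\Lib\site-packages\scrapy\settings
Open default_settings.py in that directory
Find USER_AGENT
Change it to: USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0'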
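A possibly less invasive alternative, sketched here, is to set the same value in the project's own settings.py. Project settings take precedence over default_settings.py, so the installed library stays unmodified and other projects are unaffected. The User-Agent string is the one from the note above:

```python
# settings.py (project-level settings; overrides scrapy/settings/default_settings.py)

# Send a browser-like User-Agent header with every request of this project.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) "
    "Gecko/20100101 Firefox/76.0"
)
```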
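3 How to check whether your XPath is correct
Debug with scrapy shell followed by the URL to test, e.g. scrapy shell http://www.baidu.com, then evaluate the expression with response.xpath(...) inside the shell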
4 ValueError: unsupported format character 'C'
See https://blog.csdn.net/xlsj228/article/details/106379997
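One common cause of this error (not necessarily the case covered in the linked post) is applying Python's % formatting to a string that already contains a literal percent sign, such as a URL-encoded value like %C3. The sketch below reproduces the error and shows two possible fixes. The URL is a hypothetical example:

```python
# A template with URL-encoded characters (%C3%A9) plus one real placeholder (%d).
url_template = "http://example.com/search?q=%C3%A9&page=%d"

try:
    url_template % 2  # "%C" is parsed as a format specifier and rejected
except ValueError as exc:
    message = str(exc)
    print(message)

# Fix 1: escape every literal percent sign as "%%".
escaped = "http://example.com/search?q=%%C3%%A9&page=%d" % 2

# Fix 2: use str.format, which does not treat "%" specially.
formatted = "http://example.com/search?q=%C3%A9&page={}".format(2)

print(escaped)
print(formatted)
```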