Simulating mouse clicks with scrapy-splash
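Configure scrapy and splash the same way the other tutorials online do. What most of those tutorials leave out is the endpoint: they all use render.html, which cannot execute a lua_source script.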
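For reference, the configuration those tutorials describe usually looks like the sketch below, which follows the scrapy-splash README. The SPLASH_URL value is an assumption (a Splash instance listening locally on its default port 8050); point it at wherever your Splash actually runs.

```python
# settings.py (sketch): wiring scrapy-splash into a Scrapy project

# Assumption: Splash is reachable on localhost at its default port.
SPLASH_URL = 'http://localhost:8050'

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Avoids storing duplicate Splash arguments with every queued request.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Duplicate filter and cache storage that take Splash arguments into account.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

None of these settings select the endpoint; that is chosen per request, which is why it is easy to miss.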
Override start_requests
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import scrapy
from scrapy_splash import SplashRequest

# The click is simulated with JavaScript, run from a Lua script inside Splash
script = """
function main(splash, args)
  splash.images_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(1))
  -- click the target link through JavaScript
  local js = "document.querySelector('#sxxz > li:nth-child(3) > a').click();"
  splash:runjs(js)
  assert(splash:wait(1))
  return splash:html()
end
"""


class TestSpider(scrapy.Spider):
    name = 'sspider'
    start_urls = ['http://www.gdzwfw.gov.cn/portal/branch-hall?orgCode=006940060#']

    def start_requests(self):
        for url in self.start_urls:
            # Other tutorials use endpoint='render.html'; simulating a click needs 'execute'
            yield SplashRequest(url=url, callback=self.parse_m, endpoint='execute', args={
                'wait': 10, 'images': 0, 'lua_source': script
            })

    def parse_m(self, response):
        # print(response.text)
        # print(response.encoding)
        # .get() returns None instead of raising if the element is missing
        print(response.xpath('//*[@id="branch-tab1"]').get())
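If the element to click varies between requests, one option is to pass the selector in through args instead of hard-coding it in the Lua source. The following is only a sketch under that assumption: CLICK_SCRIPT and click_args are hypothetical names, and the selector and wait keys are ones I made up for illustration; Splash simply exposes whatever keys are in args to the script. It uses splash:jsfunc, which wraps a JavaScript function so that Lua can call it with arguments.

```python
# Hypothetical variant of the script: the CSS selector arrives through args.
CLICK_SCRIPT = """
function main(splash, args)
  splash.images_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  -- wrap a JavaScript function so it can be called from Lua
  local click = splash:jsfunc([[
    function (selector) {
      var el = document.querySelector(selector);
      if (el) { el.click(); }
      return el !== null;
    }
  ]])
  -- fail the request if nothing matched the selector
  assert(click(args.selector), "no element matches selector")
  assert(splash:wait(args.wait))
  return splash:html()
end
"""


def click_args(selector, wait=1.0):
    """Build the args dict for a SplashRequest sent to the 'execute' endpoint."""
    return {'lua_source': CLICK_SCRIPT, 'selector': selector, 'wait': wait}
```

In the spider this would be used as `SplashRequest(url, callback=self.parse_m, endpoint='execute', args=click_args('#sxxz > li:nth-child(3) > a'))`.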
The key is the endpoint parameter:
- render.html: Return the HTML of the javascript-rendered page.
- render.png: Return an image (in PNG format) of the javascript-rendered page.
- render.jpeg
- render.har
- render.json
- execute: Execute a custom rendering script and return a result.
For details, see the official documentation.
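To make the role of the endpoint a bit more concrete: each name is a path on Splash's HTTP API, and the request args travel as the body. The sketch below only builds the URL and JSON body and sends nothing. build_splash_call is a hypothetical helper, and the base URL assumes a local Splash on its default port 8050.

```python
import json

# Assumption: Splash runs locally on its default port.
SPLASH_BASE = 'http://localhost:8050'


def build_splash_call(endpoint, url, **args):
    """Return (api_url, json_body) for a POST to the given Splash endpoint."""
    api_url = '{}/{}'.format(SPLASH_BASE.rstrip('/'), endpoint)
    payload = dict(args, url=url)
    return api_url, json.dumps(payload, sort_keys=True)


# render.html takes rendering options such as wait.
html_url, html_body = build_splash_call('render.html', 'http://example.com', wait=1)

# execute additionally takes the Lua script to run in lua_source.
lua = 'function main(splash, args) return splash:html() end'
exec_url, exec_body = build_splash_call('execute', 'http://example.com', lua_source=lua)
```

The only difference between the two calls is the path and the presence of lua_source, which is the same switch that endpoint='execute' makes in SplashRequest.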