pyspider crawling example

Original

2019-02-22 16:26

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2017-04-04 10:35:52
# Project: retries

from pyspider.libs.base_handler import *
import re


class Handler(BaseHandler):
    # Project-wide crawl settings (none needed for this example).
    crawl_config = {
    }

    def on_start(self):
        # Entry point: start from the site's home page.
        self.crawl('http://www.mofangge.com/', callback=self.index_page)

    @config(priority=4)
    def index_page(self, response):
        # Follow only links that point to question-list pages.
        for each in response.doc('a[href^="http"]').items():
            # Raw string avoids the invalid "\w" escape; dots are escaped
            # so they match literally.
            if re.match(r"http://www\.mofangge\.com/qlist/\w+/", each.attr.href, re.U):
                self.crawl(each.attr.href, callback=self.list_page)

    @config(priority=3)
    def list_page(self, response):
        # Each link in the left-hand list leads to a question detail page.
        for each in response.doc('.seoleftul a').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        # Queue the links found in table cells as further detail pages.
        for each in response.doc('td a').items():
            self.crawl(each.attr.href, callback=self.detail_page)
        # The returned dict is stored as the result for this URL.
        return {
            "url": response.url,
            "question": response.doc('#q_indexkuai2 table').html(),
            "answer": response.doc('#q_indexkuai3 table').html(),
            "subject": response.doc('body > div.content > div.nagetivebanner1 > div > span > a:nth-child(2)').html(),
        }
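
The URL check in index_page can be tried on its own, without running pyspider. Below is a minimal sketch, assuming a hypothetical helper named is_qlist_url and made-up sample URLs (neither is part of the original script). It compiles the pattern once, escapes the dots, and treats a missing href as a non-match.

```python
import re

# Hypothetical helper (not part of the original script): the same URL filter
# used in index_page, compiled once and with the dots escaped.
QLIST_URL = re.compile(r"http://www\.mofangge\.com/qlist/\w+/")


def is_qlist_url(href):
    """Return True if href looks like a question-list page URL."""
    # href can be None when an <a> tag has no href attribute.
    return bool(href) and QLIST_URL.match(href) is not None


# Sample URLs are made up for illustration only.
samples = [
    "http://www.mofangge.com/qlist/shuxue/",
    "http://www.mofangge.com/html/qDetail/01/c1/abc.html",
    "http://wwwxmofangge.com/qlist/shuxue/",
    None,
]
results = [is_qlist_url(u) for u in samples]
print(results)
```

Inside the handler, a check like this could replace the inline re.match call, which keeps the pattern in one place if other callbacks need the same filter.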
