Simulating user login and cookie storage with Scrapy on Python 3: the basics (Mafengwo)


1. Background

2. Environment

  • OS: Windows 7
  • Python 3.6.1
  • Scrapy 1.4.0

3. The standard simulated-login steps

  • Step 1: open the login page and collect any parameters the login form needs (for example, the _xsrf field on zhihu's login page).
  • Step 2: POST those parameters, together with the account name and password, to the server to log in.
  • Step 3: check whether the login succeeded.
  • Step 4: if the login failed, diagnose the error and restart the login procedure.
  • Step 5: if the login succeeded, crawl the site's pages as usual.
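Step 1 often boils down to pulling hidden form fields out of the login page. A minimal regex-based sketch of that extraction (the `_xsrf` field name is borrowed from the zhihu example above; inside a real spider you would use `response.css('input[type=hidden]')` instead):

```python
import re

def extract_hidden_fields(html):
    """Collect hidden <input> fields (e.g. a CSRF token such as _xsrf)
    from a login page, so they can be posted along with the credentials."""
    fields = {}
    for name, value in re.findall(
            r'<input[^>]*type="hidden"[^>]*name="([^"]+)"[^>]*value="([^"]*)"',
            html):
        fields[name] = value
    return fields

page = '<form><input type="hidden" name="_xsrf" value="abc123"></form>'
print(extract_hidden_fields(page))  # → {'_xsrf': 'abc123'}
```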
# Using the Mafengwo site as the example, this walks through simulating a user login
# and then visiting other pages while staying logged in


# Spider file: mafengwoSpider.py
# -*- coding: utf-8 -*-

import scrapy
import datetime
import re

class mafengwoSpider(scrapy.Spider):
    # Per-spider settings
    custom_settings = {
        'LOG_LEVEL': 'DEBUG',       # log level; DEBUG is the lowest
        'ROBOTSTXT_OBEY': False,    # default is to obey robots.txt rules
        'DOWNLOAD_DELAY': 2,        # download delay, default 0
        'COOKIES_ENABLED': True,    # enabled by default; required when crawling data behind a login. Adds some traffic, since requests and responses now carry the cookie parts
        'COOKIES_DEBUG': True,      # default False; if enabled, Scrapy logs all cookies sent in requests (Cookie header) and received in responses (Set-Cookie header)
        'DOWNLOAD_TIMEOUT': 25,     # download timeout; can be set globally here, or per request via Request.meta['download_timeout']
    }

    name = 'mafengwo'
    allowed_domains = ['mafengwo.cn']
    host = "http://www.mafengwo.cn/"
    username = "13725168940"            # Mafengwo account
    password = "aaa00000000"            # Mafengwo password
    headerData = {
        "Referer": "https://passport.mafengwo.cn/",
        'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
    }


    # The spider's entry point
    # Step 1: fetch the Mafengwo login page
    def start_requests(self):
        print("start mafengwo clawer")
        # Mafengwo login page
        mafengwoLoginPage = "https://passport.mafengwo.cn/"
        loginIndexReq = scrapy.Request(
            url = mafengwoLoginPage,
            headers = self.headerData,
            callback = self.parseLoginPage,
            dont_filter = True,     # prevent the page from being filtered out as a duplicate
        )
        yield loginIndexReq


    # Step 2: parse the login page, extract any required parameters, then send the login POST
    def parseLoginPage(self, response):
        print(f"parseLoginPage: url = {response.url}")
        # If the login page carries information the login needs, extract it here (response.text)

        loginPostUrl = "https://passport.mafengwo.cn/login/"
        # FormRequest is how Scrapy sends POST requests
        yield scrapy.FormRequest(
            url = loginPostUrl,
            headers = self.headerData,
            method = "POST",
            # the actual POST payload
            formdata = {
                "passport": self.username,
                "password": self.password,
                # "other": "other",
            },
            callback = self.loginResParse,
            dont_filter = True,
        )

    # Step 3: parse the login result, then send a request that verifies the login state
    def loginResParse(self, response):
        print(f"loginResParse: url = {response.url}")

        # Judge the login state by the status code returned when visiting the personal-center page
        # Only logged-in users can access that page; everyone else is redirected (302) to the login page
        routeUrl = "http://www.mafengwo.cn/plan/route.php"
        # Two key points below:
        # First, the header: without it the server returns a 500 error
        # Second, dont_redirect: set to True, redirects are forbidden; a logged-out user cannot reach this page and the server answers 302.
        #       With dont_redirect set to False, redirects are allowed: a logged-out visit jumps to the login page, which gets crawled instead, returning a 200
        yield scrapy.Request(
            url = routeUrl,
            headers = self.headerData,
            meta={
                'dont_redirect': True,      # forbid the 302 redirect; if set while the page must redirect, the spider errors out
                # 'handle_httpstatus_list': [301, 302]      # which non-2xx statuses to handle ourselves
            },
            callback = self.isLoginStatusParse,
            dont_filter = True,
        )


    # Step 5: check the user's login state; if logged in, go on to crawl other pages.
    # If the login failed, the spider simply terminates.
    def isLoginStatusParse(self, response):
        print(f"isLoginStatusParse: url = {response.url}")

        # Reaching this point without errors means later pages can now be visited in the logged-in state
        # ………………………………
        # no need to store the cookie
        # crawl other pages
        # ………………………………
        yield scrapy.Request(
            url = "https://www.mafengwo.cn/travel-scenic-spot/mafengwo/10045.html",
            headers=self.headerData,
            # if no callback is given, parse() is used by default
        )


    # Normal page-parsing callback
    def parse(self, response):
        print(f"parse: url = {response.url}, meta = {response.meta}")


    # Request-error handling: print, write to a file, or store in a database
    def errorHandle(self, failure):
        print(f"request error: {failure.value.response}")


    # Cleanup when the spider finishes, e.g. print a message or send an email
    def closed(self, reason):
        # an email could be sent here when the crawl ends
        finishTime = datetime.datetime.now()
        subject = f"clawerName had finished, reason = {reason}, finishedTime = {finishTime}"
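The `closed()` hook above only builds the subject line. A sketch of actually sending the notification, assuming Python's standard `smtplib`/`email` modules; the SMTP host, addresses, and password below are placeholders, not values from the original post:

```python
import smtplib
from email.mime.text import MIMEText

def build_finish_mail(subject, sender="crawler@example.com", to="me@example.com"):
    # Build the notification message; the addresses are placeholders.
    msg = MIMEText("spider finished")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = to
    return msg

def send_finish_mail(msg, host="smtp.example.com", port=587):
    # Actually send it; requires a real SMTP server and credentials.
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(msg["From"], "password")  # placeholder password
        server.send_message(msg)
```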
  • Log of a successful login:
E:\Miniconda\python.exe E:/documentCode/scrapyMafengwo/start.py
2018-03-19 17:03:54 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapyMafengwo)
2018-03-19 17:03:54 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'scrapyMafengwo', 'NEWSPIDER_MODULE': 'scrapyMafengwo.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['scrapyMafengwo.spiders']}
2018-03-19 17:03:54 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2018-03-19 17:03:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-03-19 17:03:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-03-19 17:03:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-03-19 17:03:54 [scrapy.core.engine] INFO: Spider opened
2018-03-19 17:03:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-03-19 17:03:54 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
start mafengwo clawer
2018-03-19 17:03:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://passport.mafengwo.cn/> (referer: https://passport.mafengwo.cn/)
parseLoginPage: url = https://passport.mafengwo.cn/
2018-03-19 17:03:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://www.mafengwo.cn> from <POST https://passport.mafengwo.cn/login/>
2018-03-19 17:03:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.mafengwo.cn> (referer: None)
loginResParse: url = http://www.mafengwo.cn
2018-03-19 17:03:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.mafengwo.cn/plan/route.php> (referer: https://passport.mafengwo.cn/)
isLoginStatusParse: url = http://www.mafengwo.cn/plan/route.php
2018-03-19 17:04:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.mafengwo.cn/travel-scenic-spot/mafengwo/10045.html> (referer: https://passport.mafengwo.cn/)
parse: url = https://www.mafengwo.cn/travel-scenic-spot/mafengwo/10045.html, meta = {'depth': 3, 'download_timeout': 25.0, 'download_slot': 'www.mafengwo.cn', 'download_latency': 0.2569999694824219}
subject = clawerName had finished, reason = finished, finishedTime = 2018-03-19 17:04:01.638400
2018-03-19 17:04:01 [scrapy.core.engine] INFO: Closing spider (finished)
2018-03-19 17:04:01 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 3251,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 4,
 'downloader/request_method_count/POST': 1,
 'downloader/response_bytes': 38259,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 4,
 'downloader/response_status_count/302': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 3, 19, 9, 4, 1, 638400),
 'log_count/DEBUG': 6,
 'log_count/INFO': 7,
 'request_depth_max': 3,
 'response_received_count': 4,
 'scheduler/dequeued': 5,
 'scheduler/dequeued/memory': 5,
 'scheduler/enqueued': 5,
 'scheduler/enqueued/memory': 5,
 'start_time': datetime.datetime(2018, 3, 19, 9, 3, 54, 707400)}
2018-03-19 17:04:01 [scrapy.core.engine] INFO: Spider closed (finished)

Process finished with exit code 0
  • Log of a failed login:
2018-03-19 17:05:06 [scrapy.core.engine] INFO: Spider opened
2018-03-19 17:05:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-03-19 17:05:06 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
start mafengwo clawer
2018-03-19 17:05:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://passport.mafengwo.cn/> (referer: https://passport.mafengwo.cn/)
parseLoginPage: url = https://passport.mafengwo.cn/
2018-03-19 17:05:08 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.mafengwo.cn/> from <POST https://passport.mafengwo.cn/login/>
2018-03-19 17:05:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://passport.mafengwo.cn/> (referer: https://passport.mafengwo.cn/)
loginResParse: url = https://passport.mafengwo.cn/
2018-03-19 17:05:10 [scrapy.core.engine] DEBUG: Crawled (302) <GET http://www.mafengwo.cn/plan/route.php> (referer: https://passport.mafengwo.cn/)
2018-03-19 17:05:10 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <302 http://www.mafengwo.cn/plan/route.php>: HTTP status code is not handled or not allowed
2018-03-19 17:05:10 [scrapy.core.engine] INFO: Closing spider (finished)
2018-03-19 17:05:10 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 2234,
 'downloader/request_count': 4,
 'downloader/request_method_count/GET': 3,
 'downloader/request_method_count/POST': 1,
 'downloader/response_bytes': 5044,
 'downloader/response_count': 4,
 'downloader/response_status_count/200': 2,
 'downloader/response_status_count/302': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 3, 19, 9, 5, 10, 368900),
 'httperror/response_ignored_count': 1,
 'httperror/response_ignored_status_count/302': 1,
 'log_count/DEBUG': 5,
 'log_count/INFO': 8,
 'request_depth_max': 2,
 'response_received_count': 3,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'start_time': datetime.datetime(2018, 3, 19, 9, 5, 6, 871900)}
2018-03-19 17:05:10 [scrapy.core.engine] INFO: Spider closed (finished)
subject = clawerName had finished, reason = finished, finishedTime = 2018-03-19 17:05:10.368900

Process finished with exit code 0
  • Comparing the two logs shows that at the login-state check, if the user is not logged in and the 302 redirect to the login page is forbidden, the spider stops right there and crawls nothing further:
loginResParse: url = https://passport.mafengwo.cn/
2018-03-19 17:05:10 [scrapy.core.engine] DEBUG: Crawled (302) <GET http://www.mafengwo.cn/plan/route.php> (referer: https://passport.mafengwo.cn/)
2018-03-19 17:05:10 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <302 http://www.mafengwo.cn/plan/route.php>: HTTP status code is not handled or not allowed

4. Caveats

  • settings configuration
'ROBOTSTXT_OBEY': False,    # default is to obey robots.txt rules; disabled because many sites forbid crawlers
'DOWNLOAD_DELAY': 2,        # download delay (default 0), to avoid going too fast and getting the IP or account banned
'COOKIES_ENABLED': True,    # enabled by default; required when crawling data behind a login
  • header configuration
# required, otherwise the server rejects the request
headerData = {
    "Referer": "https://passport.mafengwo.cn/",
    'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
    }
  • downloader middleware configuration: middleware.py
# To keep the login session alive, the user-agent and the IP address must not change.
# Otherwise the account's activity looks anomalous and the account may get banned.
# These settings live in middleware.py, so they deserve special attention
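As a concrete illustration of keeping the user-agent fixed, a minimal downloader middleware sketch (the class name is mine, not from the original project; it would be registered under DOWNLOADER_MIDDLEWARES in settings.py):

```python
# A minimal middlewares.py sketch that pins one User-Agent on every request,
# so the UA never changes mid-session.
class FixedUserAgentMiddleware:
    FIXED_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36")

    def process_request(self, request, spider):
        # overwrite whatever UA an earlier middleware may have set
        request.headers['User-Agent'] = self.FIXED_UA
        return None  # let the request continue down the middleware chain
```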

5. Storing and using cookies locally

  • Once the login is verified, the cookie can be saved; on the next run it can be used to log in directly (though this approach is not recommended).

5.1. Saving the cookie locally

# File: mafengwoSpider.py

# Save the cookie to a file
def convertToCookieFormat(cookieLstInfo, cookieFileName):
    '''
    CookieReq = [b'PHPSESSID=427jcfptrsogeg7onenojvqmp0; mfw_uuid=5ab0adb9-177d-a7d3-a47a-9522417e0652; oad_n=a%3A3%3A%7Bs%3A3%3A%22oid%22%3Bi%3A1029%3Bs%3A2%3A%22dm%22%3Bs%3A20%3A%22passport.mafengwo.cn%22%3Bs%3A2%3A%22ft%22%3Bs%3A19%3A%222018-03-20+14%3A44%3A09%22%3B%7D; __today_login=1; mafengwo=d336513fb8fc6edd490db9725739bb85_94281374_5ab0adbac4ba51.24002232_5ab0adbac4ba92.98161419; uol_throttle=94281374; mfw_uid=94281374']
    :param cookieLstInfo:
    :return:
    '''
    cookieDict = {}
    if len(cookieLstInfo) > 0:
        # the header value is bytes, so decode it first
        cookieStr = str(cookieLstInfo[0], encoding="utf8")
        print(f"cookieStr = {cookieStr}")
        for cookieItemStr in cookieStr.split(";"):
            # split on the first '=' only, since cookie values may themselves contain '='
            cookieItem = cookieItemStr.strip().split("=", 1)
            print(f"cookieItemStr = {cookieItemStr}, cookieItem = {cookieItem}")
            cookieDict[cookieItem[0].strip()] = cookieItem[1].strip()
        print(f"cookieDict = {cookieDict}")

        # write the cookie to a file for later use
        with open(cookieFileName, 'w') as f:
            for cookieKey, cookieValue in cookieDict.items():
                f.write(str(cookieKey) + ':' + str(cookieValue) + '\n')
        return cookieDict

# Step 5: check the user's login state; if logged in, go on to crawl other pages.
# If the login failed, the spider simply terminates.
def isLoginStatusParse(self, response):
    print(f"isLoginStatusParse: url = {response.url}")

    # Cookie carried by the request
    # This is the one to store: once the user has logged in, this cookie is
    # attached to every later request to prove the user's identity to the server
    CookieReq = response.request.headers.getlist('Cookie')
    print(f"CookieReq = {CookieReq}")
    cookieFileName = "mafengwoCookies.txt"
    cookieDict = convertToCookieFormat(CookieReq, cookieFileName)

    # Cookie set by the response
    Cookie = response.headers.getlist('Set-Cookie')
    print(f"Set-Cookie = {Cookie}")

    # Reaching this point without errors means later pages can now be visited in the logged-in state
    # ………………………………
    # crawl other pages
    # ………………………………
    yield scrapy.Request(
        url = "https://www.mafengwo.cn/travel-scenic-spot/mafengwo/10045.html",
        headers=self.headerData,
        # if no callback is given, parse() is used by default
    )
  • The stored result:
# 文件:mafengwoCookies.txt

PHPSESSID:vperarhkjekdsv5mut4vjk9ri0
mfw_uuid:5ab0bcc6-0279-cbef-673e-15fd2c0b73c5
oad_n:a%3A3%3A%7Bs%3A3%3A%22oid%22%3Bi%3A1029%3Bs%3A2%3A%22dm%22%3Bs%3A20%3A%22passport.mafengwo.cn%22%3Bs%3A2%3A%22ft%22%3Bs%3A19%3A%222018-03-20+15%3A48%3A22%22%3B%7D
__today_login:1
mafengwo:926d677d880bf9c3981934bb3d710b8c_94281374_5ab0bcc8e795c0.78689785_5ab0bcc8e79637.22817262
uol_throttle:94281374
mfw_uid:94281374

5.2. Reading and using the cookie

  • For this part, you can of course also log in with a browser, copy the cookie out of the browser, and use it as the login credential.
# Read the cookie information back out of the file
def getCookieFromFile(cookieFileName):
    '''
        PHPSESSID:nkv0d5g29bde1ni5p9bha8cq04
        mfw_uuid:5ab0b3a3-22ac-61f1-ba72-db5a070c7e5d
        oad_n:a%3A3%3A%7Bs%3A3%3A%22oid%22%3Bi%3A1029%3Bs%3A2%3A%22dm%22%3Bs%3A20%3A%22passport.mafengwo.cn%22%3Bs%3A2%3A%22ft%22%3Bs%3A19%3A%222018-03-20+15%3A09%3A23%22%3B%7D
        __today_login:1
        mafengwo:7e7cd3cffefcc05d3cbb217172a2d9fa_94281374_5ab0b3a5ac8007.33269268_5ab0b3a5ac8053.87485829
        uol_throttle:94281374
        mfw_uid:94281374
    :param cookieFileName:
    :return:
    '''
    cookieDict = {}
    with open(cookieFileName, "r") as f:   # the file is closed automatically
        for line in f.readlines():
            print(f"line = {line}")
            line = line.strip()
            # skip blank lines and commented-out ('#'-prefixed) entries,
            # and split on the first ':' only, in case the value contains one
            if line and not line.startswith("#"):
                cookieItem = line.split(":", 1)
                cookieDict[cookieItem[0].strip()] = cookieItem[1].strip()
    return cookieDict


# The spider's entry point
def start_requests(self):
    print("start mafengwo clawer")
    cookieFileName = "mafengwoCookies.txt"
    cookieDict = getCookieFromFile(cookieFileName)

    # Judge the login state by the status code returned when visiting the personal-center page
    # Only logged-in users can access that page; everyone else is redirected (302) to the login page
    routeUrl = "http://www.mafengwo.cn/plan/route.php"
    # Two key points below:
    # First, the header: without it the server returns a 500 error
    # Second, dont_redirect: set to True, redirects are forbidden; a logged-out user cannot reach this page and the server answers 302.
    #       With dont_redirect set to False, redirects are allowed: a logged-out visit jumps to the login page, which gets crawled instead, returning a 200
    yield scrapy.Request(
        url=routeUrl,
        headers=self.headerData,
        cookies=cookieDict,
        meta={
            # 'dont_redirect': True,    # forbid the 302 redirect; if set while the page must redirect, the spider errors out
            # 'handle_httpstatus_list': [301, 302]      # which non-2xx statuses to handle ourselves
        },
        callback=self.isLoginStatusParse,
        dont_filter=True,
    )
  • Two things to note:
  • First, when the cookie works, this is indeed very convenient.
  • Second, once the cookie expires, it keeps circulating through all the requests: not only is the route page inaccessible, but so is the login page reached via the 302 redirect, so the spider aborts abnormally (this is also why cookie-based login is not recommended). For example:
line = #mfw_uid:9474669944

2018-03-20 15:58:09 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET http://www.mafengwo.cn/plan/route.php>
Cookie: #PHPSESSID=vperarhkjekdsv5mut4vjk9ri0; #mfw_uuid=5ab0bcc6-0279-cbef-673e-15fd2c0b73c5; #oad_n=a%3A3%3A%7Bs%3A3%3A%22oid%22%3Bi%3A1029%3Bs%3A2%3A%22dm%22%3Bs%3A20%3A%22passport.mafengwo.cn%22%3Bs%3A2%3A%22ft%22%3Bs%3A19%3A%222018-03-20+15%3A48%3A22%22%3B%7D; #__today_login=1; #mafengwo=926d677d880bf9c3981934bb3d710b8c_94281374_5ab0bcc8e795c0.78689785_5ab0bcc8e79637.22817262; #uol_throttle=94281374; #mfw_uid=94281374

2018-03-20 15:58:09 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <302 http://www.mafengwo.cn/plan/route.php>
Set-Cookie: PHPSESSID=25kotnplj2fl5ftd0m6gari4b6; path=/; domain=.mafengwo.cn; HttpOnly

Set-Cookie: mfw_uuid=5ab0bfef-bfc3-a0d8-da65-a49fe77e191a; expires=Wed, 20-Mar-2019 08:01:51 GMT; Max-Age=31536000; path=/; domain=.mafengwo.cn

Set-Cookie: oad_n=a%3A3%3A%7Bs%3A3%3A%22oid%22%3Bi%3A1029%3Bs%3A2%3A%22dm%22%3Bs%3A15%3A%22www.mafengwo.cn%22%3Bs%3A2%3A%22ft%22%3Bs%3A19%3A%222018-03-20+16%3A01%3A51%22%3B%7D; expires=Tue, 27-Mar-2018 08:01:51 GMT; Max-Age=604800; path=/; domain=.mafengwo.cn

2018-03-20 15:58:09 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.mafengwo.cn?return_url=http%3A%2F%2Fwww.mafengwo.cn%2Fplan%2Froute.php> from <GET http://www.mafengwo.cn/plan/route.php>
2018-03-20 15:58:09 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET https://passport.mafengwo.cn?return_url=http%3A%2F%2Fwww.mafengwo.cn%2Fplan%2Froute.php>
Cookie: #PHPSESSID=vperarhkjekdsv5mut4vjk9ri0; #mfw_uuid=5ab0bcc6-0279-cbef-673e-15fd2c0b73c5; #oad_n=a%3A3%3A%7Bs%3A3%3A%22oid%22%3Bi%3A1029%3Bs%3A2%3A%22dm%22%3Bs%3A20%3A%22passport.mafengwo.cn%22%3Bs%3A2%3A%22ft%22%3Bs%3A19%3A%222018-03-20+15%3A48%3A22%22%3B%7D; #__today_login=1; #mafengwo=926d677d880bf9c3981934bb3d710b8c_94281374_5ab0bcc8e795c0.78689785_5ab0bcc8e79637.22817262; #uol_throttle=94281374; #mfw_uid=94281374; PHPSESSID=25kotnplj2fl5ftd0m6gari4b6; mfw_uuid=5ab0bfef-bfc3-a0d8-da65-a49fe77e191a; oad_n=a%3A3%3A%7Bs%3A3%3A%22oid%22%3Bi%3A1029%3Bs%3A2%3A%22dm%22%3Bs%3A15%3A%22www.mafengwo.cn%22%3Bs%3A2%3A%22ft%22%3Bs%3A19%3A%222018-03-20+16%3A01%3A51%22%3B%7D

2018-03-20 15:58:12 [scrapy.core.engine] DEBUG: Crawled (400) <GET https://passport.mafengwo.cn?return_url=http%3A%2F%2Fwww.mafengwo.cn%2Fplan%2Froute.php> (referer: https://passport.mafengwo.cn/)
2018-03-20 15:58:12 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 https://passport.mafengwo.cn?return_url=http%3A%2F%2Fwww.mafengwo.cn%2Fplan%2Froute.php>: HTTP status code is not handled or not allowed
2018-03-20 15:58:12 [scrapy.core.engine] INFO: Closing spider (finished)
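One way out of this dead end is to let the 302 reach the spider via `handle_httpstatus_list` and branch on the probe's status code instead of aborting. A small decision helper sketches the idea (the function name and the action labels are my own, not from the original post):

```python
def cookie_probe_next_step(status):
    # Decide the next move from the status code of the cookie-authenticated
    # probe of /plan/route.php. The probe request must carry
    # meta={'handle_httpstatus_list': [301, 302]} so the 302 reaches us.
    if status == 200:
        return "crawl"          # cookie still valid: keep crawling
    if status in (301, 302):
        return "form_login"     # cookie stale: fall back to the POST login
    return "abort"              # anything else, e.g. the 400 seen in the log

print(cookie_probe_next_step(302))  # → form_login
```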