Getting a user's cookie from the browser

When writing a scraper, fetching a simple page is fairly straightforward.

For example:


import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the site does not reject the request
user = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
headers = {"User-Agent": user}

response = requests.get("https://www.baidu.com/", headers=headers)
print(response.status_code)

# Use the encoding requests detects from the content to avoid garbled text
response.encoding = response.apparent_encoding
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

This gives you a structured, parseable copy of the page.
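With the parsed soup object you can also pull out individual elements instead of printing the whole tree. A minimal sketch, continuing from the snippet above:

# Read specific parts of the parsed page
print(soup.title.string)           # text of the <title> tag
for link in soup.find_all('a'):    # every <a> tag on the page
    print(link.get('href'))        # its href attribute (None if absent)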

Some sites are a bit more involved and can only be accessed with the user's cookie, so you first have to find that cookie. Log in to the site you want to scrape, press F12 to open the developer tools, and refresh the page. Click the Network tab, then click the first request in the list on the left; the Headers panel appears, and the account's cookie can be found among the request headers.

For example:

[Screenshot: the Cookie value shown under the request Headers in the developer tools]

Code:

import requests
from bs4 import BeautifulSoup

user = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"

# The cookie string copied from the browser's developer tools (Network -> Headers)
cookie = "uuid_tt_dd=10_6647180090-1565705226664-828296; dc_session_id=10_1565705226664.653162; smidV2=201909021440594a9f393f93f293496a0b3490a2ecb61500d17bcb038fb0ee0; UserName=weixin_43654083; UserInfo=0beba30009e74a71b81d1eca59e9d1d6; UserToken=0beba30009e74a71b81d1eca59e9d1d6; UserNick=%E5%86%85%E5%B8%88%E5%A4%A7%E6%A0%91%E8%8E%93%E5%B0%8F%E9%98%9F; AU=391; UN=weixin_43654083; BT=1570070714264; p_uid=U000000; Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=6525*1*10_6647180090-1565705226664-828296!1788*1*PC_VC!5744*1*weixin_43654083; __gads=Test; firstDie=1; Hm_lvt_eb5e3324020df43e5f9be265a8beb7fd=1574508727; Hm_ct_eb5e3324020df43e5f9be265a8beb7fd=5744*1*weixin_43654083!6525*1*10_6647180090-1565705226664-828296; announcement=%257B%2522isLogin%2522%253Atrue%252C%2522announcementUrl%2522%253A%2522https%253A%252F%252Fblogdev.blog.csdn.net%252Farticle%252Fdetails%252F103053996%2522%252C%2522announcementCount%2522%253A0%252C%2522announcementExpire%2522%253A3600000%257D; Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1574559007,1574559191,1574559204,1574559745; Hm_lpvt_6bcd52f51e9b3dce32bec415ac=1574561316; dc_tos=q1gbac"

# Send the cookie along with the User-Agent in the request headers
headers = {"User-Agent": user, "Cookie": cookie}

response = requests.get("URL", headers=headers)  # replace "URL" with the page to scrape
print(response.status_code)

response.encoding = response.apparent_encoding
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

That is how you get a structured page from a site that requires login.
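Instead of pasting the whole cookie string into the headers, requests can also take cookies as a dict through its cookies parameter. Below is a minimal sketch of that variant, reusing the user and cookie variables from the code above; the parse_cookie_string helper is just for illustration and is not part of requests:

import requests

def parse_cookie_string(raw):
    """Split a raw 'Cookie' header string ('a=1; b=2') into a dict."""
    cookies = {}
    for pair in raw.split("; "):
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies

# 'cookie' and 'user' are the values from the example above
cookie_dict = parse_cookie_string(cookie)
response = requests.get("URL", headers={"User-Agent": user}, cookies=cookie_dict)
print(response.status_code)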
