Getting the user's cookie from the browser

When writing a crawler, fetching a simple page is fairly straightforward.

For example:


import requests
from bs4 import BeautifulSoup

# A desktop Chrome User-Agent string; many sites reject requests without one
user = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
headers = {"User-Agent": user}
response = requests.get("https://www.baidu.com/", headers=headers)
print(response.status_code)
# Use the encoding requests detects from the body rather than the header default
response.encoding = response.apparent_encoding
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

That is enough to get a structured, parsed copy of the page.
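Once the page is parsed, you can query the tree instead of printing the whole document. A minimal sketch (the tags queried here are just common ones, not specific to Baidu's markup):

# Pull out the page <title> and every link target from the parse tree
print(soup.title.string)          # text of the <title> tag, if one exists
for a in soup.find_all("a"):      # every <a> element in the document
    print(a.get("href"))          # None for anchors without an href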

Some sites are a bit more involved and can only be accessed with the user's cookie, so you first have to find that cookie. Log in to your account on the page you want to scrape, press F12 to open the browser's developer tools, refresh the page, click the Network tab, then click the first request in the list on the left. The request headers will appear alongside it, and the account's Cookie value can be found among those headers.

(Screenshot: the Cookie value under Request Headers in the DevTools Network panel.)

The code:

import requests
from bs4 import BeautifulSoup

user = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"

# The Cookie value copied from the DevTools Network panel (this one is from a CSDN login session)
cookie = "uuid_tt_dd=10_6647180090-1565705226664-828296; dc_session_id=10_1565705226664.653162; smidV2=201909021440594a9f393f93f293496a0b3490a2ecb61500d17bcb038fb0ee0; UserName=weixin_43654083; UserInfo=0beba30009e74a71b81d1eca59e9d1d6; UserToken=0beba30009e74a71b81d1eca59e9d1d6; UserNick=%E5%86%85%E5%B8%88%E5%A4%A7%E6%A0%91%E8%8E%93%E5%B0%8F%E9%98%9F; AU=391; UN=weixin_43654083; BT=1570070714264; p_uid=U000000; Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=6525*1*10_6647180090-1565705226664-828296!1788*1*PC_VC!5744*1*weixin_43654083; __gads=Test; firstDie=1; Hm_lvt_eb5e3324020df43e5f9be265a8beb7fd=1574508727; Hm_ct_eb5e3324020df43e5f9be265a8beb7fd=5744*1*weixin_43654083!6525*1*10_6647180090-1565705226664-828296; announcement=%257B%2522isLogin%2522%253Atrue%252C%2522announcementUrl%2522%253A%2522https%253A%252F%252Fblogdev.blog.csdn.net%252Farticle%252Fdetails%252F103053996%2522%252C%2522announcementCount%2522%253A0%252C%2522announcementExpire%2522%253A3600000%257D; Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1574559007,1574559191,1574559204,1574559745; Hm_lpvt_6bcd52f51e9b3dce32bec415ac=1574561316; dc_tos=q1gbac"

# Send the cookie along with the User-Agent so the site treats the request as logged in
headers = {"User-Agent": user, "Cookie": cookie}
response = requests.get("URL", headers=headers)  # replace "URL" with the page you want to scrape
print(response.status_code)
response.encoding = response.apparent_encoding
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

With that, you get a structured copy of a page that requires a login.
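As a variant, you can split the copied cookie string into a dict and pass it through the cookies parameter of requests instead of a raw Cookie header, which keeps the individual values easier to inspect. A minimal sketch, reusing the cookie and user variables from above ("URL" is still a placeholder):

# Split the copied "name=value; name=value; ..." string into a dict
cookies = dict(item.split("=", 1) for item in cookie.split("; "))

# requests turns the dict into a proper Cookie header on the request
response = requests.get("URL", headers={"User-Agent": user}, cookies=cookies)
print(response.status_code)

On top of this, a requests.Session() will also hold on to any cookies the server sets in later responses, so you only need to supply the copied cookie once per session.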
