The site to log in to: https://www.1point3acres.com/bbs/
Find the `action` attribute of the `form` element to see where the login form is submitted:
https://www.1point3acres.com/bbs/member.php?mod=logging&action=login&loginsubmit=yes&infloat=yes&lssubmit=yes&inajax=1
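The action URL is a path plus a query string. Below is a minimal sketch of how to inspect it with the standard library. It assumes the page's `action` attribute may be written as a relative path. The relative value below is hypothetical, and its parameters are simply copied from the URL above.

```python
from urllib.parse import urljoin, urlsplit, parse_qs

BASE = "https://www.1point3acres.com/bbs/"
# Hypothetical relative action, as it might appear in <form action="...">
action = "member.php?mod=logging&action=login&loginsubmit=yes&infloat=yes&lssubmit=yes&inajax=1"

# Resolve the relative action against the page URL to get the absolute POST target
login_url = urljoin(BASE, action)
print(login_url)

# Split the query string into parameters (each value is a list) to see what the endpoint expects
params = parse_qs(urlsplit(login_url).query)
print(params["mod"], params["action"])
```

Resolving against the page URL matters because `/bbs/` ends with a slash, so `member.php` lands under `/bbs/` rather than at the site root.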
After logging in, inspect the form data the browser sent and use those fields as the POST parameters:
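You can also collect the form's inputs programmatically instead of copying each field by hand. Below is a minimal sketch that assumes the form markup looks like the hypothetical snippet shown. The `id="lsform"` and the field layout are illustrative, not taken from the real page, and `form_fields` is a hypothetical helper. Hidden fields such as `quickforward` and `handlekey` are picked up automatically, so only the credentials need to be filled in.

```python
from bs4 import BeautifulSoup

# Hypothetical login-form markup, for illustration only
LOGIN_HTML = """
<form id="lsform" method="post" action="member.php?mod=logging&amp;action=login">
  <input type="text" name="username" value="">
  <input type="password" name="password" value="">
  <input type="hidden" name="quickforward" value="yes">
  <input type="hidden" name="handlekey" value="ls">
  <button type="submit">Login</button>
</form>
"""

def form_fields(html: str, form_id: str) -> dict:
    """Return {name: value} for every named <input> in the form with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form", id=form_id)
    if form is None:
        return {}
    return {inp["name"]: inp.get("value", "") for inp in form.find_all("input") if inp.get("name")}

form_data = form_fields(LOGIN_HTML, "lsform")
# Fill in the user-typed fields; placeholders, replace with real credentials
form_data.update({"username": "your_username", "password": "your_password"})
print(form_data)
```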
Finally, locate the avatar on the page. With BeautifulSoup, first find the `div`, then look inside it for the `img` element and take its `src` attribute.
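You can try the lookup offline against a small piece of markup. Below is a minimal sketch that assumes the avatar block has the shape of the hypothetical snippet shown; the image URL is made up. `find_all('img')` searches all descendants, so the whitespace text nodes that `.children` would also yield do not have to be filtered out. Returning an empty list when the `div` is missing (for example, when not logged in) avoids calling `.children` on `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the avatar block; the URL is made up
PAGE = """
<div class="avt y">
  <a href="home.php?mod=space&amp;uid=1">
    <img src="https://example.com/avatar/1_small.jpg">
  </a>
</div>
"""

def avatar_srcs(html: str) -> list:
    """Return the src of every <img> inside <div class="avt y">, or [] if the div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    div_node = soup.find("div", {"class": "avt y"})
    if div_node is None:  # e.g. an "Access denied" page instead of the logged-in homepage
        return []
    return [img["src"] for img in div_node.find_all("img", src=True)]

print(avatar_srcs(PAGE))
print(avatar_srcs("<p>Access denied</p>"))
```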
```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36'
}
# Fields copied from the browser's form data; replace with your own credentials
form_data = {
    'username': 'your_username',
    'password': 'your_password',
    'quickforward': 'yes',
    'handlekey': 'ls'
}
BASE_URL = 'https://www.1point3acres.com/bbs/'
LOGIN_URL = 'https://www.1point3acres.com/bbs/member.php?mod=logging&action=login&loginsubmit=yes&infloat=yes&lssubmit=yes&inajax=1'

# A Session keeps the login cookies for the requests that follow
session = requests.Session()
html = session.post(LOGIN_URL, headers=header, data=form_data)
# print(html.text)  # print the POST response to confirm the login really succeeded

resp = session.get(BASE_URL, headers=header).text
ht = BeautifulSoup(resp, 'lxml')
div_node = ht.find('div', {'class': 'avt y'})
if div_node is None:
    raise SystemExit('Avatar div not found: probably not logged in (print html.text)')

# find_all searches all descendants, so text nodes among the children need no filtering
img_src = [urljoin(BASE_URL, img['src']) for img in div_node.find_all('img', src=True)]
print(img_src)

for src in img_src:
    img_content = session.get(src, headers=header, verify=False).content
    # Drop the scheme and flatten the rest of the URL into a file name.
    # src.lstrip('https://') would strip any of those characters, not the prefix.
    name = src.split('://', 1)[-1].replace('/', '-')
    print(name)
    with open(f'{name}.jpg', 'wb') as f:
        f.write(img_content)
```
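Stripping the scheme with `src.lstrip('https://')` is a common trap. `str.lstrip` removes any leading characters from the given set, not the literal prefix, so a host such as `static...` also loses its first letters. Below is a slightly more defensive sketch; `url_to_filename` is a hypothetical helper. It parses the URL and also replaces characters a query-string avatar URL could contain, such as `?`, which is not allowed in Windows file names.

```python
import re
from urllib.parse import urlsplit

def url_to_filename(url: str, ext: str = ".jpg") -> str:
    """Turn an image URL into a flat, filesystem-safe file name (hypothetical helper)."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    if parts.query:
        raw += "-" + parts.query
    # Replace anything that is not a word char, dot or hyphen ('/', '?', '&', '=', ...) with '-'
    name = re.sub(r"[^\w.-]", "-", raw)
    return name if name.lower().endswith(ext) else name + ext

# lstrip removes a *set of characters*, not a prefix:
print("https://static.example.com/a.jpg".lstrip("https://"))  # prints atic.example.com/a.jpg
print(url_to_filename("https://static.example.com/a/b.jpg"))   # prints static.example.com-a-b.jpg
```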
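Errors and caveats:

1. `form_data` must be filled in exactly right. Otherwise the login fails, and every request to a user page afterwards shows `Access denied | www.1point3acres.com used Cloudflare to restrict`. That sent me looking for ways to get around Cloudflare. Only after printing the page returned by the `post` request did I find that I had mistyped the password and had never been logged in at all.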
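2. Error `[SSL: CERTIFICATE_VERIFY_FAILED]`: add `verify=False` to the `get` call, as below. This turns off certificate verification for that request, and requests will emit an `InsecureRequestWarning`, so treat it as a workaround rather than a real fix.

```python
img_content = session.get(src, headers=header, verify=False).content
```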
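Point 1 comes down to checking the login before scraping anything. Below is a minimal sketch of such a check; `check_logged_in` is a hypothetical helper. It assumes, without verification against the site, that a failed session shows the Cloudflare "Access denied" text quoted above and that a logged-in page contains the username.

```python
def check_logged_in(page_html: str, username: str) -> None:
    """Raise early instead of scraping a page we never logged in to.

    Assumptions (not verified against the site): a failed session shows the
    Cloudflare "Access denied" text, and a logged-in page contains the username.
    """
    if "Access denied" in page_html:
        raise RuntimeError("Got the 'Access denied' page; check form_data and print the POST response")
    if username not in page_html:
        raise RuntimeError(f"Username {username!r} not found on the page; the login probably failed")

# Usage with made-up pages
check_logged_in("<div>Welcome, alice</div>", "alice")  # passes silently
try:
    check_logged_in("Access denied | ...", "alice")
except RuntimeError as err:
    print(err)
```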