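Was poking around our school library's website, so while I was at it I scraped my list of borrowed books.
At first I assumed the school's system would be harder to get into than it was. I tried to simulate the login, but the submitted form was full of strange fields. A quick search showed they're generated by ASP.NET, so I spent a while figuring out how to capture those parameters too.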
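If you do want to go the simulated-login route: ASP.NET WebForms pages normally keep their state in hidden inputs such as __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION, and those have to be sent back with the login POST. Below is a minimal sketch that collects every hidden input from the login page's HTML using only the standard library's html.parser. Which hidden fields this particular login form actually contains is an assumption, so check the page source. The helper names (HiddenFieldParser, hidden_fields) are hypothetical.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags also arrive here via handle_startendtag
        if tag != 'input':
            return
        a = dict(attrs)
        if (a.get('type') or '').lower() == 'hidden' and a.get('name'):
            # An input with no value attribute is submitted as an empty string
            self.fields[a['name']] = a.get('value') or ''


def hidden_fields(html):
    """Return the hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

You would then merge this dict with the username/password fields (their names depend on the form and aren't known here) and POST it through a requests.Session, so the session cookie the server sets is kept for later requests.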
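In the end, emmmm, no problem at all: this system is full of holes, and just sending the session cookie gets you straight in. Crude, but it works.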
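Pasting the whole cookie into the Cookie header (as in the code below) works fine; requests.get also accepts cookies as a dict through its cookies= argument. A small sketch, assuming you copied the raw Cookie header string from the browser's dev tools, that turns it into such a dict with the standard library's http.cookies.SimpleCookie (cookie_dict is a hypothetical helper name):

```python
from http.cookies import SimpleCookie


def cookie_dict(raw_header):
    """Turn a raw 'name=value; name2=value2' Cookie header into a plain dict."""
    jar = SimpleCookie()
    jar.load(raw_header)
    return {name: morsel.value for name, morsel in jar.items()}
```

Usage would look like requests.get(url, cookies=cookie_dict('ASP.NET_SessionId=...')), which keeps the cookie separate from the other headers.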
```python
import requests
from bs4 import BeautifulSoup


def main():
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Ubuntu Chromium/68.0.3440.106 Chrome/68.0.3440.106 Safari/537.36',
        # Session cookie copied from the browser after logging in; replace it with your own
        'Cookie': 'ASP.NET_SessionId=dnlcydvvqwnc3yax1ymja2ji',
    }
    wb_data = requests.get('http://218.196.244.90:8080/Borrowing.aspx', headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    # Row numbers in the GridView IDs are two digits (ctl03) and run from 2 up to (number of borrowed books + 1)
    titles = soup.select('#ctl00_ContentPlaceHolder1_GridView1_ctl{:02d}_HyperLink1'.format(3))
    print(titles)


if __name__ == '__main__':
    main()
```
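The script above only fetches row 3. To get every borrowed book, you can generate the link IDs for rows 2 through (number of books + 1), following the comment in the code. This sketch assumes the usual ASP.NET two-digit padding of auto-generated IDs (ctl09, then ctl10); row_link_ids is a hypothetical helper.

```python
def row_link_ids(book_count):
    """Build the element IDs of the title links for every borrowed-book row.

    Assumes rows are numbered ctl02 .. ctl(book_count + 1), zero-padded to two digits.
    """
    prefix = 'ctl00_ContentPlaceHolder1_GridView1_ctl'
    return ['{}{:02d}_HyperLink1'.format(prefix, row) for row in range(2, book_count + 2)]
```

Inside main() that would be something like titles = [a for link_id in row_link_ids(n) for a in soup.select('#' + link_id)]. If you'd rather not count the books first, an attribute selector such as soup.select('a[id^="ctl00_ContentPlaceHolder1_GridView1_ctl"][id$="_HyperLink1"]') should match every row in one go.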
I didn't do any processing on the output here, so it prints the raw tag:
```
[<a href="Book.aspx?id=0199151729" id="ctl00_ContentPlaceHolder1_GridView1_ctl03_HyperLink1" style="color:#980000;color: #800000; font-weight: 700; font-size: small;" title="海邊的卡夫卡">海邊的卡夫卡</a>]
```
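Since that output is just the raw tag, here is a minimal sketch of pulling out the book id (from the href query string) and the title. It uses the standard library's html.parser so it doesn't need lxml, and it assumes every book link looks like the one above (Book.aspx?id=...). BookLinkParser and parse_books are hypothetical names. With BeautifulSoup you could equally read a['href'] and a.get_text() from the tags that select() already returns.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class BookLinkParser(HTMLParser):
    """Collect (book id, title) pairs from <a href="Book.aspx?id=..."> links."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._current_id = None  # id of the link we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        url = urlparse(dict(attrs).get('href') or '')
        if not url.path.endswith('Book.aspx'):
            return
        ids = parse_qs(url.query).get('id')
        if ids:
            self._current_id = ids[0]  # keep as a string so leading zeros survive
            self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._current_id is not None:
            self.books.append((self._current_id, ''.join(self._text).strip()))
            self._current_id = None


def parse_books(html):
    """Return a list of (book id, title) tuples found in an HTML string."""
    parser = BookLinkParser()
    parser.feed(html)
    parser.close()
    return parser.books
```

Feeding it str(titles) from the script should give a list of (id, title) tuples instead of raw markup.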
I'll pick this up again when I have time.