Web Scraping Basics with requests

1. First, install the requests module: pip install requests

2. Provide the url and headers parameters.

3. In the browser, press F12, open the Network tab, refresh the page, then click on a request to find the User-Agent in its request headers.

The complete basic code is as follows:

import requests  # import the module
url = 'https://www.baidu.com/'  # target URL

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}  # request headers, used to mimic a browser

#response = requests.get(url=url,headers=headers).text   # returns decoded text; may be garbled if requests guesses the wrong encoding

response = requests.get(url=url,headers=headers).content.decode("utf-8","ignore")  # decode the raw bytes with the specified encoding

# the "ignore" argument skips any bytes that cannot be decoded

with open("baidu.html","w",encoding="utf-8") as f:   # write the page to a file
    f.write(response)
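
If you would rather let requests pick the encoding itself, a minimal alternative sketch (not part of the original example) sets response.encoding from response.apparent_encoding, which guesses the charset from the response body, and then uses .text:

import requests

url = 'https://www.baidu.com/'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}

resp = requests.get(url=url, headers=headers)
resp.encoding = resp.apparent_encoding  # guess the real encoding from the body instead of the response headers
with open("baidu.html", "w", encoding="utf-8") as f:
    f.write(resp.text)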

Complete code for a URL with query parameters:

First approach: put the parameter directly in the URL.

import requests
url = 'https://www.baidu.com/s?wd=哈士奇'
headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
response = requests.get(url=url,headers=headers).content.decode("utf-8")
with open("hashiqi.html","w",encoding="utf-8") as f:
    f.write(response)
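
For reference, the Chinese keyword in the URL above gets percent-encoded as UTF-8 before the request is sent; a short sketch using the standard library's urllib.parse.quote (an illustration, not part of the original code) shows what the encoded query looks like:

from urllib.parse import quote

keyword = "哈士奇"
encoded_url = 'https://www.baidu.com/s?wd=' + quote(keyword)
print(encoded_url)  # https://www.baidu.com/s?wd=%E5%93%88%E5%A3%AB%E5%A5%87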

Second approach: pass the parameters through the params argument.

import requests
url = 'https://www.baidu.com/s?'
params={"wd":"邊"}

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
response = requests.get(url=url,params=params,headers=headers).content.decode("utf-8")

with open("bian1.html","w",encoding="utf-8") as f:
    f.write(response)
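
When params is used, requests builds and encodes the query string for you; as a quick check (a sketch added here for illustration), response.url shows the final URL that was actually requested:

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
resp = requests.get(url='https://www.baidu.com/s?', params={"wd": "邊"}, headers=headers)
print(resp.url)  # e.g. https://www.baidu.com/s?wd=%E9%82%8A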

 

 
