Web Scraping Basics with requests

1. First, install the requests module: pip install requests

2. Provide the url and headers parameters.

3. In the browser, press F12, open the Network tab, refresh the page, then click on a request to find the User-Agent in its request headers.

The complete basic code is as follows:

import requests  # import the module
url = 'https://www.baidu.com/'  # target URL

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}  # request headers, used to mimic a browser

#response = requests.get(url=url,headers=headers).text   # returns decoded text; may be garbled if requests guesses the wrong encoding

response = requests.get(url=url,headers=headers).content.decode("utf-8","ignore")  # decode the raw bytes with the specified encoding

# the "ignore" argument skips any bytes that cannot be decoded

with open("baidu.html","w",encoding="utf-8") as f:   # write the page to a file
    f.write(response)
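
If you would rather let requests pick the encoding itself, a minimal alternative sketch (not part of the original example) sets response.encoding from response.apparent_encoding, which guesses the charset from the response body, and then uses .text:

import requests

url = 'https://www.baidu.com/'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}

resp = requests.get(url=url, headers=headers)
resp.encoding = resp.apparent_encoding  # guess the real encoding from the body instead of the response headers
with open("baidu.html", "w", encoding="utf-8") as f:
    f.write(resp.text)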

Complete code for a URL with query parameters:

First approach: put the parameter directly in the URL.

import requests
url = 'https://www.baidu.com/s?wd=哈士奇'
headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
response = requests.get(url=url,headers=headers).content.decode("utf-8")
with open("hashiqi.html","w",encoding="utf-8") as f:
    f.write(response)
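
For reference, the Chinese keyword in the URL above gets percent-encoded as UTF-8 before the request is sent; a short sketch using the standard library's urllib.parse.quote (an illustration, not part of the original code) shows what the encoded query looks like:

from urllib.parse import quote

keyword = "哈士奇"
encoded_url = 'https://www.baidu.com/s?wd=' + quote(keyword)
print(encoded_url)  # https://www.baidu.com/s?wd=%E5%93%88%E5%A3%AB%E5%A5%87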

Second approach: pass the parameters through the params argument.

import requests
url = 'https://www.baidu.com/s?'
params={"wd":"邊"}

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
response = requests.get(url=url,params=params,headers=headers).content.decode("utf-8")

with open("bian1.html","w",encoding="utf-8") as f:
    f.write(response)
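
When params is used, requests builds and encodes the query string for you; as a quick check (a sketch added here for illustration), response.url shows the final URL that was actually requested:

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
resp = requests.get(url='https://www.baidu.com/s?', params={"wd": "邊"}, headers=headers)
print(resp.url)  # e.g. https://www.baidu.com/s?wd=%E9%82%8A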

 

 
