Network request modules

requests

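Introduction

The requests module can imitate a browser: it sends requests and returns the responses.
requests works in both Python 2 and Python 3 (although releases from 2.28 onward support Python 3 only).
requests automatically decompresses compressed (for example gzip) response content for us.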

Installing the requests module

pip install requests

If you have both Python 2 and Python 3 installed locally and want requests in the Python 3 environment, install it like this:

pip3 install requests

Using the requests module

Basic usage

Usage

# Import the module
import requests
# Request URL
url = 'http://www.baidu.com'
# Send a GET request and get the response
response = requests.get(url)
# Get the HTML content of the response
html = response.text

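Code explanation
Common response attributes
- response.text returns the response body as str, decoded with the encoding requests infers (response.encoding)
- response.content returns the response body as bytes
- response.status_code returns the HTTP status code
- response.request.headers returns the request headers that were sent
- response.headers returns the response headers
- response.cookies returns the response cookies as a RequestsCookieJar object
Converting response.content to str

# Get the raw bytes
content = response.content
# Decode them into a str (assuming the page is UTF-8 encoded)
html = content.decode('utf-8')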
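Decoding by hand matters in one common case. When a `text/*` response declares no charset in its headers, requests assumes ISO-8859-1, so `response.text` can garble Chinese pages. Below is a minimal sketch, assuming you are willing to trust requests' `apparent_encoding` guess in that case. The helper name `response_to_text` is hypothetical and is not part of requests.

```python
def response_to_text(response, fallback="utf-8"):
    """Decode response.content, avoiding requests' ISO-8859-1 default.

    response_to_text is a hypothetical helper, not part of requests.
    """
    encoding = response.encoding
    # requests reports ISO-8859-1 when a text/* response declares no charset
    if not encoding or encoding.lower() == "iso-8859-1":
        # apparent_encoding is requests' guess based on the body bytes themselves
        encoding = response.apparent_encoding or fallback
    # errors="replace" keeps going even if a few bytes do not fit the encoding
    return response.content.decode(encoding, errors="replace")
```

To use it, call `html = response_to_text(requests.get(url, timeout=5))`. A page that really is ISO-8859-1 also goes through the guess, so treat this as a heuristic.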

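Working with response.cookies

# Returns a RequestsCookieJar object
cookies = response.cookies
# RequestsCookieJar -> dict
cookie_dict = requests.utils.dict_from_cookiejar(cookies)
# dict -> RequestsCookieJar
cookie_jar = requests.utils.cookiejar_from_dict(cookie_dict)
# Add the key/value pairs of a dict to an existing cookie jar (returns the updated jar)
cookie_jar = requests.utils.add_dict_to_cookiejar(cookie_jar, {"key": "value"})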

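Custom request headers

Usage

# Import the module
import requests
# Request URL
url = 'http://www.baidu.com'
# Custom request headers
headers = {
  "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}
# Send the request with the custom headers
response = requests.get(url, headers=headers)
# Get the HTML content of the response
html = response.text

Code explanation

Pass the headers parameter when sending the request to set custom request headers. Headers you pass override requests' defaults of the same name, for example its default python-requests User-Agent.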
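You can check which headers a request will carry without touching the network by preparing it instead of sending it. This sketch reuses the URL and User-Agent from the example above. It prepares the request through a `requests.Session`, because `requests.get()` uses a Session internally and merges its default headers.

```python
import requests

url = "http://www.baidu.com"
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
    )
}

# Build and prepare the request through a Session, but do not send it
with requests.Session() as session:
    prepared = session.prepare_request(requests.Request("GET", url, headers=headers))

# Our User-Agent replaces the default one; defaults we did not override
# (such as Accept-Encoding) are still present
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```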

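Sending GET requests

Usage

# Import the module
import requests
# Request URL (Baidu search)
url = 'http://www.baidu.com/s'
# Custom request headers
headers = {
  "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}
# GET query parameters (Baidu's search keyword parameter is wd)
params = {
  "wd": "hello"
}
# Send the request with the GET parameters; requests appends them to the URL as ?wd=hello
response = requests.get(url, headers=headers, params=params)
# Get the HTML content of the response
html = response.text

Code explanation

The params parameter supplies the GET query-string parameters. requests URL-encodes them and appends them to the URL.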
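Preparing the request without sending it shows the final URL that requests builds from params. This sketch uses Baidu's search keyword parameter `wd`. It also shows that non-ASCII values are percent-encoded as UTF-8 for you.

```python
import requests

url = "http://www.baidu.com/s"

# Prepare (but do not send) the request to inspect the URL requests builds
prepared = requests.Request("GET", url, params={"wd": "hello"}).prepare()
print(prepared.url)  # http://www.baidu.com/s?wd=hello

# Chinese keywords are UTF-8 percent-encoded automatically
prepared_cn = requests.Request("GET", url, params={"wd": "你好"}).prepare()
print(prepared_cn.url)  # http://www.baidu.com/s?wd=%E4%BD%A0%E5%A5%BD
```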

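Sending POST requests

Usage

# Import the module
import requests
# Request URL (placeholder; replace it with an endpoint that accepts POST)
url = 'http://www.baidu.com'
# Custom request headers
headers = {
  "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}
# POST form data
data = {
  "kw": "hello"
}

# Send the POST request with the form data
response = requests.post(url, headers=headers, data=data)
# Get the HTML content of the response
html = response.text

Code explanation

The data parameter supplies the POST body. A dict is sent form-encoded (application/x-www-form-urlencoded).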
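To see how the form body is encoded, prepare the request without sending it. For contrast, this sketch also shows requests' `json=` parameter, which some APIs expect instead of a form. The URL is the same placeholder as above.

```python
import requests

url = "http://www.baidu.com"  # placeholder endpoint, as in the example above

# data= with a dict: a form-encoded body
form = requests.Request("POST", url, data={"kw": "hello"}).prepare()
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form.body)                     # kw=hello

# json= with a dict: a JSON body
as_json = requests.Request("POST", url, json={"kw": "hello"}).prepare()
print(as_json.headers["Content-Type"])  # application/json
print(as_json.body)
```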

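Saving images

Usage

# Import the module
import requests
# URL of the image to download
url = "http://docs.python-requests.org/zh_CN/latest/_static/requests-sidebar.png"
# Send the request and get the response
response = requests.get(url)
# Save the image; binary data must be written in 'wb' mode
with open('image.png', 'wb') as f:
    f.write(response.content)

Code explanation

Save the image with the same file extension as the requested resource (here .png).

The file must be written from response.content (bytes). response.text is decoded text and would corrupt binary data.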
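Both rules can be wrapped in a small helper. It takes the file name, and therefore the extension, from the request URL and writes the body in binary mode. This is a sketch only: `filename_from_url` and `download_file` are hypothetical names. It also uses `stream=True` with `iter_content` so that large files are not held in memory all at once.

```python
import os
from urllib.parse import urlparse

import requests


def filename_from_url(url, default="download.bin"):
    """Use the last path segment of the URL (ignoring any query string) as the file name."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def download_file(url, directory=".", chunk_size=8192):
    """Save the response body to a file named after the URL and return its path."""
    path = os.path.join(directory, filename_from_url(url))
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()  # do not save an error page as an image
        with open(path, "wb") as f:  # binary mode: write bytes, never response.text
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path
```

For example, `download_file("http://docs.python-requests.org/zh_CN/latest/_static/requests-sidebar.png")` would save `requests-sidebar.png` in the current directory.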

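Using a proxy server

Purpose
- Make the server think the requests come from different clients
- Keep our real IP address from being exposed, so the requests cannot easily be traced back to us
How proxying works: the client sends its request to the proxy server, which forwards it to the target server and relays the response back, so the target server sees the proxy's IP address instead of ours.

Proxy types
Transparent proxy: a transparent proxy "hides" your IP address only superficially; the server can still find out who you are (typically from forwarded headers such as X-Forwarded-For).
Anonymous proxy: one step better than a transparent proxy. Others can tell that you are using a proxy, but not who you are.
Distorting proxy: as with an anonymous proxy, others can still tell that you are using a proxy, but they see a fake IP address, which makes the disguise more convincing.
Elite proxy (High Anonymity Proxy): others cannot even tell that you are using a proxy, so it is the best choice.

In practice, elite proxies clearly work best.

By protocol, proxies can be divided into HTTP proxies, HTTPS proxies, SOCKS proxies, and so on. Choose one according to the protocol of the site you are scraping.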

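Usage

# Import the module
import requests
# Request URL
url = 'http://www.baidu.com'
# Custom request headers
headers = {
  "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}
# Proxy servers: each key is the scheme of the target URL,
# each value is the proxy's own URL (usually http:// even for HTTPS targets)
proxies = {
  "http": "http://IP_ADDRESS:PORT",
  "https": "http://IP_ADDRESS:PORT"
}
# Send the request through the proxy
response = requests.get(url, headers=headers, proxies=proxies)
# Get the HTML content of the response
html = response.text

Code explanation

The proxies parameter sets the proxy used for the request.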
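To make the server think requests come from different clients, you can pick a different proxy for each request. Below is a minimal sketch. The pool addresses are placeholders and `pick_proxies` is a hypothetical helper. SOCKS proxies (scheme `socks5` or `socks5h`) also need the optional extra installed with `pip install requests[socks]`.

```python
import random

# Hypothetical proxy pool: replace these placeholder addresses with real proxies
PROXY_POOL = ["10.0.0.1:8080", "10.0.0.2:8080"]


def pick_proxies(pool, scheme="http"):
    """Return a proxies dict for requests that routes through one randomly chosen proxy."""
    address = random.choice(pool)
    proxy_url = f"{scheme}://{address}"
    # Keys are the scheme of the target URL; both go through the same proxy
    return {"http": proxy_url, "https": proxy_url}
```

A typical call would be `requests.get(url, proxies=pick_proxies(PROXY_POOL), timeout=5)`. Free proxies fail often, so this pairs well with a timeout and retries.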

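Sending cookies with a request

Usage

Carry the cookie directly in the custom request headers

Pass the cookies through the cookies parameter

Code

# Import the module
import requests
# Request URL
url = 'http://www.baidu.com'

# Way 1: put the cookie string directly into the request headers
headers = {
  "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
  "Cookie": "name1=value1; name2=value2"
}
response = requests.get(url, headers=headers)

# Way 2: pass the cookies as a dict through the cookies parameter
headers = {
  "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}
cookies = {
  "xx": "yy"
}
response = requests.get(url, headers=headers, cookies=cookies)
# Get the HTML content of the response
html = response.text

Code explanation

The cookies parameter carries the cookies (as a dict or a RequestsCookieJar). Use one approach or the other, not both at once.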
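The sketch below connects the two approaches. First, preparing a request shows that a `cookies=` dict becomes an ordinary Cookie header. Second, a cookie string copied from the browser can be turned into a dict for `cookies=`. `cookie_string_to_dict` is a hypothetical helper, and it assumes the usual `name=value; name2=value2` format.

```python
import requests

url = "http://www.baidu.com"

# Way 2: requests builds the Cookie header from the dict
prepared = requests.Request("GET", url, cookies={"xx": "yy", "lang": "zh"}).prepare()
print(prepared.headers["Cookie"])


def cookie_string_to_dict(cookie_string):
    """Turn 'a=1; b=2' (as copied from the browser) into {'a': '1', 'b': '2'}."""
    cookies = {}
    for item in cookie_string.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:  # skip fragments without '='
            cookies[name] = value
    return cookies
```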

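Handling certificate errors

Problem description
When a site's SSL certificate cannot be verified (for example, it is self-signed or its certificate chain is incomplete), requests refuses the connection and raises requests.exceptions.SSLError.

Usage

# Import the module
import requests

url = "https://www.12306.cn/mormhweb/"
# Skip certificate verification; urllib3 then emits an InsecureRequestWarning
response = requests.get(url, verify=False)

Code explanation

Setting the verify parameter to False tells requests not to verify the server's CA certificate. This also removes protection against man-in-the-middle attacks, so only use it for sites you trust.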
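With `verify=False`, every request prints an InsecureRequestWarning. The sketch below hides that warning only for the call that deliberately skips verification, and leaves it active everywhere else. `get_insecure` is a hypothetical helper. If you have the site's CA certificate, passing its path (for example `verify="/path/to/ca.pem"`, a hypothetical path) is the safer alternative.

```python
import warnings

import requests
import urllib3


def get_insecure(url, **kwargs):
    """GET url without certificate verification, silencing the warning for this call only."""
    with warnings.catch_warnings():
        # verify=False makes urllib3 emit InsecureRequestWarning on each request
        warnings.simplefilter("ignore", urllib3.exceptions.InsecureRequestWarning)
        return requests.get(url, verify=False, **kwargs)
```

A typical call would be `get_insecure("https://www.12306.cn/mormhweb/", timeout=5)`. `warnings.catch_warnings` is not thread-safe, so in multithreaded crawlers you may prefer a global filter set once at startup.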

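Timeout handling

Usage

# Import the module
import requests

url = "https://www.baidu.com"
# Give up if the server does not respond within 5 seconds
response = requests.get(url, timeout=5)

Code explanation

The timeout parameter sets the timeout in seconds. If connecting or waiting for data takes longer, requests raises requests.exceptions.Timeout. Without a timeout, a request can hang indefinitely.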
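The timeout can also be a `(connect, read)` tuple, and both kinds of timeout share a common exception base class. The sketch below catches that base class. `fetch_with_timeout` and its default values are illustrative assumptions. Note that the limit applies to each wait, not to the total download time.

```python
import requests


def fetch_with_timeout(url, connect_timeout=3, read_timeout=10):
    """Return the page text, or None if connecting or waiting for data takes too long."""
    try:
        response = requests.get(url, timeout=(connect_timeout, read_timeout))
        return response.text
    except requests.exceptions.Timeout:
        # Both ConnectTimeout and ReadTimeout are subclasses of Timeout
        return None
```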

Retry handling

Usage

#!/usr/bin/python3
# -*- coding: utf-8 -*-
'''
Uses the third-party retrying module
1. pip install retrying

'''
import requests
# 1. Import the module
from retrying import retry

# 2. Configure retries with the decorator
# stop_max_attempt_number is the maximum number of attempts (including the first call)
@retry(stop_max_attempt_number=3)
def parse_url(url):
    print("Requesting url:", url)
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
    }
    # Example proxy from the original tutorial; it is probably no longer reachable, so replace it with your own
    proxies = {
        "http": "http://124.235.135.210:80"
    }
    # Set a timeout so that a hanging request raises an exception and triggers a retry
    response = requests.get(url, headers=headers, proxies=proxies, timeout=5)
    return response.text

if __name__ == '__main__':
    url = "http://www.baidu.com"
    try:
        html = parse_url(url)
        print(html)
    except Exception as e:
        # All attempts failed: log the url to a file, analyze it manually later, then request it again
        print(e)

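Code explanation
Install the retrying module

The retrying module monitors a function through a decorator. If the function raises an exception, retrying calls it again.

pip install retrying

Decorate the function that needs retrying

Set the maximum number of attempts with @retry(stop_max_attempt_number=N). If all N attempts fail, the last exception is re-raised.

# 1. Import the module
from retrying import retry
# 2. Decorate the function to retry
@retry(stop_max_attempt_number=3)
def exec_func():
    pass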
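The following dependency-free sketch shows what such a decorator does under the hood. It is not retrying's implementation: `retry_on_exception`, `max_attempts` and `wait_seconds` are hypothetical names. Like `stop_max_attempt_number`, `max_attempts` counts the first call, and once every attempt fails the last exception is re-raised. In retrying itself, the pause between attempts is set with a parameter such as `wait_fixed=2000`, in milliseconds.

```python
import functools
import time


def retry_on_exception(max_attempts=3, wait_seconds=0.0, exceptions=(Exception,)):
    """Hypothetical stand-in for retrying's @retry(stop_max_attempt_number=...).

    Calls the decorated function up to max_attempts times in total (assumes
    max_attempts >= 1); if every attempt raises, the last exception propagates.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: propagate the last error
                    time.sleep(wait_seconds)  # pause before the next attempt
        return wrapper
    return decorator
```

To retry only network failures, you could decorate `parse_url` with `@retry_on_exception(max_attempts=3, wait_seconds=1, exceptions=(requests.exceptions.RequestException,))`.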

urllib

Using the urllib networking library in Python 3

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# 1. Import the module
import urllib.request

# 2. Send the network request
# 2.1. Request URL
url = "https://github.com"
# 2.2. Custom request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
    "Referer": "https://github.com/",
    "Host": "github.com"
}

# Build the request object
req = urllib.request.Request(
    url=url,
    headers=headers
)

# Send the request
resp = urllib.request.urlopen(req)

# 3. Handle the response: save the raw bytes to a file
with open('github.txt', 'wb') as f:
    f.write(resp.read())

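Notes on using urllib

Non-ASCII text (such as a Chinese search term) in a URL must be percent-encoded before urllib can send it. urllib.parse.quote() does this.

#!/usr/bin/python3
# -*- coding: utf-8 -*-

# 1. Import the modules
import urllib.request
import urllib.parse

# 2. Send the request and get the response

wd = input("Enter a search term: ")

# 2.1 Request URL
url = "https://www.baidu.com/s?wd="
# 2.2 Custom request headers (Referer matches the target site; urllib sets Host itself)
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
    "Referer": "https://www.baidu.com/"
}
# 2.3 Build the request object; quote() percent-encodes the search term
request = urllib.request.Request(
    url=url + urllib.parse.quote(wd),
    headers=headers
)
# 2.4 Send the request
response = urllib.request.urlopen(request)

# 3. Handle the response
with open('02.html', 'wb') as f:
    f.write(response.read())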
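`urllib.parse` offers two common ways to do this encoding. `quote()` encodes a single value, while `urlencode()` builds a whole query string from a dict, which is handy once there is more than one parameter. The sketch below shows the difference. It uses Baidu's `wd` parameter as in the example above. Note how each function treats spaces.

```python
from urllib.parse import quote, urlencode

word = "你好 world"

# quote(): percent-encodes one value as UTF-8; a space becomes %20
encoded_word = quote(word)
print(encoded_word)  # %E4%BD%A0%E5%A5%BD%20world

# urlencode(): builds "key=value&..." from a dict (via quote_plus); a space becomes +
query = urlencode({"wd": word})
url = "https://www.baidu.com/s?" + query
print(url)  # https://www.baidu.com/s?wd=%E4%BD%A0%E5%A5%BD+world
```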
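response.read()

returns a byte string (bytes). To get the text content, you must decode it. The response can only be read once, so keep the result if you need it again (for example, to both save and parse the page).

html = response.read().decode('utf-8')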
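Hard-coding `'utf-8'` fails on pages served in another encoding, such as GBK. The response headers returned by `urlopen()` can report the declared charset. The sketch below uses that charset when present and falls back to UTF-8 otherwise. `decode_body` is a hypothetical helper.

```python
def decode_body(raw, headers, default="utf-8"):
    """Decode bytes from urlopen() using the charset declared in Content-Type.

    `headers` is response.headers (an http.client.HTTPMessage, which provides
    get_content_charset()); falls back to `default` when no charset is declared.
    """
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors="replace")
```

Usage: read the body once with `raw = response.read()`, then call `html = decode_body(raw, response.headers)`.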

Reposted from https://github.com/Kr1s77/Python-crawler-tutorial-starts-from-zero

