Implementing HTTP Requests in Python 3

Table of Contents

1 Implementation with urllib

2 Implementation with requests

1 Implementation with urllib

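urllib, urllib2, and urllib3 are easy to confuse: urllib2 exists only in Python 2, and urllib3 is a separate third-party package. In Python 3, urllib is organized as a package with the following modules: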

Name	Purpose
urllib.request	Opens and reads URLs
urllib.error	Exceptions raised by urllib.request
urllib.parse	Parses URLs
urllib.robotparser	Parses robots.txt files
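
As a quick illustration of the urllib.parse module from the table, the sketch below splits a URL into its components and builds a query string. The URL and parameter names are made up for the example, and nothing is sent over the network.

```python
from urllib import parse

# Hypothetical URL, used only to show what each component looks like
url = 'https://www.example.com/search?q=python&page=2#results'

parts = parse.urlparse(url)
print(parts.scheme)    # https
print(parts.netloc)    # www.example.com
print(parts.path)      # /search
print(parts.fragment)  # results

# parse_qs turns the query string into a dict of lists
query = parse.parse_qs(parts.query)
print(query)           # {'q': ['python'], 'page': ['2']}

# urlencode goes the other way: dict -> percent-encoded query string
encoded = parse.urlencode({'q': 'http requests', 'page': 3})
print(encoded)         # q=http+requests&page=3
```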

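1.1 Implementing the complete request and response model

urllib.request provides a basic function, urlopen, which fetches data by sending a request to the given URL. The simplest form is: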

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

# Response
res = request.urlopen('http://www.zhihu.com')  # a timeout can be set, e.g. timeout=2
html = res.read()
print(html)

Output:

b'<!doctype html>\n<html lang="zh" data-hairline="true" data-theme="light"><head><meta charSet="utf-8"/><title data-react...'

The code above can be split into two steps, building the request and then obtaining the response:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

# Request
req = request.Request('http://www.zhihu.com')
# Response
res = request.urlopen(req)
html = res.read()
print(html)

Both approaches above send GET requests. A POST request looks like this:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
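from urllib import parse, request

url = 'https://www.xxx.com/login'
postdata = {'username': 'miao',
            'password': '123456'}
# Request: the body must be bytes, so URL-encode the form fields first
data = parse.urlencode(postdata).encode('utf-8')
req = request.Request(url, data)
# Response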
res = request.urlopen(req)
html = res.read()
print(html)

The URL above is a placeholder; try this yourself against a login endpoint you are allowed to use.
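
One detail worth seeing directly: urllib chooses between GET and POST according to whether a body is supplied. The sketch below only builds Request objects and inspects them, so nothing is sent; the URL and form fields are placeholders.

```python
from urllib import parse, request

url = 'https://www.example.com/login'  # placeholder URL

# No data: the request will be sent as GET
get_req = request.Request(url)
print(get_req.get_method())   # GET

# With a bytes body: the request becomes POST
form = parse.urlencode({'username': 'miao', 'password': '123456'})
post_req = request.Request(url, data=form.encode('utf-8'))
print(post_req.get_method())  # POST
print(post_req.data)          # b'username=miao&password=123456'

# Other verbs can be forced with the method argument
put_req = request.Request(url, data=b'{}', method='PUT')
print(put_req.get_method())   # PUT
```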

1.2 Handling request headers

The following example adds request header information, setting User-Agent and Referer:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import parse, request

url = 'https://www.xxx.com/login'
postdata = {'username': 'xxx',
            'password': '******'}
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
referer = 'https://www.github.com'
headers = {'User-Agent': user_agent, 'Referer': referer}
# Request: the body must be bytes, so URL-encode the form fields first
data = parse.urlencode(postdata).encode('utf-8')
req = request.Request(url, data, headers)
# Response
res = request.urlopen(req)
html = res.read()
print(html)

Request headers can also be added with add_header:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
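from urllib import parse, request

url = 'https://www.xxxxxx.com/login'
postdata = {'username': 'xxx',
            'password': '******'}
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
referer = 'https://www.github.com'
# The body must be bytes, so URL-encode the form fields first
data = parse.urlencode(postdata).encode('utf-8')
req = request.Request(url, data)

# Add the headers after the Request has been created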
req.add_header('User-Agent', user_agent)
req.add_header('Referer', referer)

res = request.urlopen(req)
html = res.read()
print(html)
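
A small caveat when reading headers back: Request stores header names in capitalized form (first letter upper case, the rest lower case), so lookups must use that form. A minimal offline sketch, with a placeholder URL:

```python
from urllib import request

req = request.Request('https://www.example.com/')  # placeholder URL
req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
req.add_header('Referer', 'https://www.github.com')

# Header names are normalized with str.capitalize() when stored
print(req.header_items())
print(req.has_header('User-agent'))   # True
print(req.has_header('User-Agent'))   # False
print(req.get_header('Referer'))      # https://www.github.com
```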

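Note:
Some headers need particular care because servers check them. For example:

User-Agent: some servers or proxies use this value to decide whether the request comes from a browser.
Content-Type: when a REST interface is used, the server checks this value to decide how to parse the HTTP body. With RESTful or SOAP services, a wrong value can cause the server to reject the request. Common values are:

application/xml (used for XML RPC calls, such as RESTful/SOAP)
application/json (used for JSON RPC calls)
application/x-www-form-urlencoded (used when a browser submits a web form)

Referer: servers sometimes check this value to prevent hotlinking.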
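
To make the Content-Type point concrete, here is a sketch of a JSON POST built with urllib. The endpoint and payload are hypothetical, and the request is only constructed, not sent.

```python
import json
from urllib import request

url = 'https://www.example.com/api/items'  # hypothetical endpoint
payload = {'key': 'value'}

# The body must be bytes, and Content-Type must say how to parse it
body = json.dumps(payload).encode('utf-8')
req = request.Request(url, data=body,
                      headers={'Content-Type': 'application/json'})

print(req.get_method())                # POST
print(req.get_header('Content-type'))  # application/json
print(req.data)                        # b'{"key": "value"}'
```

If no Content-Type is set on a request that has a body, urllib sends application/x-www-form-urlencoded, which is wrong for a JSON payload.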

1.3 Cookie handling

To read the value of a cookie, proceed as follows:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request
from http import cookiejar

cookie = cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(cookie))
# Response
res = opener.open('http://www.zhihu.com')
for item in cookie:
    print(item.name + ": " + item.value)

Output:

_xsrf: 467z...
_zap: 4f91...
KLBRSID: ed2a...
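
To pick out one cookie by name rather than print them all, the jar can be searched directly. The helpers make_cookie and get_cookie_value below are hypothetical, and the jar is filled by hand with made-up values so that the sketch runs without a network connection; in practice opener.open would populate the jar as shown above.

```python
from http import cookiejar

def make_cookie(name, value, domain):
    # Build a Cookie by hand; normally HTTPCookieProcessor does this
    return cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})

def get_cookie_value(jar, name, default=None):
    # Return the value of the first cookie called `name`, or `default`
    for item in jar:
        if item.name == name:
            return item.value
    return default

jar = cookiejar.CookieJar()
jar.set_cookie(make_cookie('_xsrf', 'made-up-token', 'www.example.com'))
jar.set_cookie(make_cookie('_zap', 'made-up-id', 'www.example.com'))

print(get_cookie_value(jar, '_xsrf'))    # made-up-token
print(get_cookie_value(jar, 'missing'))  # None
```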

Cookie content can also be added by hand as needed:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

cookie = ('Cookie', 'email=' + '[email protected]')
opener = request.build_opener()
opener.addheaders = [cookie]
# Request
req = request.Request('http://www.zhihu.com')
# Response
res = opener.open(req)
print(res.headers)
retdata = res.read()

Output:

Date: Tue, 09 Jun 2020 06:45:54 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 49014
Connection: close
Server: CLOUD ELB 1.0.0...
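
When several cookies have to be sent by hand, they all go into a single Cookie header as name=value pairs separated by "; ". A minimal sketch follows; cookie_header is a hypothetical helper, and the cookie values and URL are made up. The request is built but not sent.

```python
from urllib import request

def cookie_header(pairs):
    # Join name=value pairs with "; " as the Cookie header expects
    return '; '.join('{}={}'.format(k, v) for k, v in pairs.items())

# Made-up cookie values for illustration
cookies = {'email': 'user@example.com', 'lang': 'en'}

req = request.Request('http://www.example.com/')  # placeholder URL
req.add_header('Cookie', cookie_header(cookies))
print(req.get_header('Cookie'))  # email=user@example.com; lang=en
```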

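1.4 Getting the HTTP response code

When the response is 200 OK, calling getcode() on the object returned by urlopen gives the HTTP response code. Error responses (4XX and 5XX) raise an HTTPError instead, and the code is read from the exception: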

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import error, request

try:
    # Response
    res = request.urlopen('http://www.zhihu.com')
    print(res.getcode())
except error.HTTPError as e:
    # HTTPError always carries the status code returned by the server
    print("Error code: ", e.code)

If the request succeeds, this prints the status code of the final response (200 for a normal page).
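
Not every failure is an HTTP status: when the server cannot be reached at all, urlopen raises URLError, of which HTTPError is a subclass. The sketch below shows why HTTPError has to be tested first. describe is a hypothetical helper, and the exceptions are built by hand so that the example runs offline.

```python
import io
from urllib import error

def describe(exc):
    # HTTPError must be tested first: it is a subclass of URLError
    if isinstance(exc, error.HTTPError):
        return 'server answered with status {}'.format(exc.code)
    if isinstance(exc, error.URLError):
        return 'could not reach server: {}'.format(exc.reason)
    return 'unexpected error: {!r}'.format(exc)

# Exceptions built by hand so the sketch runs offline
not_found = error.HTTPError('http://www.example.com/x', 404,
                            'Not Found', {}, io.BytesIO(b''))
unreachable = error.URLError('connection refused')

print(describe(not_found))    # server answered with status 404
print(describe(unreachable))  # could not reach server: connection refused
print(issubclass(error.HTTPError, error.URLError))  # True
```

In real code the same helper could be called from an `except error.URLError as exc:` clause around request.urlopen.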

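1.5 Redirects

The following code checks whether a redirect took place by printing the final URL with geturl(); if it differs from the requested URL, the request was redirected: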

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import error, request

try:
    # Response
    res = request.urlopen('http://www.zhihu.com')
    print(res.geturl())
except error.HTTPError as e:
    # HTTPError always carries the status code returned by the server
    print("Error code: ", e.code)

Output:

https://www.zhihu.com/signin?next=%2F

If redirects should not be followed, define a custom HTTPRedirectHandler subclass:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

class RedirectHandler(request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Return the 3XX response itself instead of following its Location header
        return fp

    # Handle the other redirect codes the same way
    http_error_301 = http_error_303 = http_error_307 = http_error_308 = http_error_302

opener = request.build_opener(RedirectHandler)
res = opener.open('http://www.zhihu.cn')
print(res)

Output:

<http.client.HTTPResponse object at 0x000001BEAC776160>

1.6 Proxy settings

An example:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

# Assumes a proxy is listening on 127.0.0.1:8087
proxy = request.ProxyHandler({'http': '127.0.0.1:8087'})
opener = request.build_opener(proxy)
res = opener.open('http://www.zhihu.com/')
print(res.read())

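The output depends on the proxy: with a proxy listening on 127.0.0.1:8087 the page body is printed; with nothing listening there, the call raises URLError.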
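
A slightly fuller sketch of proxy configuration, under the assumption that the same local proxy address serves both HTTP and HTTPS traffic. Only the openers are built here; no request is sent.

```python
from urllib import request

# Assumed local proxy address; replace with a proxy that actually exists
proxies = {'http': 'http://127.0.0.1:8087',
           'https': 'http://127.0.0.1:8087'}
proxy_opener = request.build_opener(request.ProxyHandler(proxies))

# An empty dict disables proxies, including ones from environment variables
direct_opener = request.build_opener(request.ProxyHandler({}))

# Find the ProxyHandler that build_opener registered and inspect it
handler = next(h for h in proxy_opener.handlers
               if isinstance(h, request.ProxyHandler))
print(handler.proxies['http'])  # http://127.0.0.1:8087
```

Calling request.install_opener(proxy_opener) would make later request.urlopen calls use the proxy as well.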

2 Implementation with requests

2.1 Implementing the complete request and response model

1) GET request:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

res = requests.get('http://www.zhihu.com')
print(res.content)

2) POST request:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

postdata = {'key' : 'value'}
res = requests.post('http://www.zhihu.com', data=postdata)
print(res.content)

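The other HTTP request methods are used in the same way:

requests.put('http://www.xxxxxx.com/put', data={'key': 'value'})
requests.delete('http://www.xxxxxx.com/delete')
requests.head('http://www.xxxxxx.com/get')
requests.options('http://www.xxxxxx.com/get')

3) Complex URLs: instead of writing out the full URL by hand, requests can build the query string from a dictionary passed as params: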

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

payload = {'Keywords': 'bolg:qiyeboy', 'pageindex': 1}
# A timeout can also be set here
res = requests.get('http://www.zhihu.com', params=payload)
print(res.url)

Output:

https://www.zhihu.com/?Keywords=bolg%3Aqiyeboy&pageindex=1
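
The params argument saves doing the encoding by hand. For comparison, the sketch below builds the same query string with the standard library's urllib.parse.urlencode; it is an illustration of the encoding step, not a description of how requests is implemented.

```python
from urllib import parse

payload = {'Keywords': 'bolg:qiyeboy', 'pageindex': 1}

# urlencode percent-encodes reserved characters such as ':' (-> %3A)
query = parse.urlencode(payload)
print(query)     # Keywords=bolg%3Aqiyeboy&pageindex=1

full_url = 'https://www.zhihu.com/?' + query
print(full_url)  # https://www.zhihu.com/?Keywords=bolg%3Aqiyeboy&pageindex=1
```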

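2.2 Response and encoding

Taking res = requests.get('http://www.zhihu.com') as an example, the returned object provides:

res.content: the body as bytes
res.text: the body as text
res.encoding: the page encoding guessed from the HTTP headers, used to decode res.text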
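
The guess behind res.encoding is based on the charset parameter of the Content-Type header. The sketch below approximates that lookup with the standard library; charset_from_content_type is a hypothetical helper, and the ISO-8859-1 default mirrors the fallback that requests documents for text responses without a charset.

```python
from email.message import Message

def charset_from_content_type(value, default='ISO-8859-1'):
    # Parse a Content-Type header value and return its charset parameter
    msg = Message()
    msg['Content-Type'] = value
    return msg.get_content_charset() or default

print(charset_from_content_type('text/html; charset=utf-8'))  # utf-8
print(charset_from_content_type('text/html; charset=GBK'))    # gbk
print(charset_from_content_type('text/html'))                 # ISO-8859-1

# The same bytes give different text under different encodings
text = 'caf\u00e9'
raw = text.encode('utf-8')
print(raw.decode('utf-8') == text)       # True
print(raw.decode('iso-8859-1') == text)  # False
```

The last two lines show why a wrong guess matters: decoding UTF-8 bytes as ISO-8859-1 does not fail, it silently produces different characters.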

Here the third-party library chardet is used to detect the encoding of a string or file:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests
import chardet

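res = requests.get('http://www.zhihu.com')
# detect() returns a dictionary with:
#   - 'encoding': the detected encoding
#   - 'confidence': how confident the detection is, from 0 to 1
#   - 'language': the detected natural language, if any
ret_dic = chardet.detect(res.content)
# Decode the text using the detected encoding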
res.encoding = ret_dic['encoding']
print(ret_dic)
print(res.text)

Output:

{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
<html>

<head><title>400 Bad Request</title></head>

<body bgcolor="white">

<center><h1>400 Bad Request</h1></center>

<hr><center>openresty</center>

</body>

</html>

2.3 Handling request headers

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
res = requests.get('http://www.zhihu.com', headers=headers)
print(res.content)

2.4 Response codes and response headers

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

res = requests.get('http://www.baidu.com')

# res.status_code: the response code
# res.status_code == requests.codes.ok: checks the response code
if res.status_code == requests.codes.ok:
    print("Response code:", res.status_code)
    print("Response headers:", res.headers)
    print("Field lookup:", res.headers.get('content-type'))
else:
    # raise_for_status() raises an exception when the response code is 4XX or 5XX
    # and returns None when the response code is 200
    res.raise_for_status()

Output:

Response code: 200
Response headers: {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Tue, 09 Jun 2020 13:42:42 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:27:52 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
Field lookup: text/html
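
The status check can also be pictured without requests. The sketch below uses the standard library's http.HTTPStatus; check_status and reason_for are hypothetical helpers that imitate the behaviour described for raise_for_status(), raising RuntimeError rather than the exception type requests itself uses.

```python
from http import HTTPStatus

def reason_for(code):
    # Look up the standard reason phrase for a status code
    try:
        return HTTPStatus(code).phrase
    except ValueError:  # code not defined in HTTPStatus
        return 'Unknown'

def check_status(code):
    # Raise on 4XX/5XX, otherwise return None
    if 400 <= code < 600:
        raise RuntimeError('{} {}'.format(code, reason_for(code)))
    return None

print(HTTPStatus.OK == 200)  # True
print(check_status(200))     # None
try:
    check_status(404)
except RuntimeError as exc:
    print(exc)               # 404 Not Found
```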

2.5 Cookie handling

1) Cookies received automatically:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers={'User-Agent':user_agent}
res = requests.get('http://www.baidu.com', headers=headers)

for cookie in res.cookies.keys():
    print(cookie + ": " + res.cookies.get(cookie))

Output:

BAIDUID: D285BF54C9CC968744699A9B4F843D60:FG=1
BIDUPSID: D285BF54C9CC9687F9E45D28DB4C9F33
H_PS_PSSID: 1456_31326_21100_31069_31765_31673_30823
PSTM: 1591710519
BDSVRTM: 0
BD_HOME: 1

2) Custom cookies:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
# Custom cookies
cookies = dict(name='guangtouqiang', age='18')
res = requests.get('http://www.baidu.com', headers=headers, cookies=cookies)

print(res.text)

3) Automatic cookie handling with a Session:

# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

login_url = 'http://www.zhihu.com/login'
s = requests.Session()
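datas = {'name': 'guangtouqiang', 'passwd': '123456'}
# Visit as a guest first so that the server assigns a cookie; without this step
# the server may treat the client as an illegitimate user.
# allow_redirects=True permits redirects; if one occurs, res.history holds the history.
s.get(login_url, allow_redirects=True)
# After a successful login, the session is upgraded to member privileges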
res = s.post(login_url, data=datas, allow_redirects=True)
print(res.text)

Output:

<html>

<head><title>400 Bad Request</title></head>

<body bgcolor="white">

<center><h1>400 Bad Request</h1></center>

<hr><center>openresty</center>

</body>

</html>
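
Since the live request above was rejected, the cookie persistence of a Session is easier to see offline. In the sketch below a made-up cookie is placed in the session by hand, standing in for one a server would set on the first visit, and a later request is prepared but not sent. The URL and cookie name are assumptions for illustration.

```python
import requests

s = requests.Session()
# Made-up cookie, standing in for one a server would set on the first visit
s.cookies.set('sessionid', 'abc123')

# Prepare (but do not send) a later request on the same session
req = requests.Request('POST', 'http://www.example.com/login',
                       data={'name': 'guangtouqiang'})
prepared = s.prepare_request(req)

print(prepared.headers.get('Cookie'))  # sessionid=abc123
print(prepared.body)                   # name=guangtouqiang
```

Every request prepared through the same Session carries the cookies stored in s.cookies, which is what lets the login POST follow the guest GET in the example above.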
