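What is urllib
A built-in Python HTTP request library. It needs no extra installation, and you do not need to understand how the lower layers are implemented.
- urllib.request: request module
- urllib.error: exception handling module
- urllib.parse: URL parsing module
- urllib.robotparser: robots.txt parsing module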
import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
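As a small offline illustration of the module list above, the following sketch uses only urllib.parse to split a URL and to build a query string. The helper name split_url and the sample URL are assumptions made for this example and are not part of the original notes.

```python
from urllib.parse import urlparse, urlencode


def split_url(url):
    """Return the scheme, host, path and query string of a URL."""
    parts = urlparse(url)
    return {
        'scheme': parts.scheme,
        'host': parts.netloc,
        'path': parts.path,
        'query': parts.query,
    }


# Hypothetical URL, used only for illustration.
info = split_url('http://httpbin.org/get?name=lt&age=18')

# urlencode turns a dict into a query string.
query = urlencode({'name': 'lt', 'age': 18})
```

Neither call touches the network, so this part of urllib can be tried without a working connection.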
What is Requests
A simple, easy-to-use HTTP request library for Python, built on urllib3.
Usage
- Basic GET request
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
Response:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.19.1"
},
"origin": "223.104.213.74",
"url": "http://httpbin.org/get"
}
- GET request with parameters, passed as a dictionary
import requests
data = {
'name': 'lt',
'age': 18
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
Response:
{
"args": {
"age": "18",
"name": "lt"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.19.1"
},
"origin": "223.104.213.74",
"url": "http://httpbin.org/get?age=18&name=lt"
}
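To see how the params dictionary ends up in the URL without sending anything, a request can be prepared and inspected first. This is a minimal sketch; the helper name build_get_url is an assumption for illustration.

```python
import requests


def build_get_url(url, params):
    """Prepare a GET request without sending it and return the final URL."""
    prepared = requests.Request('GET', url, params=params).prepare()
    return prepared.url


final_url = build_get_url('http://httpbin.org/get', {'name': 'lt', 'age': 18})
print(final_url)
```

The order of the query parameters follows the order of the dictionary, so it may differ from the response shown above.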
- Parsing JSON
import requests
data = {
'name': 'lt',
'age': 18
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.json())
- Fetching binary data
import requests
response = requests.get('https://ss1.bdstatic.com/kvoZeXSm1A5BphGlnYG/skin_zoom/178.jpg?2')
with open('e:/aaa.jpg', 'wb') as f:
    # the with block closes the file automatically
    f.write(response.content)
- Adding headers to disguise the client
Without headers, the request returns 400:
import requests
response = requests.get('https://www.zhihu.com/')
print(response.status_code)
With the header added, it returns 200:
import requests
headers = {
'user-agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
}
response = requests.get('https://www.zhihu.com/', headers = headers)
print(response.status_code)
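A plausible reason for the difference is the default User-Agent, which identifies the client as python-requests. The sketch below shows the header that would be sent with and without a custom value, without contacting the site. The helper name effective_user_agent and the shortened browser string are assumptions for illustration.

```python
import requests


def effective_user_agent(headers=None):
    """Return the User-Agent a session would send (nothing is sent)."""
    session = requests.Session()
    request = requests.Request('GET', 'https://www.zhihu.com/', headers=headers)
    prepared = session.prepare_request(request)
    return prepared.headers['User-Agent']


# Without custom headers: python-requests/<version>
default_ua = effective_user_agent()

# With a custom header (hypothetical, shortened browser string).
custom_ua = effective_user_agent({'user-agent': 'Mozilla/5.0 (example)'})
```

Header names are case-insensitive, so the lowercase 'user-agent' key replaces the session default.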
- POST request
import requests
data = {
'aaaa' : 'bbbb'
}
response = requests.post('http://httpbin.org/post', data = data)
print(response.json())
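The data dictionary in the POST example is sent as a form-encoded body. A minimal sketch that prepares the same request and inspects it without sending it; the helper name preview_post is an assumption for illustration.

```python
import requests


def preview_post(url, data):
    """Prepare a POST request and return its Content-Type and body."""
    prepared = requests.Request('POST', url, data=data).prepare()
    return prepared.headers['Content-Type'], prepared.body


content_type, body = preview_post('http://httpbin.org/post', {'aaaa': 'bbbb'})
print(content_type)
print(body)
```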
Response
Response attributes
- status_code
- headers
- cookies
- url
- history
Checking the status code
import requests
response = requests.post('http://httpbin.org/post')
print(response.status_code == 200)
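Besides comparing against the literal 200, requests offers the named constant requests.codes.ok and the method raise_for_status(). The sketch below builds Response objects by hand so it runs without network access; the helper names is_success and error_message are assumptions for illustration.

```python
import requests


def is_success(response):
    """True when the status code equals 200, using the named constant."""
    return response.status_code == requests.codes.ok


def error_message(response):
    """Return None for a non-error status, else the HTTPError text."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return str(exc)
    return None


# Hand-built responses, standing in for real server replies.
ok = requests.Response()
ok.status_code = 200

missing = requests.Response()
missing.status_code = 404
```

raise_for_status() raises only for 4xx and 5xx codes, which makes it convenient when any error status should stop the program.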
Advanced usage
- File upload
import requests
files = {'file': open('e:/aaa.jpg', 'rb')}
response = requests.post('http://httpbin.org/post',files = files)
print(response.json())
- Getting cookies
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies)
for key, value in response.cookies.items():
    print(key + ' = ' + value)
- Session persistence (used for login verification)
With two separate requests:
import requests
requests.get('http://httpbin.org/cookies/set/number/123456')
response = requests.get('http://httpbin.org/cookies')
print(response.text)
This returns {"cookies":{}}, because the two calls are independent requests and do not share cookies.
Change it to:
import requests
s = requests.session()
s.get('http://httpbin.org/cookies/set/number/123456')
response = s.get('http://httpbin.org/cookies')
print(response.text)
This returns:
{"cookies":{"number":"123456"}}
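The difference can be sketched offline: a Session keeps a cookie jar and attaches it to every request it prepares, while a fresh session starts empty. Here the cookie is set by hand as a stand-in for the one the server would store, and the helper name cookie_header_for is an assumption for illustration.

```python
import requests


def cookie_header_for(session, url):
    """Return the Cookie header the session would send to url (nothing is sent)."""
    prepared = session.prepare_request(requests.Request('GET', url))
    return prepared.headers.get('Cookie')


s = requests.Session()
# Stand-in for the cookie that /cookies/set/number/123456 would store.
s.cookies.set('number', '123456')

with_session = cookie_header_for(s, 'http://httpbin.org/cookies')
fresh = cookie_header_for(requests.Session(), 'http://httpbin.org/cookies')
```

with_session holds the cookie header, while fresh is None because a new session has no stored cookies.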
- Certificate verification
If the site's certificate is invalid, the request raises requests.exceptions.SSLError:
import requests
response = requests.get('https://www.12306.cn')
print(response.status_code)
Change it to:
import requests
import urllib3
urllib3.disable_warnings()  # suppress the InsecureRequestWarning
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
- Proxy settings
import requests
proxies = {
'http':'http://127.0.0.1:8743',
'https':'https://127.0.0.1:9743'
}
response = requests.get('https://www.taobao.com', proxies = proxies)
print(response.status_code)
- Timeout settings
import requests
try:
    requests.get('https://www.taobao.com/', timeout=0.1)
except requests.exceptions.ConnectTimeout:
    print('ConnectTimeout')
except requests.exceptions.Timeout:
    print('Timeout')
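The order of the except clauses matters because ConnectTimeout and ReadTimeout are both subclasses of Timeout, so the more specific class has to be tested first. A minimal offline sketch of that hierarchy; the helper name classify is an assumption for illustration.

```python
import requests


def classify(exc):
    """Name the most specific timeout class an exception belongs to."""
    if isinstance(exc, requests.exceptions.ConnectTimeout):
        return 'ConnectTimeout'
    if isinstance(exc, requests.exceptions.ReadTimeout):
        return 'ReadTimeout'
    if isinstance(exc, requests.exceptions.Timeout):
        return 'Timeout'
    return 'other'


connect_kind = classify(requests.exceptions.ConnectTimeout())
read_kind = classify(requests.exceptions.ReadTimeout())
generic_kind = classify(requests.exceptions.Timeout())
```

The timeout argument also accepts a (connect, read) tuple, for example timeout=(3, 10), when the two phases need different limits.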
- Authentication
Credentials are passed through the auth parameter.
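As a minimal sketch of HTTP Basic authentication with the auth parameter, the request below is prepared but not sent, so the resulting Authorization header can be inspected. The credentials user and pass, the httpbin URL, and the helper name basic_auth_header are assumptions for illustration.

```python
import requests
from requests.auth import HTTPBasicAuth


def basic_auth_header(url, user, password):
    """Prepare a GET request with Basic auth and return the header value."""
    request = requests.Request('GET', url, auth=HTTPBasicAuth(user, password))
    return request.prepare().headers['Authorization']


# Hypothetical credentials, used only for illustration.
header = basic_auth_header('http://httpbin.org/basic-auth/user/pass', 'user', 'pass')
print(header)
```

When sending for real, the same object goes into the call directly, for example requests.get(url, auth=HTTPBasicAuth('user', 'pass')), and a plain (user, password) tuple is accepted as shorthand.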