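1. HTTP Requests
1.1 Request methods
1.2 Request headers
2. Crawler Basics: Getting Started with the Requests Library
2.1 Installing the Requests library
2.2 Request methods in the Requests library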
import requests
# GET: retrieve a resource
response = requests.get('https://www.douban.com/')
# POST: submit data
requests.post('https://www.douban.com/')
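2.3 The Response object in the Requests library
2.4 Response status codes
418: the request was blocked by the site's anti-crawler check
200: the request succeeded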
import requests
url = 'https://www.douban.com/search'
r = requests.get(url)
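# Status code of the response
code = r.status_code
print(code)
No custom request headers were sent, so the request was blocked by the anti-crawler check.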
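As a rough sketch of how the two status codes above might be interpreted in a script, the helper below maps a status code to a short explanation. The name `describe_status` and the message wording are hypothetical, and treating 418 as an anti-crawler block reflects what these notes observe for douban.com rather than a general rule. It works on a plain integer, so it can be tried without sending any request.

```python
# Meanings for the two status codes mentioned in these notes.
# The message wording is illustrative and not taken from any library.
KNOWN_STATUS = {
    200: "OK: the request succeeded",
    418: "blocked: the site treated the request as a crawler",
}


def describe_status(code):
    """Return a short explanation for an HTTP status code."""
    if code in KNOWN_STATUS:
        return KNOWN_STATUS[code]
    # Fall back to the standard status code classes.
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unknown"


for code in (200, 418, 404):
    print(code, describe_status(code))
```

In a real run, the value passed in would be `r.status_code` from the response.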
2.5 Customizing request headers
# headers: request header information (a browser User-Agent)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
# Target URL
url = 'https://www.douban.com/search'
# GET request with the custom headers
r = requests.get(url, headers=headers)
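2.6 Redirects and timeouts
# timeout=3: raise a Timeout exception if the server does not respond within 3 seconds
r = requests.get(url, headers=headers, timeout=3)
# Redirect: the server points the client to another URL, so the page is requested again at the new address.
# r.history lists the redirect responses that were followed (empty if there were none).
print(r.history)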
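To make the redirect history easier to read, the sketch below formats each hop as a status code plus URL. It assumes `r` is a response returned by `requests.get`, whose `history` attribute is a list of earlier responses ordered from oldest to newest. The helper name `describe_redirects` and the stand-in object with its example URLs are hypothetical, used only so the snippet runs without network access.

```python
from types import SimpleNamespace


def describe_redirects(response):
    """Return one 'status url' string per hop, ending with the final response."""
    hops = [f"{hop.status_code} {hop.url}" for hop in response.history]
    hops.append(f"{response.status_code} {response.url}")
    return hops


# Stand-in for a real response (hypothetical values): one 301 redirect, then a 200.
fake = SimpleNamespace(
    status_code=200,
    url="https://www.douban.com/",
    history=[SimpleNamespace(status_code=301, url="http://douban.com/")],
)

for line in describe_redirects(fake):
    print(line)
```

With a real response, `describe_redirects(r)` can be called directly after the `requests.get` call.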
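2.7 Passing URL parameters
import requests
# headers: request header information (a browser User-Agent)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
# Target URL
url = 'https://www.douban.com/search'
# Query parameters: q is the search keyword, cat is the category
payload = {'q': 'python', 'cat': '1001'}
# GET request
# timeout=3: raise a Timeout exception if the server does not respond within 3 seconds
r = requests.get(url, headers=headers, timeout=3, params=payload)
# r.url is the final URL, including the encoded query string
url = r.url
print(url)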
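To see what the printed URL looks like without sending a request, the sketch below builds the query string with the standard library's `urlencode`. This is a stand-in for what `params=payload` does in Requests, not the library's own code path, and it is only expected to match for simple string values like the ones used here.

```python
from urllib.parse import urlencode

url = 'https://www.douban.com/search'
payload = {'q': 'python', 'cat': '1001'}

# Encode the dictionary as key=value pairs joined by '&'.
query = urlencode(payload)
full_url = f"{url}?{query}"
print(full_url)  # https://www.douban.com/search?q=python&cat=1001
```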
2.7.1 Changing cat
1. Search everything: omit cat
import requests
# headers: request header information (a browser User-Agent)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
# Target URL
url = 'https://www.douban.com/search'
# Only the search keyword; no category filter
payload = {'q': 'python'}
# GET request
# timeout=3: raise a Timeout exception if the server does not respond within 3 seconds
r = requests.get(url, headers=headers, timeout=3, params=payload)
url = r.url
print(url)
2. Search images: cat=1025
import requests
# headers: request header information (a browser User-Agent)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
# Target URL
url = 'https://www.douban.com/search'
# Search keyword plus the image category
payload = {'q': 'python', 'cat': '1025'}
# GET request
# timeout=3: raise a Timeout exception if the server does not respond within 3 seconds
r = requests.get(url, headers=headers, timeout=3, params=payload)
url = r.url
print(url)
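The searches in this section differ only in the `cat` value, so the payload could be built by one small helper instead of repeating the script. The function name `build_payload` is hypothetical. It only constructs the dictionary that would be passed as `params=`, so no request is sent here.

```python
def build_payload(query, cat=None):
    """Build the params dictionary for a search, leaving out cat when it is None."""
    payload = {'q': query}
    if cat is not None:
        payload['cat'] = cat
    return payload


# None searches everything; '1001' and '1025' are the category values used in these notes.
for cat in (None, '1001', '1025'):
    print(build_payload('python', cat))
```

Each resulting dictionary can then be passed to `requests.get(url, headers=headers, timeout=3, params=...)` as in the examples above.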