I Want to Crawl (2): Instantiating Handlers, Proxies, and Cookies

Instantiating a Handler

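Some pages are protected by HTTP Basic authentication, which a browser presents as a pop-up login dialog, for example http://httpbin.org/basic-auth/user/passwd. To request such a page with urllib, the credentials have to be supplied through a handler.

HTTPPasswordMgrWithDefaultRealm is a class that stores the username and password.
HTTPBasicAuthHandler is the handler class; instantiate it with the password manager.
build_opener() takes handler objects and returns an opener. The opener's open() method is the general form of the urlopen() function covered in the previous post: urlopen() wraps the most common request pattern, whereas more advanced features need this deeper configuration and lower-level objects.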

from urllib.request import HTTPBasicAuthHandler,HTTPPasswordMgrWithDefaultRealm,build_opener
from urllib.error import URLError
# httpbin expects the credentials embedded in the URL path: /basic-auth/<user>/<passwd>
username='user'
password='passwd'
url='http://httpbin.org/basic-auth/user/passwd'
p=HTTPPasswordMgrWithDefaultRealm()
p.add_password(None,url,username,password)  # None means "any realm"
handler=HTTPBasicAuthHandler(p)
opener=build_opener(handler)  # build_opener takes handler objects as arguments
try:
    response=opener.open(url)
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)
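To make it clearer what the handler sends on our behalf, here is a minimal sketch of the HTTP Basic scheme itself: the username and password are joined with a colon and base64-encoded into an Authorization header. The helper name basic_auth_header is made up for this illustration. Note that this sketch attaches the header up front, whereas HTTPBasicAuthHandler normally waits for the server's 401 challenge before retrying with credentials.

```python
import base64
from urllib.request import Request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic 'Authorization' header.

    Hypothetical helper for illustration; not part of urllib.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


# Attach the header to a request by hand (nothing is sent over the network here).
req = Request("http://httpbin.org/basic-auth/user/passwd")
req.add_header("Authorization", basic_auth_header("user", "passwd"))
print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNzd2Q=
```

Base64 is an encoding, not encryption, so Basic authentication should only be trusted over HTTPS.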

Using a Proxy

I picked a free proxy found online and sent a request through it to http://httpbin.org/get

from urllib.error import URLError
from urllib.request import ProxyHandler,build_opener,Request
proxy_server=ProxyHandler({
    'http':'http://115.48.205.33:20798',
    'https':'http://115.48.205.33:20798'
})
opener=build_opener(proxy_server)
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'
}
url='http://httpbin.org/get'
#url='http://www.baidu.com'
req=Request(url,headers=headers)
try:
    response=opener.open(req)
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)

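The response comes back as JSON.

Its origin field shows the proxy's IP address rather than your own.
If the proxy is unavailable, an exception is raised, for example:
[WinError 10061] No connection could be made because the target machine actively refused it.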
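Free proxies fail often, so it can help to try several candidates and keep the first one that responds. The following is only a sketch under assumptions: the function names probe_with_urllib and first_working_proxy, the 5-second timeout, and the use of http://httpbin.org/get as the test URL are all choices made for this illustration.

```python
from http.client import HTTPException
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener


def probe_with_urllib(proxy_url, test_url="http://httpbin.org/get", timeout=5):
    """Return True if test_url can be fetched through proxy_url."""
    opener = build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError, HTTPException):
        # Refused connections, timeouts and malformed replies all count as failure.
        return False


def first_working_proxy(candidates, probe=probe_with_urllib):
    """Return the first proxy URL for which probe() succeeds, or None."""
    for proxy_url in candidates:
        if probe(proxy_url):
            return proxy_url
    return None
```

Calling first_working_proxy(['http://115.48.205.33:20798']) would test the proxy used above. The probe argument can be swapped for a stub, which makes the selection logic easy to check without touching the network.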

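Using Cookies

Cookies and sessions are both used to keep track of a user's login state. A cookie is stored on the client, that is, in the browser; a session is stored on the server.
When you log in, the server generates cookie data that records the login. Sending that cookie along with later requests lets the server recognise you, treat you as logged in, and return content that is only available to logged-in users. By saving the cookies to a local file and loading them when needed, you can skip the login step.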
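As a rough illustration of that exchange, the sketch below parses a made-up Set-Cookie value (what a server might send after login) and builds the Cookie header a client would send back. The cookie name sessionid and its value are invented for the example; the parsing uses the standard http.cookies module.

```python
from http.cookies import SimpleCookie

# Hypothetical header value a server might send after a successful login.
set_cookie_value = "sessionid=abc123; Path=/; HttpOnly"

parsed = SimpleCookie()
parsed.load(set_cookie_value)

morsel = parsed["sessionid"]
print(morsel.value)    # the cookie value
print(morsel["path"])  # the Path attribute

# On later requests the client sends back only the name=value pairs.
cookie_header = "; ".join(f"{name}={m.value}" for name, m in parsed.items())
print("Cookie:", cookie_header)
```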

Getting Cookies

The CookieJar class in the http.cookiejar module creates an object that holds cookies:
cookie=http.cookiejar.CookieJar()
The HTTPCookieProcessor class in the urllib.request module creates a handler that uses that cookie jar:
handler=urllib.request.HTTPCookieProcessor(cookie)

import http.cookiejar,urllib.request
cookie=http.cookiejar.CookieJar()
handler=urllib.request.HTTPCookieProcessor(cookie)
opener=urllib.request.build_opener(handler)
url='http://www.baidu.com'
response=opener.open(url)
for item in cookie:
    print(item.name+'='+item.value)
print(cookie)

The loop prints each cookie as name=value, and print(cookie) shows the whole CookieJar.
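To see roughly what HTTPCookieProcessor does with a response, without depending on a live site, the sketch below feeds a fake response into CookieJar.extract_cookies(). FakeResponse is a hypothetical stand-in written for this example (the jar only needs its info() method), and the cookie names and example.com URL are invented.

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only info() is needed."""

    def __init__(self, set_cookie_values):
        self._headers = Message()
        for value in set_cookie_values:
            # Assigning the same header name again appends another header line.
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


jar = CookieJar()
request = Request("http://example.com/")
response = FakeResponse(["sessionid=abc123; Path=/", "theme=dark; Path=/"])

# Parse the Set-Cookie headers and store the cookies the policy accepts.
jar.extract_cookies(response, request)

pairs = {c.name: c.value for c in jar}
print(pairs)
```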

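Saving Cookies

To save cookies to a local file so they can be loaded at any time, use the MozillaCookieJar or LWPCookieJar class from the http.cookiejar module, for example:
cookie=http.cookiejar.LWPCookieJar(filename)
cookie.save(ignore_discard=True,ignore_expires=True)
ignore_discard=True saves cookies even if they are marked to be discarded (session cookies), and ignore_expires=True saves cookies even if they have already expired. With both set to True, those cookies are still written to the file.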

import http.cookiejar,urllib.request
file='LWP_cookie.txt'
cookie=http.cookiejar.LWPCookieJar(file)
handler=urllib.request.HTTPCookieProcessor(cookie)
url='http://www.baidu.com'
opener=urllib.request.build_opener(handler)
response=opener.open(url)
cookie.save(ignore_discard=True,ignore_expires=True)

Cookies saved with MozillaCookieJar are written in the Mozilla (Netscape cookies.txt) format.

Cookies saved with LWPCookieJar are written in the LWP (libwww-perl Set-Cookie3) format.
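The effect of ignore_discard is easiest to see with a session cookie, which has no expiry date and is therefore marked as discardable. The sketch below builds such a cookie by hand, saves it with each jar class, and reloads it. The helper names and the cookie itself are invented for this illustration, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar, MozillaCookieJar


def make_session_cookie():
    """Build a hypothetical session cookie (no expiry, so discard=True)."""
    return Cookie(
        version=0, name="sessionid", value="abc123",
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookies_after_round_trip(jar_class, keep_session_cookies):
    """Save one session cookie with jar_class, reload it, return the count."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "cookies.txt")
        jar = jar_class(path)
        jar.set_cookie(make_session_cookie())
        jar.save(ignore_discard=keep_session_cookies, ignore_expires=True)

        reloaded = jar_class()
        reloaded.load(path, ignore_discard=keep_session_cookies, ignore_expires=True)
        return len(reloaded)


for jar_class in (MozillaCookieJar, LWPCookieJar):
    dropped = cookies_after_round_trip(jar_class, keep_session_cookies=False)
    kept = cookies_after_round_trip(jar_class, keep_session_cookies=True)
    print(jar_class.__name__, dropped, kept)
```

With ignore_discard=False the session cookie is skipped, so the reloaded jar is empty; with ignore_discard=True it survives the round trip.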

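Loading Cookies

Load cookies with the same class that was used to save them:
cookie.load(file)
In these examples the file name is passed to the constructor when saving (for example MozillaCookieJar(file) followed by save()), whereas when loading the constructor is called with no arguments and the file name is passed to load(). LWPCookieJar works the same way.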

import http.cookiejar,urllib.request
file='cookie.txt'
cookie=http.cookiejar.MozillaCookieJar()
cookie.load(file)
handler=urllib.request.HTTPCookieProcessor(cookie)
url='http://www.baidu.com'
opener=urllib.request.build_opener(handler)
response=opener.open(url)
print(response.read().decode('utf-8'))

This returns the HTML source of the target site.
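To check that loaded cookies really end up on the outgoing request, the sketch below writes a small Mozilla-format file by hand, loads it, and asks the jar to add its Cookie header to a Request object. No network access is involved. The file contents (domain example.com, cookie sessionid, an expiry timestamp in the year 2100) and the helper name cookie_header_for are assumptions made for this example.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar
from urllib.request import Request

# Tab-separated fields: domain, domain flag, path, secure, expiry, name, value.
NETSCAPE_FILE = (
    "# Netscape HTTP Cookie File\n"
    "example.com\tFALSE\t/\tFALSE\t4102444800\tsessionid\tabc123\n"
)


def cookie_header_for(url, cookie_file_text):
    """Load cookies from the given file text and return the Cookie header for url."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "cookie.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(cookie_file_text)
        jar = MozillaCookieJar()
        jar.load(path)

    request = Request(url)
    jar.add_cookie_header(request)  # the same step HTTPCookieProcessor performs
    return request.get_header("Cookie")


print(cookie_header_for("http://example.com/", NETSCAPE_FILE))
print(cookie_header_for("http://other.test/", NETSCAPE_FILE))
```

The cookie is attached only to requests for the matching domain, so the second call returns None. If the saved file contains session cookies, load() needs ignore_discard=True to read them back.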
