Python 3: Fetching Web Pages with urllib.request and Disguising the Client

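In Python 3 we can fetch a web page with the urllib.request module and save it locally in one of two ways: read the response and write it to a file ourselves, or call urllib.request.urlretrieve().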
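Method 1:
import urllib.request

url_list = ["http://www.bundcredit.com", "http://www.baidu.com", "http://www.winnerlook.com", "http://www.winnertoke.com"]
for urlinfo in url_list:
    # Fetch the page and read the raw bytes of the response
    with urllib.request.urlopen(urlinfo) as response:
        data = response.read()
    # Name the file after the second dot-separated part of the URL, e.g. "bundcredit.html"
    with open(urlinfo.split(".")[1] + ".html", "wb") as fileinfo:
        fileinfo.write(data)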
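
The file name in Method 1 comes from splitting the whole URL on ".", which only works for simple URLs of the form http://www.name.com. As a rough sketch of a more tolerant alternative, the hypothetical helper local_name() below (not part of the original script) takes the name from the parsed host name instead:

```python
from urllib.parse import urlparse


def local_name(url):
    """Derive a local .html file name from the host part of a URL."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Drop a leading "www" label so the site name comes first
    if len(labels) > 1 and labels[0] == "www":
        labels = labels[1:]
    return labels[0] + ".html"


print(local_name("http://www.bundcredit.com"))   # bundcredit.html
print(local_name("http://example.org/a.b/c"))    # example.html
```

For the second URL, url.split(".")[1] would yield "org/a", which contains a path separator and is not usable as a plain file name.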

Method 2:
fileline = "sz.html"  # local file name to save to (example value, choose your own)
filename, headers = urllib.request.urlretrieve("http://www.cniao5.com/course/sz.html", filename=fileline)
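Checking the access log of the Nginx web server shows how these requests look on the server side.
Each entry records the client IP address, time, request method, protocol, status code, and so on. Note the User-Agent field "Python-urllib/3.5", which identifies the client as a script rather than a browser: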
180.156.222.228 - - [26/Nov/2017:20:02:02 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
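
To pick the User-Agent out of entries like these programmatically, a regular expression is one option. The sketch below assumes the log layout shown above (Nginx's usual "combined" style); LOG_PATTERN and parse_line() are illustrative names, and the pattern would need adjusting for a customized log_format.

```python
import re

# Pattern for the layout seen above:
# ip - user [time] "method path protocol" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = ('180.156.222.228 - - [26/Nov/2017:20:02:02 +0800] '
          '"GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"')
entry = parse_line(sample)
print(entry["ip"], entry["status"], entry["agent"])
```

A server operator could filter on the agent field in this way, which is why the default "Python-urllib" value gives a script away.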
Simulating a browser, Headers approach 1:
import urllib.request

url = "http://www.bundcredit.com"
# A (name, value) tuple carrying a Firefox User-Agent string
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0")
opener = urllib.request.build_opener()
# Replace the opener's default headers, which would otherwise send "Python-urllib/x.y"
opener.addheaders = [headers]
data = opener.open(url).read()
with open("1.html", "wb") as fileinfo:
    fileinfo.write(data)

The same requests after disguising the client:
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"

Simulating a browser, Headers approach 2:
import urllib.request

url = "http://www.bundcredit.com"
req = urllib.request.Request(url)
# Set the User-Agent on this single request only
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0")
data = urllib.request.urlopen(req).read()
print(data)
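
The header can also be passed to the Request constructor and checked before anything is sent. The following is a minimal sketch; build_request() and BROWSER_UA are illustrative names rather than part of the original scripts, and no network request is made.

```python
import urllib.request

BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) "
              "Gecko/20100101 Firefox/57.0")


def build_request(url, user_agent=BROWSER_UA):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.bundcredit.com")
# urllib stores header names capitalized, so the lookup key is "User-agent"
print(req.get_header("User-agent"))

# For comparison: the header a freshly built opener would send by default
default_headers = dict(urllib.request.build_opener().addheaders)
print(default_headers["User-agent"])  # "Python-urllib/" followed by the Python version
```

Passing req to urllib.request.urlopen() would then send the request with the browser-like header, as in approach 2.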
