03 Python Crawler --- Timeouts and GET/POST Requests

Original post

2020-02-24 21:26

I. Timeout Settings

import urllib.request

# To avoid timeout exceptions on a slow site, increase the timeout value
for i in range(1, 100):  # loops 99 times
    try:
        # Time out after 1 second
        with urllib.request.urlopen("http://yum.iqianyue.com", timeout=1) as file:
            data = file.read()
        print(len(data))
    except Exception as e:
        print("Exception --> " + str(e))
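The loop above catches every exception with a single handler. A minimal sketch of separating timeouts from other failures follows. The helper name `fetch_length` and its `opener` parameter are assumptions for illustration and are not part of the original post; `opener` exists only so the function can be tried without network access.

```python
import socket
import urllib.error
import urllib.request


def fetch_length(url, timeout=1.0, opener=urllib.request.urlopen):
    """Return (length, None) on success or (None, reason) on failure.

    `fetch_length` and `opener` are hypothetical names; `opener` defaults to
    urllib.request.urlopen and can be swapped for a stand-in when offline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return len(response.read()), None
    except socket.timeout:
        # Raised directly when reading the response exceeds `timeout`.
        return None, "timeout"
    except urllib.error.URLError as e:
        # A timeout while connecting usually arrives wrapped in URLError.reason.
        if isinstance(e.reason, socket.timeout):
            return None, "timeout"
        return None, "url error: " + str(e.reason)
```

Calling `fetch_length("http://yum.iqianyue.com", timeout=5)` would then report a timeout separately from, say, a refused connection, which makes it easier to decide whether raising the timeout is likely to help.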

II. GET Requests

import urllib.request

keywd = 'hello'
url = 'http://www.baidu.com/s?wd=' + keywd
req = urllib.request.Request(url)  # build a Request object
data = urllib.request.urlopen(req).read()  # open the request and read the response body
# Save the response body to a local file
with open("/home/zyb/crawler/myweb/part4/4.html", "wb") as fhandle:
    fhandle.write(data)

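Note: this code needs improvement. When the keyword is Chinese, it fails with UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-11: ordinal not in range(128), because the keyword is placed in the URL without being percent-encoded.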

Improved version

import urllib.parse
import urllib.request

url = 'http://www.baidu.com/s?wd='
key = "有道"
key_code = urllib.parse.quote(key)  # percent-encode the keyword
url_all = url + key_code
req = urllib.request.Request(url_all)  # build a Request object
data = urllib.request.urlopen(req).read()  # open the request and read the response body
# Save the response body to a local file
with open("/home/zyb/crawler/myweb/part4/5.html", "wb") as fhandle:
    fhandle.write(data)
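As a possible alternative to quoting the keyword and concatenating it by hand, `urllib.parse.urlencode` can build the whole query string in one step. This is a minimal sketch; the helper name `build_search_url` is hypothetical and not from the original post.

```python
import urllib.parse


def build_search_url(base, keyword):
    """Append a percent-encoded `wd` query parameter to `base`.

    `build_search_url` is a hypothetical helper for illustration.
    """
    # urlencode() percent-encodes non-ASCII text as UTF-8 by default.
    query = urllib.parse.urlencode({"wd": keyword})
    return base + "?" + query


print(build_search_url("http://www.baidu.com/s", "有道"))
# http://www.baidu.com/s?wd=%E6%9C%89%E9%81%93
```

Note that `urlencode` encodes spaces as `+`, whereas `quote` encodes them as `%20`.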

Key points:
1. The request must be a GET request
2. Build the Request object with the URL as its argument
3. Open the resulting Request object with urlopen()
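The first point can be checked without sending anything: a Request built from a URL alone, with no data, reports GET as its method. A small sketch:

```python
import urllib.request

req = urllib.request.Request("http://www.baidu.com/s?wd=hello")
print(req.get_method())  # GET, because no data argument was supplied
print(req.full_url)      # http://www.baidu.com/s?wd=hello
```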

III. POST Requests

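We use the site www.iqianyue.com as an example.
Crawling approach:
1. Set the URL
2. Build the form data and encode it with urllib.parse.urlencode
3. Create a Request object whose arguments are the URL and the data to send
4. Use add_header() to add header information so the request looks like it comes from a browser
5. Open the Request object with urllib.request.urlopen() to submit the data
6. Process the response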

import urllib.parse
import urllib.request

url = "http://www.iqianyue.com/mypost/"
# urlencode() returns a str; encode it to UTF-8 bytes, since Request expects bytes as data
postdata = urllib.parse.urlencode({
    'name': "zhouyanbing",
    'pass': "zyb1121"
}).encode('utf-8')
req = urllib.request.Request(url, postdata)
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")
data = urllib.request.urlopen(req).read()
# Save the response body to a local file
with open("/home/zyb/crawler/myweb/part4/6.html", "wb") as fhandle:
    fhandle.write(data)
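To see what steps 2 to 4 produce before anything is sent, the encoded body and the request object can be inspected offline. This is a minimal sketch; the form values are placeholders chosen for illustration, not real credentials.

```python
import urllib.parse
import urllib.request

form = {"name": "alice", "pass": "secret"}  # placeholder values for illustration
postdata = urllib.parse.urlencode(form).encode("utf-8")
print(postdata)  # b'name=alice&pass=secret'

req = urllib.request.Request("http://www.iqianyue.com/mypost/", postdata)
req.add_header("User-Agent", "Mozilla/5.0")
print(req.get_method())  # POST, because data was supplied

# add_header() stores the name via str.capitalize(), so look it up as "User-agent".
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Supplying `data` is what switches the request from GET to POST; no separate method argument is needed for this case.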
