Common Python Crawler Exceptions and How to Handle Them

Original

woyaojinqu

2018-08-24 13:26

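Python crawlers are often interrupted by unhandled exceptions, which causes the crawler to terminate unexpectedly. Ideally, a crawler should keep running when it runs into these exceptions. Below are several common exceptions and ways to handle them: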

Exception 1: requests.exceptions.ProxyError

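Stack Overflow explains this error as follows:
The ProxyError exception is not actually the requests.exceptions exception; it is an exception with the same name from the embedded urllib3 library, and it is wrapped in a MaxRetryError exception.
In other words, according to that answer the error does not originate in requests.exceptions. It is an exception of the same name from the urllib3 library bundled with Requests, and it arrives wrapped in a MaxRetryError. In practice, this exception usually appears when the proxy server cannot be reached.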
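
As a minimal sketch of handling this in a crawler, the snippet below tries a list of proxies in turn and skips any that fail. It assumes a reasonably recent Requests release, where the failure surfaces as requests.exceptions.ProxyError. The helper name fetch_via_proxies, its get parameter, and any proxy addresses passed to it are hypothetical and used only for illustration.

```python
import requests


def fetch_via_proxies(url, proxy_urls, get=requests.get, timeout=10):
    """Return the first response obtained through a working proxy, else None.

    Hypothetical helper: `get` is injectable so the function can be
    exercised without network access.
    """
    for proxy in proxy_urls:
        proxies = {"http": proxy, "https": proxy}
        try:
            return get(url, proxies=proxies, timeout=timeout)
        except requests.exceptions.ProxyError as exc:
            # The proxy is unreachable or refused the connection: try the next one.
            print(f"proxy {proxy} failed: {exc}")
    # Every proxy failed; the caller decides what to do next.
    return None
```

Returning None instead of exiting lets the crawler move on to the next URL or refresh its proxy list.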
Exception 2: requests.exceptions.ConnectionError

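Stack Overflow explains this error as follows:
In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.
That is, this exception is defined by the Requests library itself and signals a network-level problem such as a DNS failure or a refused connection.
One solution is to catch the base-class exception, requests.exceptions.RequestException, which covers every exception that Requests raises: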
import sys
import requests

# url and thing are placeholders for your own values
try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.RequestException as e:
    print(e)
    sys.exit(1)
The other solution is to handle the exceptions separately. Three cases are distinguished here:
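try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
    # Maybe set up a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects:
    # The URL was probably bad; try a different one
    pass
except requests.exceptions.RequestException as e:
    # Any other Requests error: report it and stop
    print(e)
    sys.exit(1)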
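
Since the goal is a crawler that keeps running, calling sys.exit(1) is often too drastic. The sketch below shows one possible alternative: retry timeouts and connection errors with exponential backoff, and skip the URL on any other Requests error. The function name get_with_retry and its parameters (max_attempts, backoff, get, sleep) are assumptions made for this example, not part of Requests.

```python
import time

import requests


def get_with_retry(url, max_attempts=3, backoff=1.0,
                   get=requests.get, sleep=time.sleep):
    """Return a response, or None if the URL should be skipped.

    Hypothetical helper: `get` and `sleep` are injectable for testing.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            resp = get(url, timeout=10)
            resp.raise_for_status()  # turn 4xx/5xx into HTTPError
            return resp
        except (requests.exceptions.Timeout,
                requests.exceptions.ConnectionError) as exc:
            # Transient network problem: wait, then try again.
            if attempt == max_attempts:
                print(f"giving up on {url}: {exc}")
                return None
            sleep(backoff * 2 ** (attempt - 1))
        except requests.exceptions.RequestException as exc:
            # Anything else (bad URL, redirect loop, HTTP error): skip it.
            print(f"skipping {url}: {exc}")
            return None
    return None
```

The order of the except clauses matters: Timeout and ConnectionError are subclasses of RequestException, so they have to be listed before the base class.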
Exception 3: requests.exceptions.ChunkedEncodingError

Stack Overflow explains this error as follows:
The link you included in your question is simply a wrapper that executes urllib's read() function, which catches any incomplete read exceptions for you. If you don't want to implement this entire patch, you could always just throw in a try/catch loop where you read your links.
In other words, the link in the question is a wrapper around urllib's read() function that catches the exception raised when the response data is read incompletely. If you do not want to apply that whole patch, you can simply wrap the read in a try/except block:
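import http.client
import urllib.request

try:
    page = urllib.request.urlopen(urls).read()
except http.client.IncompleteRead as e:
    # Keep whatever was received before the connection broke
    page = e.partial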
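
When the page is fetched with Requests rather than urllib, the same kind of truncated response shows up as requests.exceptions.ChunkedEncodingError. A minimal sketch, assuming a simple retry is acceptable for your crawler, could look like this. The name download_body and the get and max_attempts parameters are hypothetical.

```python
import requests


def download_body(url, max_attempts=3, get=requests.get):
    """Return the response body as bytes, or None if every attempt is cut short.

    Hypothetical helper: `get` is injectable so it can be tested offline.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Without stream=True, Requests reads the whole body inside get(),
            # so a broken chunked transfer is raised within this try block.
            return get(url, timeout=10).content
        except requests.exceptions.ChunkedEncodingError as exc:
            print(f"attempt {attempt}: incomplete response from {url}: {exc}")
    return None
```

Unlike the urllib version, this sketch discards the partial data and requests the page again, which is usually the safer choice when the content will be parsed afterwards.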

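For the exceptions above there is also a simple, easy-to-use approach: call the function itself again from the exception handler and return its result. The crawler then keeps going after an exception is caught and retries until no exception occurs. It can be implemented as follows:

def myfunc(para):
    try:
        ...  # your code
    except YourException as e:  # the exception you want to handle
        print(e)
        # Call the function again instead of returning the function object
        return myfunc(para)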
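
One caveat with the recursive form is that it has no upper bound: if the failure never goes away, the calls nest until Python raises RecursionError. A loop-based variant with a retry limit avoids this. The decorator below is only a sketch using the standard library; the name retry and its parameters (exceptions, max_attempts, delay) are assumptions made for this example.

```python
import functools
import time


def retry(exceptions, max_attempts=3, delay=0.0):
    """Re-run the decorated function when it raises one of `exceptions`.

    Hypothetical helper: after `max_attempts` failures the last
    exception is re-raised so the caller can log it or skip the item.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    print(f"{func.__name__} attempt {attempt} failed: {exc}")
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator
```

A crawler function can then be decorated with, for example, @retry(requests.exceptions.RequestException, max_attempts=5, delay=2), and the calling loop only needs one try/except around it to skip pages that still fail.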
