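urllib2 is a commonly used Python 2 standard-library module for requesting URL resources.
Official documentation: urllib2 — extensible library for opening URLs
Commonly used functions: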
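urllib2.urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]]])
Parameters:
url: either a string or a Request object
timeout: timeout for the request, in seconds
The returned object provides the geturl(), info(), and read() methods:
geturl() returns the URL that was actually retrieved (it can differ from the requested URL after a redirect)
info() returns the response metadata, such as the headers
read() returns the body of the response
Example:
    import urllib2

    url = 'http://www.csdn.net/'
    # Give up if the server has not responded within 5 seconds.
    html = urllib2.urlopen(url, timeout=5)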
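The three response methods can be tried without network access. The following is a minimal sketch, assuming a temporary local file opened through a file:// URL stands in for a real web page. The file contents and the fallback import for Python 3 (where urllib2 became urllib.request) are assumptions added for illustration and are not part of the example above.

```python
import os
import tempfile

try:
    import urllib2                      # Python 2
    from urllib import pathname2url
except ImportError:                     # assumption: fall back to the Python 3 equivalents
    import urllib.request as urllib2
    from urllib.request import pathname2url

# Hypothetical stand-in for a web page: a small temporary file.
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(b"<html>hello</html>")

url = "file://" + pathname2url(path)
response = urllib2.urlopen(url, timeout=5)
try:
    final_url = response.geturl()   # URL that was actually opened
    meta = response.info()          # header-like metadata of the response
    body = response.read()          # raw bytes of the body
finally:
    response.close()
    os.remove(path)

print(final_url)
print(body)
```

For an http:// URL the same three calls apply; info() then carries the headers sent by the server.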
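urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])
Parameters:
url: a string containing a valid URL
headers: a dictionary of HTTP request headers, for example a browser User-Agent
Example:
    import urllib2

    url = "http://www.csdn.net/"
    # Present a browser-like User-Agent instead of the default Python one.
    headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}
    req = urllib2.Request(url, headers=headers)
    html = urllib2.urlopen(req)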
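A Request object can be inspected before it is sent, which helps when checking which headers will go out. This is a small sketch that sends nothing over the network. The Accept-Language header and the Python 3 fallback import are assumptions added for illustration.

```python
try:
    import urllib2                      # Python 2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2

url = "http://www.csdn.net/"
user_agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

req = urllib2.Request(url, headers={"User-Agent": user_agent})
# Headers can also be added after construction (hypothetical extra header).
req.add_header("Accept-Language", "en")

# Request stores header names capitalized, e.g. "User-agent".
stored_names = sorted(name for name, _ in req.header_items())
print(stored_names)
print(req.get_header("User-agent"))
print(req.get_full_url())
```

Because of that capitalization, get_header() has to be called with "User-agent"; asking for "User-Agent" returns None.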
Error handling:
URLError
    import urllib2

    try:
        html = urllib2.urlopen("http://www.csdn.net/")
    except urllib2.URLError as e:
        # reason describes why the request failed
        print(e.reason)
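The URLError pattern can be wrapped in a small helper so callers get either a response or a failure reason. This is only a sketch: try_open is a hypothetical name, and the URL deliberately uses a scheme that no handler supports, so the error is raised locally without touching the network. The Python 3 fallback import is also an assumption.

```python
try:
    import urllib2                      # Python 2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


def try_open(url, timeout=5):
    """Return (response, None) on success or (None, reason) on URLError."""
    try:
        return urllib2.urlopen(url, timeout=timeout), None
    except urllib2.URLError as e:
        return None, e.reason


# Hypothetical URL whose scheme is unknown to urllib2.
response, reason = try_open("nosuchscheme://example.invalid/")
print(reason)
```

With a real http:// URL, reason is often a socket-level error object rather than a plain string, so it is safer to format it with str() before logging.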
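HTTPError
    import urllib2

    try:
        html = urllib2.urlopen("http://www.csdn.net")
    except urllib2.HTTPError as e:
        # code is the HTTP status code, e.g. 404
        print(e.code)
        print(e.reason)
socket.error
    import socket
    import urllib2

    try:
        html = urllib2.urlopen("http://www.csdn.net")
    except socket.error as e:
        # urllib2 has no SocketError class; low-level failures surface as socket.error
        print(e)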
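HTTPError is a subclass of URLError, so when both are handled the HTTPError check has to come first. The sketch below illustrates that ordering with hand-built exception objects instead of real failed requests. The describe helper, the example URL path, and the Python 3 fallback import are assumptions for illustration.

```python
try:
    import urllib2                      # Python 2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


def describe(error):
    """Turn a urllib2 exception into a short message."""
    # HTTPError must be tested first because it is a subclass of URLError.
    if isinstance(error, urllib2.HTTPError):
        return "HTTP error %d: %s" % (error.code, error.msg)
    if isinstance(error, urllib2.URLError):
        return "URL error: %s" % (error.reason,)
    return "other error: %r" % (error,)


# Hand-built exceptions stand in for real failed requests (illustration only).
not_found = urllib2.HTTPError("http://www.csdn.net/missing", 404, "Not Found", None, None)
refused = urllib2.URLError("connection refused")

print(describe(not_found))
print(describe(refused))
```

The same rule applies to except clauses: an except urllib2.URLError placed before except urllib2.HTTPError would swallow HTTP errors as well.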
Catching connection timeouts
    import socket
    import urllib2

    try:
        urllib2.urlopen("http://example.com", timeout=1)
    except urllib2.URLError as e:
        # A timeout while connecting arrives wrapped in URLError.reason.
        if isinstance(e.reason, socket.timeout):
            print("There was an error: %r" % e)
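A timeout does not always arrive wrapped in URLError. If the connection succeeds but the server never answers, socket.timeout can be raised directly. The following sketch handles both cases. It assumes a local socket that accepts connections into its backlog but never replies, standing in for a slow server. read_or_none is a hypothetical helper, and the empty ProxyHandler is there only to keep any configured proxy out of the way.

```python
import socket

try:
    import urllib2                      # Python 2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


def read_or_none(opener, url, timeout):
    """Return the response body, or None if the request timed out."""
    try:
        return opener.open(url, timeout=timeout).read()
    except urllib2.URLError as e:
        # Timeout while connecting: wrapped in URLError.reason.
        if isinstance(e.reason, socket.timeout):
            return None
        raise
    except socket.timeout:
        # Timeout while waiting for or reading the response: raised directly.
        return None


# Hypothetical silent server: it queues the connection but never answers.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# An empty ProxyHandler keeps the request from being routed through a proxy.
opener = urllib2.build_opener(urllib2.ProxyHandler({}))
try:
    result = read_or_none(opener, "http://127.0.0.1:%d/" % port, timeout=0.5)
finally:
    server.close()

print(result)
```

Errors other than timeouts are re-raised, so callers can still handle HTTPError and other URLError cases separately.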