Python Study Notes (4): Web Scraping with the urllib Library

Original

2020-06-15 03:11

#coding=utf-8
'''Python 2 style, kept for reference only (in Python 3, urlopen lives in urllib.request):
import urllib
b=urllib.urlopen('http://www.baidu.com')
b.read()'''
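'''
urllib is Python's built-in HTTP request library.
It consists of the following modules:
urllib.request            request module
urllib.error              exception handling module
urllib.parse              URL parsing module
urllib.robotparser        robots.txt parsing module
'''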
#urllib.request
#GET request

import urllib.request
import re  # regular expressions

#response = urllib.request.urlopen('http://www.12306.com')
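'''urlopen is usually called with three parameters:
   urllib.request.urlopen(url,data,timeout)
   response.read() returns the page content as bytes; printing the response without read() shows only the response object
   response.info() returns the headers sent by the remote server
   response.getcode() returns the HTTP status code: 200 means success; for 404 (not found) or 5xx (server error) urlopen raises urllib.error.HTTPError, and the status is in its code attribute
   urllib.request.urlretrieve(address,filename) downloads the content at address to a local file
 '''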
#print(response)
#print(response.read().decode('utf-8'))
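#print(response.info())#headers sent by the remote server
#print(response.getcode())#200 on success; 404 or 5xx responses raise urllib.error.HTTPError
#result=urllib.request.urlretrieve("https://img-blog.csdn.net/20180104135540208?watermark/2/text")#named result so the re module is not shadowed
#print(result)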
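#Request with data: see the commented-out POST example at the end of this script.
#The code below fetches a page and downloads the first .jpg it references.
res=urllib.request.urlopen('http://www.douyu.com')
html=res.read().decode('utf-8')  # named html so the built-in str is not shadowed
imgList=re.findall(r'src="(.*?\.jpg)"',html)
if imgList:  # the page may contain no matching .jpg link
    url=imgList[0]
    print(url)
    urllib.request.urlretrieve(url,'C:\\Users\\Administrator\\Desktop\\1.jpg')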
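
The image lookup above passes whatever the regular expression captures straight to urlretrieve, which only works when the src value is already an absolute URL. The following is a minimal sketch of one way to handle relative and protocol-relative links with urllib.parse.urljoin. The helper name extract_jpg_urls, the sample HTML, and the example.com addresses are assumptions made for illustration, and the pattern is tightened slightly to [^"] so a match cannot run across several attributes.

```python
import re
from urllib.parse import urljoin

# Close to the pattern in the script, but [^"] keeps a match inside one attribute.
IMG_PATTERN = re.compile(r'src="([^"]*?\.jpg)"')

def extract_jpg_urls(html, base_url):
    """Return absolute URLs for every .jpg src attribute found in html."""
    # urljoin leaves absolute URLs unchanged and resolves relative or
    # protocol-relative ones against base_url.
    return [urljoin(base_url, src) for src in IMG_PATTERN.findall(html)]

# Hypothetical page content, used so the example runs without network access.
sample_html = (
    '<img src="/static/logo.jpg">'
    '<img src="//cdn.example.com/banner.jpg">'
    '<img src="http://example.com/full.jpg">'
    '<img src="icon.png">'
)
urls = extract_jpg_urls(sample_html, 'http://www.example.com/index.html')
for u in urls:
    print(u)
```

Each returned URL can then be handed to urllib.request.urlretrieve as in the script. A regular expression is a rough tool for HTML, so this is only suitable for quick experiments like the one in these notes.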




'''import urllib.parse
import urllib.request

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
print(data)
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read())'''
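
The commented-out POST example above can also be written with an explicit urllib.request.Request object, which makes it possible to inspect the request before sending it. This is a minimal sketch; the helper name build_post_request is an assumption for illustration, and the httpbin.org URL and the 'word' field are taken from the example above.

```python
import urllib.parse
import urllib.request

def build_post_request(url, fields):
    """Build (but do not send) a POST request carrying form-encoded fields."""
    # urlencode produces a str such as 'word=hello'; urlopen needs bytes.
    body = urllib.parse.urlencode(fields).encode('utf-8')
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body)

req = build_post_request('http://httpbin.org/post', {'word': 'hello'})
print(req.get_method())
print(req.data)

# Sending needs network access, so it is left commented out here:
# with urllib.request.urlopen(req, timeout=10) as response:
#     print(response.read().decode('utf-8'))
```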

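
The notes on urlopen earlier in the script mention the timeout parameter and the status codes, and the module list names urllib.error. The sketch below shows one way these could fit together. The function name fetch_status and the 10-second default are assumptions for illustration, not part of the original script.

```python
import socket
import urllib.error
import urllib.request

def fetch_status(url, timeout=10):
    """Return (status_code, body_text); body_text is None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode('utf-8', errors='replace')
            return response.getcode(), body
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses; err.code holds the HTTP status.
        return err.code, None
    except urllib.error.URLError as err:
        # Raised when no response arrives (DNS failure, refused connection, ...).
        print('request failed:', err.reason)
        return None, None
    except socket.timeout:
        # A timeout while reading the body is not wrapped in URLError.
        print('request timed out')
        return None, None

# Example call (needs network access):
# code, text = fetch_status('http://www.baidu.com')
```

HTTPError is caught before URLError because it is a subclass of URLError; with the order reversed, the status code would never be reported.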