The urllib Library Explained

Original

2020-06-21 23:54

# coding:utf-8
import urllib.request

url = "https://www.baidu.com"

response = urllib.request.urlopen(url)

print(response)  # the HTTPResponse object (its repr shows a memory address)

print(response.url)  # the URL that was fetched

print(response.status)  # 200

#print(response.headers)

#print(response.read().decode("utf-8"))  # print the page source

# with open("a.html", "w", encoding="utf-8") as fp:
#     fp.write(response.read().decode("utf-8"))


# with open("b.html", "wb") as fp:
#     fp.write(response.read())


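html = response.read()  # read() consumes the body, so call it only once
print(type(html))  # <class 'bytes'>

html2 = html.decode("utf-8")  # decode the bytes already read; a second read() would return b""
print(type(html2))  # <class 'str'>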
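"""
File encoding
In Python 3, text (str) is held in memory as Unicode.
On disk, however, everything is stored as bytes,
so a str has to be encoded before it can be written to a file.
In practice:
if the file is opened in 'w' mode, the content written must be a str
(Python encodes it for you, using the encoding passed to open());
if the file is opened in 'wb' mode, the content written must be bytes.
"""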
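"""
Web page encoding works much like file encoding.
If the page downloaded with urlopen is read() and then decoded with decode('utf-8'),
the result is a str and must be written to a file opened in 'w' mode.
If it is only read() and not decoded, the result is bytes and must be written in 'wb' mode.
"""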
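
The two notes above can be tried out without touching the network. The sketch below is only an illustration: the sample HTML string, the temporary directory, and the file names are assumptions standing in for the bytes that response.read() would return.

```python
import os
import tempfile

# Stand-in for the bytes that response.read() would return (no network needed).
raw = "<p>café</p>".encode("utf-8")   # bytes
text = raw.decode("utf-8")            # str

tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, "a.html")
bytes_path = os.path.join(tmpdir, "b.html")

# Text mode: pass a str; Python encodes it with the given encoding.
with open(text_path, "w", encoding="utf-8") as fp:
    fp.write(text)

# Binary mode: pass bytes; they are written unchanged.
with open(bytes_path, "wb") as fp:
    fp.write(raw)

# Mixing the two up (str into a 'wb' file) raises TypeError.
try:
    with open(os.path.join(tmpdir, "c.html"), "wb") as fp:
        fp.write(text)
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc

# Both routes should leave the same bytes on disk.
with open(text_path, "rb") as fp:
    same_bytes = fp.read() == raw
print(same_bytes, type(mismatch_error).__name__)
```

Either route ends up with identical bytes on disk; the only difference is whether you do the encoding yourself or let open() do it.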

urllib.request: sends requests and retrieves the responses
urllib.error: contains the exceptions raised by urllib.request
urllib.parse: parses and manipulates URLs
urllib.robotparser: parses a site's robots.txt file
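
As a rough illustration of the two helper modules in this list, the sketch below splits a URL, builds a query string, and checks a URL against robots.txt rules. The search term, the example.com URLs, and the robots.txt rules are made up for the example; the rules are parsed from a list of lines rather than downloaded.

```python
from urllib.parse import urlencode, urlparse
from urllib.robotparser import RobotFileParser

# Split a URL into its components.
parts = urlparse("https://www.baidu.com/s?wd=python")
print(parts.scheme, parts.netloc, parts.path, parts.query)

# Build a query string from a dict; spaces and special characters are escaped.
query = urlencode({"wd": "python urllib"})
search_url = "https://www.baidu.com/s?" + query
print(search_url)

# Hypothetical robots.txt rules, parsed from lines instead of fetched.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
allowed = robots.can_fetch("*", "https://example.com/index.html")
blocked = robots.can_fetch("*", "https://example.com/private/data.html")
print(allowed, blocked)
```

For a real site you would normally call set_url() and read() on the RobotFileParser so that it downloads the actual robots.txt.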

The urllib.request module provides the basic building blocks for constructing HTTP requests. With it you can simulate the way a browser issues a request, and it also handles authentication, redirections, cookies, and more.
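
To make "simulating a browser request" a little more concrete, here is a minimal sketch that attaches a User-Agent header to a Request and catches the exceptions from urllib.error. The helper names build_request and fetch, the User-Agent value, and the 10-second timeout are assumptions for illustration, not part of the original script.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Return (status, body_bytes); status is None if the server was unreachable."""
    req = build_request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return exc.code, b""
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, and similar problems.
        return None, b""


req = build_request("https://www.baidu.com")
print(req.full_url, req.get_method(), req.get_header("User-agent"))
```

HTTPError is a subclass of URLError, so it has to be caught first. Calling fetch("https://www.baidu.com") performs the actual network request.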

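Error: SyntaxError: Non-UTF-8 code starting with
Fix: save the source file as UTF-8 and add #coding:utf-8 as the first line of the code. The declaration has to match the encoding the file is actually saved in.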
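
If you are unsure which encoding Python will assume for a source file, the standard tokenize module can report it. The snippet below is a small sketch; the helper name source_encoding and the two byte strings are made-up samples, not files from this post.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding Python would use to decode this source code."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# No declaration: Python 3 falls back to UTF-8.
plain = source_encoding(b"print('hello')\n")

# A coding declaration on the first line overrides the default.
gbk = source_encoding(b"# coding: gbk\nprint('hello')\n")

print(plain, gbk)
```

The same check works on a real script if you open it in 'rb' mode and pass the file's readline method to tokenize.detect_encoding.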
