Using the urllib.parse module of the urllib library

The urllib.parse module handles parsing, splitting, and combining URL data. Its most commonly used functions are urllib.parse.urlparse, urllib.parse.urlunparse, urllib.parse.urljoin, and urllib.parse.urlencode.

1. Using urlparse()

urlparse splits a URL into six parts and returns them as a tuple of six strings: scheme, network location, path, parameters, query, and fragment. Its signature is:

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

1.1. urlparse() with only the urlstring argument

from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
print(type(result), result)

'''Result:
<class 'urllib.parse.ParseResult'> ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
'''

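As the output shows, scheme is the protocol, netloc is the network location (the domain name, plus the port if any), path is the path on the server, params holds the parameters of the last path segment (the text after ';'), query is the query string (the text after '?'), and fragment is the anchor (the text after '#').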
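
As a minimal sketch of how the returned ParseResult can be consumed: it is a named tuple, so each component can be read by attribute name, by index, or by unpacking. The variable names below are chosen only for illustration.

```python
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')

# Read components by attribute name ...
scheme = result.scheme
netloc = result.netloc

# ... or by position, because ParseResult is a named tuple
path = result[2]

# The tuple also unpacks into its six fields, in order
_, _, _, params, query, fragment = result

print(scheme, netloc, path, params, query, fragment)
```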

1.2. urlparse() with the scheme argument: supplying a default scheme

from urllib.parse import urlparse

result = urlparse('www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(result)

'''The input URL carries no scheme, so it is parsed with the default scheme https:
ParseResult(scheme='https', netloc='', path='www.baidu.com/index.html', params='user', query='id=5', fragment='comment')'''

If the input URL already carries a scheme, that scheme wins. Below, https is passed as the default, but the URL is still parsed as http.
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(result)
'''Result:
ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
'''
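
In the first example of this subsection, netloc is empty and the host ends up in path. urlparse only recognizes a network location when it is introduced by '//'. The sketch below contrasts the two spellings on a shortened version of the same URL; the variable names are illustrative.

```python
from urllib.parse import urlparse

# Without a leading '//', the host is treated as part of the path
no_slashes = urlparse('www.baidu.com/index.html', scheme='https')
print(repr(no_slashes.netloc), no_slashes.path)

# With a leading '//', the host is recognized as netloc,
# and the default scheme is still applied
with_slashes = urlparse('//www.baidu.com/index.html', scheme='https')
print(with_slashes.scheme, with_slashes.netloc, with_slashes.path)
```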

1.3. urlparse() with the allow_fragments argument

# Demo 1:
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', allow_fragments=False)
print(result)
'''Result:
ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5#comment', fragment='')
'''

# Demo 2:
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html#comment', allow_fragments=False)
print(result)
'''Result:
ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html#comment', params='', query='', fragment='')
'''
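
The two demos suggest the rule: with allow_fragments=False, '#' is no longer treated as a delimiter, so the text after it stays attached to whatever component precedes it. A small side-by-side sketch on one URL (a shortened variant of the one above, chosen for illustration):

```python
from urllib.parse import urlparse

url = 'http://www.baidu.com/index.html?id=5#comment'

# Default: the fragment is split off into its own field
default = urlparse(url)
print(default.query, repr(default.fragment))

# allow_fragments=False: '#comment' stays inside the query
no_frag = urlparse(url, allow_fragments=False)
print(no_frag.query, repr(no_frag.fragment))
```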

2. urlunparse: the inverse of urlparse

# 1. Parse a page URL with urlparse

from urllib.parse import urlparse

result = urlparse('https://www.baidu.com/s?wd=urlparse&rsv_spt=1&rsv_iqid=0x953bd4980021e01a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=1&oq=urrparse&rsv_t=45167nYI8NDE6%2Bb1WvuUFOa44byBJFoinf0m87edhrxTkQZS9Miqh5laqUbkoGFI5ACl&inputT=3153&rsv_pq=8065196e001fc0c7&rsv_sug3=23&bs=urrparse')
print(type(result), result)

'''Parse result:

<class 'urllib.parse.ParseResult'> ParseResult(scheme='https', netloc='www.baidu.com', path='/s', params='', query='wd=urlparse&rsv_spt=1&rsv_iqid=0x953bd4980021e01a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=1&oq=urrparse&rsv_t=45167nYI8NDE6%2Bb1WvuUFOa44byBJFoinf0m87edhrxTkQZS9Miqh5laqUbkoGFI5ACl&inputT=3153&rsv_pq=8065196e001fc0c7&rsv_sug3=23&bs=urrparse', fragment='')

'''

# 2. Run urlunparse on the components parsed above
from urllib.parse import urlunparse

data = ['https', 'www.baidu.com', '/s', '', 'wd=urlparse&rsv_spt=1&rsv_iqid=0x953bd4980021e01a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=1&oq=urrparse&rsv_t=45167nYI8NDE6%2Bb1WvuUFOa44byBJFoinf0m87edhrxTkQZS9Miqh5laqUbkoGFI5ACl&inputT=3153&rsv_pq=8065196e001fc0c7&rsv_sug3=23&bs=urrparse', '']
print(urlunparse(data))

'''urlunparse result:

https://www.baidu.com/s?wd=urlparse&rsv_spt=1&rsv_iqid=0x953bd4980021e01a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=1&oq=urrparse&rsv_t=45167nYI8NDE6%2Bb1WvuUFOa44byBJFoinf0m87edhrxTkQZS9Miqh5laqUbkoGFI5ACl&inputT=3153&rsv_pq=8065196e001fc0c7&rsv_sug3=23&bs=urrparse


'''
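
Instead of retyping the six components by hand, the ParseResult can be passed straight to urlunparse. The sketch below uses a shortened query string (an assumption, to keep the example readable) to show the round trip and how one component can be swapped with the named tuple's _replace method before reassembling.

```python
from urllib.parse import urlparse, urlunparse

url = 'https://www.baidu.com/s?wd=urlparse&ie=utf-8'
parts = urlparse(url)

# Round trip: reassembling the parsed parts gives back this URL
rebuilt = urlunparse(parts)
print(rebuilt == url)

# _replace returns a modified copy of the named tuple,
# which makes it easy to change a single component
changed = urlunparse(parts._replace(query='wd=urljoin'))
print(changed)
```

For URLs with redundant delimiters (for example a trailing '?' with an empty query), the rebuilt string can differ slightly from the input while still being equivalent.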

3. urljoin: merging URLs

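The merge rule is that the second URL takes precedence: components it provides are kept, and components it lacks are filled in from the first (base) URL.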

from urllib.parse import urljoin

print(urljoin('http://www.baidu.com', 'FAQ.html'))
print(urljoin('http://www.baidu.com', 'https://cuiqingcai.com/FAQ.html'))
print(urljoin('http://www.baidu.com/about.html', 'https://cuiqingcai.com/FAQ.html'))
print(urljoin('http://www.baidu.com/about.html', 'https://cuiqingcai.com/FAQ.html?question=2'))
print(urljoin('http://www.baidu.com?wd=abc', 'https://cuiqingcai.com/index.php'))
print(urljoin('http://www.baidu.com', '?category=2#comment'))
print(urljoin('www.baidu.com', '?category=2#comment'))
print(urljoin('www.baidu.com#comment', '?category=2'))

'''Result:
http://www.baidu.com/FAQ.html
https://cuiqingcai.com/FAQ.html
https://cuiqingcai.com/FAQ.html
https://cuiqingcai.com/FAQ.html?question=2
https://cuiqingcai.com/index.php
http://www.baidu.com?category=2#comment
www.baidu.com?category=2#comment
www.baidu.com?category=2
'''
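
The examples above mostly join against a bare host. A common use of urljoin is resolving relative links found on a page. The sketch below assumes a hypothetical base page under /docs/guide/ and shows how sibling, parent-relative, and root-relative links are resolved against it.

```python
from urllib.parse import urljoin

# Hypothetical page that the relative links were found on
base = 'http://www.baidu.com/docs/guide/index.html'

# A bare file name replaces the last path segment of the base
sibling = urljoin(base, 'FAQ.html')

# '..' climbs one directory before appending
parent = urljoin(base, '../FAQ.html')

# A leading '/' replaces the whole path but keeps scheme and host
rooted = urljoin(base, '/FAQ.html')

print(sibling)
print(parent)
print(rooted)
```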

4. urlencode: converting a dict into GET request parameters

from urllib.parse import urlencode

params = {
    'name': 'germey',
    'age': 22
}
base_url = 'http://www.baidu.com?'
url = base_url + urlencode(params)
print(url)

'''Test result:
http://www.baidu.com?name=germey&age=22
'''
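
urlencode also percent-encodes values that are not URL-safe, which the plain example above does not exercise. The following sketch uses made-up parameter names and values to show the escaping, the doseq flag for list values, and parse_qs for decoding a query string back into a dict.

```python
from urllib.parse import urlencode, parse_qs

# Spaces become '+', non-ASCII characters are UTF-8 percent-encoded
params = {'wd': 'url parse', 'place': 'café'}
query = urlencode(params)
print(query)

# doseq=True expands a list value into repeated keys
multi = urlencode({'tag': ['a', 'b']}, doseq=True)
print(multi)

# parse_qs reverses the encoding; every value comes back as a list
decoded = parse_qs(query)
print(decoded)
```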
