Python Web Scraping Study Log (7)

Original

2020-06-15 07:29

Regular Expressions

Regular Expressions

1. Concepts

2. Basic Use of the re Library

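Regular Expressions

RE: regular expression, also written regex

1. Concepts

What regular expressions are for

A general framework for describing strings
A concise expression that stands for a whole set of strings
A tool for capturing what is "concise" and "characteristic" about strings
Deciding whether a given string has a particular characteristic (belongs to the set)
Describing the characteristic features of a kind of text (virus signatures, intrusion patterns, and so on)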

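Using regular expressions

Compilation: a string that follows regular expression syntax is converted into a regular expression object that captures the pattern's features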
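The compilation step can be sketched with re.compile. This is a minimal illustration; the variable names and the deliberately broken pattern are my own choices, not from the original notes.

```python
import re

# A plain string that follows regular expression syntax.
pattern_text = r'[1-9]\d{5}'

# Compiling turns the string into a reusable pattern object.
compiled = re.compile(pattern_text)
print(type(pattern_text).__name__)   # str
print(compiled.pattern)              # [1-9]\d{5}

# A string that violates the syntax cannot be compiled.
try:
    re.compile(r'[1-9')              # character set is never closed
    compile_failed = False
except re.error as exc:
    compile_failed = True
    print("invalid pattern:", exc)
```

The compiled object keeps the original text in its .pattern attribute, so nothing is lost by compiling.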

Regular expression syntax: a pattern is built from characters and operators

Common regular expression operators
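A handful of commonly used operators can be checked with a short script. The patterns and test strings below are illustrative choices of mine, and the list is not meant to be complete.

```python
import re

# (pattern, string that should match in full, string that should not)
cases = [
    (r'a.c',      'abc',    'ac'),      # .     any single character
    (r'[abc]',    'b',      'd'),       # [ ]   one character from a set
    (r'[^abc]',   'd',      'a'),       # [^ ]  one character not in the set
    (r'abc*',     'abccc',  'abd'),     # *     zero or more of the previous character
    (r'abc+',     'abcc',   'ab'),      # +     one or more of the previous character
    (r'abc?',     'ab',     'abcc'),    # ?     zero or one of the previous character
    (r'abc|def',  'def',    'abf'),     # |     either the left or the right side
    (r'ab{2}c',   'abbc',   'abc'),     # {m}   exactly m repetitions
    (r'ab{1,2}c', 'abc',    'abbbc'),   # {m,n} between m and n repetitions
    (r'(abc)+',   'abcabc', 'abca'),    # ( )   grouping
    (r'\d\d',     '42',     '4a'),      # \d    a digit
    (r'\w+',      'py_3',   'py-3'),    # \w    a word character
]

results = []
for pattern, good, bad in cases:
    ok = re.fullmatch(pattern, good) is not None and re.fullmatch(pattern, bad) is None
    results.append(ok)
    print(pattern, 'matches', repr(good), 'and rejects', repr(bad), '->', ok)

# ^ and $ anchor a match to the start and the end of the string.
starts = re.search(r'^abc', 'abcdef') is not None
ends = re.search(r'abc$', 'xyzabc') is not None
not_at_start = re.search(r'^abc', 'xabc') is None
```

re.fullmatch is used here so that each pattern has to cover the whole test string, which makes the effect of each operator easier to see.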

Regular expression syntax examples
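To see how one short pattern stands for a set of strings, here is a small sketch. The patterns and the string sets are illustrative assumptions, chosen to exercise grouping, character sets and repetition counts.

```python
import re

# Each pattern is paired with strings it should match in full.
examples = {
    r'P(Y|YT|YTH|YTHO)?N': ['PN', 'PYN', 'PYTN', 'PYTHN', 'PYTHON'],
    r'PYTHON+':            ['PYTHON', 'PYTHONN', 'PYTHONNN'],
    r'PY[TH]ON':           ['PYTON', 'PYHON'],
    r'PY[^TH]?ON':         ['PYON', 'PYaON', 'PYxON'],
    r'PY{0,3}N':           ['PN', 'PYN', 'PYYN', 'PYYYN'],
}

all_match = True
for pattern, strings in examples.items():
    for s in strings:
        matched = re.fullmatch(pattern, s) is not None
        all_match = all_match and matched
        print(pattern, repr(s), matched)

# [TH] stands for exactly one character, so the six-letter word is rejected.
rejects_python = re.fullmatch(r'PY[TH]ON', 'PYTHON') is None
```

Note that PYTHON+ describes an unbounded set (any number of trailing N characters), which is the "concise expression of a set of strings" idea from the concepts section.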

Classic regular expression examples
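A few patterns that come up again and again are sketched below. These are common textbook forms that I am assuming are representative; they are simplified (for example, the postal code pattern only checks "six digits, not starting with 0").

```python
import re

# (pattern, string that should pass, string that should fail)
classic = [
    (r'^[A-Za-z]+$',        'Python',  'Python3'),    # letters only
    (r'^[A-Za-z0-9]+$',     'Python3', 'Python 3'),   # letters and digits only
    (r'^-?\d+$',            '-42',     '4.2'),        # integer, optional minus sign
    (r'^[1-9]\d{5}$',       '100081',  '010081'),     # six-digit postal code
    (r'^[\u4e00-\u9fa5]+$', '正則',     'regex'),      # a commonly used range for Chinese characters
]

checks = []
for pattern, good, bad in classic:
    passes = re.search(pattern, good) is not None
    fails = re.search(pattern, bad) is None
    checks.append((passes, fails))
    print(pattern, repr(good), passes, repr(bad), fails)
```

The ^ and $ anchors matter here: without them, re.search would accept any string that merely contains a matching piece.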

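2. Basic Use of the re Library

How regular expressions are written

raw string type (native string type)
A raw string is written by putting a lowercase 'r' in front of a string literal, so Python does not process the escape character '\' inside it, for example r'[1-9]\d{5}'.
string type, which is more cumbersome: every backslash has to be doubled, for example '[1-9]\\d{5}'

When a regular expression contains escape characters, use a raw string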
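The difference between the two ways of writing a pattern can be checked directly. This is a small sketch; the \b example is my own addition to show what goes wrong when the 'r' prefix is forgotten.

```python
import re

raw_pattern = r'[1-9]\d{5}'       # raw string: the backslash stays as typed
plain_pattern = '[1-9]\\d{5}'     # ordinary string: the backslash must be doubled

print(raw_pattern == plain_pattern)   # True
print(len(r'\n'), len('\n'))          # 2 1

# In an ordinary string '\b' is a backspace character, not a word boundary,
# so the pattern silently stops matching.
word_boundary_raw = re.findall(r'\bBIT\b', 'BIT 100081')
word_boundary_plain = re.findall('\bBIT\b', 'BIT 100081')
print(word_boundary_raw)              # ['BIT']
print(word_boundary_plain)            # []
```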

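Main functions of the re library

The six main functions are search, match, findall, split, finditer, and sub.

Equivalent usage of the re library

Advantage of compiling: compile once, then run as many operations as needed.
A compiled pattern object can call the six main functions above as methods.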
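The two equivalent styles look like this side by side. It is a minimal sketch that reuses the postal code pattern and test string from the examples in this post; the variable names are my own.

```python
import re

text = 'BIT100081 TSU100084'

# Function style: the pattern string is passed on every call.
first = re.search(r'[1-9]\d{5}', text)

# Object style: compile once, then call the same operations as methods.
postcode = re.compile(r'[1-9]\d{5}')
second = postcode.search(text)

print(first.group(0) == second.group(0))   # True
print(postcode.findall(text))              # ['100081', '100084']
print(postcode.sub(':zipcode', text))      # BIT:zipcode TSU:zipcode
print(postcode.split(text, maxsplit=1))    # ['BIT', ' TSU100084']
```

The object style reads better when the same pattern is applied to many strings, for instance once per page in a crawler loop.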

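Examples of the main re functions

import re

# search: find the first match anywhere in the string
match = re.search(r'[1-9]\d{5}', 'BIT 100081')
if match:
    print("search: ", match.group(0))

# match: match only at the beginning of the string
match = re.match(r'[1-9]\d{5}', '100081 BIT')
if match:
    print("match: ", match.group(0))

# findall: return every matching substring as a list
ls = re.findall(r'[1-9]\d{5}', 'BIT100081 TSU100084')
print("findall: ", ls)

# split: cut the string at each match; maxsplit limits the number of cuts
ls2 = re.split(r'[1-9]\d{5}', 'BIT100081 TSU100084', maxsplit=1)
print("split: ", ls2)

# finditer: iterate over the matches; every item is a Match object
for m in re.finditer(r'[1-9]\d{5}', 'BIT100081 TSU100084'):
    print("finditer: ", m.group(0))

# sub: replace every match with the given string
ls3 = re.sub(r'[1-9]\d{5}', ':zipcode', 'BIT100081 TSU100084')
print("sub: ", ls3)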

Output:

search:  100081
match:  100081
findall:  ['100081', '100084']
split:  ['BIT', ' TSU100084']
finditer:  100081
finditer:  100084
sub:  BIT:zipcode TSU:zipcode

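The re library's Match object

Match object attributes: .string (the text that was searched), .re (the pattern object used), .pos and .endpos (start and end of the search range)
Match object methods: .group(0) (the matched substring), .start(), .end(), and .span() (the start position, the end position, and both as a tuple)

Match object example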

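import re

# search returns a Match object for the first match
m = re.search(r'[1-9]\d{5}', 'BIT100081 TSU100084')
print(type(m))

print(m.string)                       # the text that was searched
print(m.re)                           # the compiled pattern that was used
print(m.pos, m.endpos)                # start and end of the search range
print(m.group(0))                     # the matched substring
print(m.start(), m.end(), m.span())   # where the match sits in the text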

Output:

<class 're.Match'>
BIT100081 TSU100084
re.compile('[1-9]\\d{5}')
0 19
100081
3 9 (3, 9)

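Greedy and minimal matching in the re library

Greedy matching: by default, a repetition operator matches as much as it can, so re returns the longest matching substring.
Minimal matching: append '?' to a repetition operator (*?, +?, ??, {m,n}?) to get the shortest matching substring.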
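The difference is easiest to see when one string contains several possible end points. The test strings below are illustrative choices of mine.

```python
import re

text = 'PYANBNCNDN'

# Greedy: .* runs to the last N it can reach.
greedy = re.search(r'PY.*N', text)
# Minimal: .*? stops at the first N.
minimal = re.search(r'PY.*?N', text)

print(greedy.group(0))     # PYANBNCNDN
print(minimal.group(0))    # PYAN

# The same '?' suffix works on the other repetition operators.
digits_greedy = re.search(r'\d{2,4}', '123456').group(0)
digits_minimal = re.search(r'\d{2,4}?', '123456').group(0)
print(digits_greedy, digits_minimal)   # 1234 12
```

When scraping HTML, the minimal form is usually the one wanted, since a greedy .* between two tags tends to swallow everything up to the last closing tag on the page.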
