A Deep Dive into Python's re Module

Commonly Used Functions in the re Module

re.compile()

Usage:
re.compile() compiles a regular expression into a pattern object, whose methods perform the various matching operations.
re.compile(pattern, flags=0)
Example:

>>> import re
>>> p = re.compile(r'ab*')
>>> p
re.compile('ab*')
>>> dir(p)
['__class__', '__copy__', '__deepcopy__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'findall', 'finditer', 'flags', 'fullmatch', 'groupindex', 'groups', 'match', 'pattern', 'scanner', 'search', 'split', 'sub', 'subn']

As the output shows, the compiled pattern object provides many methods for matching.
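As a minimal sketch of why compiling once can be convenient, the snippet below reuses one pattern object across several strings and checks that the module-level function gives the same result. The names `WORD` and `count_words` are made up for this illustration and are not part of the re module.

```python
import re

# Compile once and reuse. WORD and count_words are illustrative names.
WORD = re.compile(r'[A-Za-z]+')

def count_words(lines):
    """Count the alphabetic words in an iterable of strings."""
    return sum(len(WORD.findall(line)) for line in lines)

total = count_words(['Hello world', 'ab 12 cd'])

# The module-level function returns the same result as the pattern method.
same = re.findall(r'[A-Za-z]+', 'ab 12 cd') == WORD.findall('ab 12 cd')

print(total, same)  # 4 True
```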

re.match()

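Usage:
re.match() matches only at the beginning of the string. If the pattern does not match there, match() returns None. On success it returns a match object, whose group(num) and groups() methods give access to the matched text.
re.match(pattern, string, flags=0)
Example: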

>>> import re
>>> p = re.compile(r'[a-z]+')
>>> m = p.match('123abc')
>>> m
>>> print(m)
None
>>> 
>>> m = p.match('abc')
>>> m
<_sre.SRE_Match object; span=(0, 3), match='abc'>
>>> print(m)
<_sre.SRE_Match object; span=(0, 3), match='abc'>
>>> m.group()
'abc'
>>> m.start()
0
>>> m.end()
3
>>> m.span()
(0, 3)

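Notes:
group(): returns the matched text
start(): returns the index where the match starts
end(): returns the index just past the end of the match
span(): returns a (start, end) tuple of the match positions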
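Because match() returns None on failure, calling group() on the result without a check raises AttributeError. The following is a rough sketch of the usual guard. The pattern `VERSION` and the helper `parse_version` are hypothetical names chosen for this example.

```python
import re

# VERSION is an illustrative pattern: "major.minor" at the start of a string.
VERSION = re.compile(r'(\d+)\.(\d+)')

def parse_version(text):
    """Return (major, minor) as ints, or None if text does not start with a version."""
    m = VERSION.match(text)
    if m is None:  # match() returns None when the start of the string does not match
        return None
    return int(m.group(1)), int(m.group(2))

print(parse_version('3.6 is used here'))  # (3, 6)
print(parse_version('Python 3.6'))        # None
```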

re.search()

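Usage:
re.search() scans the whole string and returns a match object for the first location where the pattern matches; if there is no match, it returns None.
re.search(pattern, string, flags=0)
Example: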

>>> import re
>>> p = re.compile(r'[a-z]+')
>>> s = p.search('123abc')
>>> print(s)
<_sre.SRE_Match object; span=(3, 6), match='abc'>
>>> s.group()
'abc'

re.findall()

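Usage:
re.findall() finds all non-overlapping matches of the pattern in the string and returns them as a list; if nothing matches, it returns an empty list.
re.findall(pattern, string, flags=0)
Example: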

>>> import re
>>> p = re.compile(r'\d+')
>>> p.findall('12a34b56c')
['12', '34', '56']
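One detail the example above does not cover: when the pattern contains capturing groups, findall() returns the group contents rather than the whole match. A small sketch, using a made-up sample string:

```python
import re

text = 'width=80, height=24'

# No groups: each item is the whole match.
whole = re.findall(r'\w+=\d+', text)

# One group: each item is the text of that group.
values = re.findall(r'\w+=(\d+)', text)

# Two groups: each item is a tuple of the groups.
pairs = re.findall(r'(\w+)=(\d+)', text)

print(whole)   # ['width=80', 'height=24']
print(values)  # ['80', '24']
print(pairs)   # [('width', '80'), ('height', '24')]
```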

re.finditer()

Usage:
re.finditer() also finds all non-overlapping matches, but returns them as an iterator of match objects instead of a list of strings.
re.finditer(pattern, string, flags=0)
Example:

>>> import re
>>> p = re.compile(r'\d+')
>>> p.finditer('12a34b56c')
<callable_iterator object at 0x7f168613bdd8>
>>> for m in p.finditer('12a34b56c'):
...     print(m.group(), m.span())
... 
12 (0, 2)
34 (3, 5)
56 (6, 8)
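Since finditer() yields match objects lazily, it is possible to take only the first match and continue later. This is a sketch of that idea using the same input as above:

```python
import re

p = re.compile(r'\d+')
it = p.finditer('12a34b56c')

first = next(it)  # take just the first match object
rest = [(m.group(), m.start()) for m in it]  # the iterator continues after it

print(first.group(), first.span())  # 12 (0, 2)
print(rest)                         # [('34', 3), ('56', 6)]
```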

re.sub()

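Usage:
re.sub() replaces the matches of a pattern in a string.
re.subn() works the same way as re.sub(), except that it returns a tuple containing the new string and the number of replacements made.
re.sub(pattern, repl, string, count=0, flags=0)
The parameters of re.sub() are:
pattern : the regular expression pattern.
repl : the replacement string; it can also be a function.
string : the original string to search and replace in.
count : the maximum number of matches to replace; the default, 0, replaces all matches.
Example 1: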

>>> import re
>>> re.sub(r'[a-z]+', '456', 'abc123')
'456123'
>>> 
>>> p = re.compile(r'[a-z]+')
>>> p.sub('456', 'abc123')
'456123'

Example 2:

>>> import re
>>> p = re.compile(r'[a-z]+')
>>> p.sub('456', 'abc123')
'456123'
>>> 
>>> p.subn('456', 'abc123')
('456123', 1)
>>> p.subn('456', 'abc123abc')
('456123456', 2)

re.split()

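Usage:
re.split() splits the string at each match of the regular expression. If the pattern contains capturing parentheses, the text they capture is also included in the resulting list.
re.split(pattern, string, maxsplit=0, flags=0)
The parameters of re.split() are:
pattern : the regular expression
string : the string to split
maxsplit : the maximum number of splits; the default, 0, means no limit.
flags : the flags
Example: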

>>> import re
>>> p = re.compile(r'\W+')
>>> p.split('This is a test')
['This', 'is', 'a', 'test']
>>> p.split('This is a test', maxsplit=1)
['This', 'is a test']
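The usage note says that captured text is included in the result when the pattern contains capturing parentheses. A short sketch of the difference, with a made-up input string:

```python
import re

text = 'a, b; c'

# Without a capturing group the separators are dropped.
without_group = re.split(r'[,;]\s*', text)

# With a capturing group the captured separators are kept in the list.
with_group = re.split(r'([,;])\s*', text)

print(without_group)  # ['a', 'b', 'c']
print(with_group)     # ['a', ',', 'b', ';', 'c']
```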

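Understanding and Using Flags in the re Module

Flags control how a regular expression is matched. The main ones are:
re.I (full name: IGNORECASE): case-insensitive matching.
re.M (full name: MULTILINE): multi-line mode; changes the behavior of ^ and $.
re.S (full name: DOTALL): makes . match any character, including a newline.
re.L (full name: LOCALE): makes the predefined classes \w \W \b \B and case-insensitive matching depend on the current locale. In Python 3.6 and later it can be used only with bytes patterns.
re.U (full name: UNICODE): makes the predefined classes \w \W \b \B \s \S \d \D follow the Unicode character properties. This is already the default for str patterns in Python 3.
re.X (full name: VERBOSE): verbose mode, mainly for readability. The pattern can span several lines, whitespace outside character classes is ignored, and comments can be added.
re.S example: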

>>> import re
>>> re_str = 'This is the first test.\nThis is the second test.'
>>> p = re.compile(r'This.*test')
>>> p.match(re_str).group()
'This is the first test'
>>>
>>>
>>> re_str = 'This is the first test.\nThis is the second test.'
>>> p = re.compile(r'This.*test', re.S)
>>> p.match(re_str).group()
'This is the first test.\nThis is the second test'

re.M example:

>>> import re
>>> re_str = 'This is the first test.\nThis is the second test.'
>>> print(re_str)
This is the first test.
This is the second test.
>>> p = re.compile(r'^This.*?test\.$')
>>> p.findall(re_str)
[]
>>> p = re.compile(r'^This.*?test\.$',re.S)
>>> p.findall(re_str)
['This is the first test.\nThis is the second test.']
>>> 
>>> p = re.compile(r'^This.*?test\.$',re.M)
>>> p.findall(re_str)
['This is the first test.', 'This is the second test.']

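As the examples show, without re.M the anchors ^ and $ match only at the start and end of the whole string. With re.M (multi-line mode), ^ also matches at the start of each line and $ at the end of each line, so the pattern is applied line by line.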
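The re.I and re.X flags from the list have no example above. The sketch below combines them with re.M using the | operator. The log text and the pattern are invented for illustration only.

```python
import re

# re.X allows whitespace and comments inside the pattern.
# Flags are combined with the | operator.
p = re.compile(r'''
    ^ error     # the line starts with the word "error" (any case, due to re.I)
    \s* : \s*   # a colon with optional spaces around it
    (.+) $      # the message, up to the end of the line (due to re.M)
''', re.X | re.I | re.M)

log = 'ok\nERROR: disk full\nError : timeout'
messages = p.findall(log)

print(messages)  # ['disk full', 'timeout']
```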

The Difference Between match and search

Example:

>>> import re
>>> p = re.compile(r'[a-z]+')
>>> p.match('123abc')
>>> m = p.match('123abc')
>>> print(m)
None
>>> s = p.search('123abc')
>>> print(s)
<_sre.SRE_Match object; span=(3, 6), match='abc'>
>>> s.group()
'abc'

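As the example shows, match() tries to match only at the beginning of the string: if the pattern does not match there, it returns None; otherwise it returns a match object. search(), by contrast, scans the whole string and returns a match object for the first match it finds. search() is usually the more generally useful of the two.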
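To make the relationship more concrete, here is a small sketch comparing match(), search() anchored with \A, and fullmatch() (one of the methods listed in the dir() output earlier). It also shows that re.M does not make match() try the start of later lines.

```python
import re

p = re.compile(r'[a-z]+')
s = '123abc'

at_start = p.match(s)                 # None: the string starts with digits
anywhere = p.search(s).group()        # 'abc'
anchored = re.search(r'\A[a-z]+', s)  # None: \A pins search() to the start

# re.M does not change match(): it still only tries position 0.
multiline_match = re.compile(r'[a-z]+', re.M).match('123\nabc')     # None
multiline_search = re.search(r'^[a-z]+', '123\nabc', re.M).group()  # 'abc'

whole = p.fullmatch('abc').group()    # 'abc'
partial = p.fullmatch('abc123')       # None: the entire string must match

print(at_start, anywhere, anchored, multiline_match, multiline_search, whole, partial)
```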

Understanding group and groups

The following simple examples show how group and groups differ.
Example 1:

>>> import re
>>> p = re.compile(r'ab')
>>> s = p.search('abcd')
>>> s.group()
'ab'
>>> s.groups()
()

Example 2:

>>> import re
>>> p = re.compile(r'(a)b')
>>> s = p.search('abcd')
>>> s.group()
'ab'
>>> s.group(0)
'ab'
>>> s.group(1)
'a'
>>> s.groups()
('a',)

Example 3:

>>> import re
>>> p = re.compile(r'(a(b)c)d')
>>> s = p.search('abcd')
>>> s.group()
'abcd'
>>> s.group(0)
'abcd'
>>> s.group(1)
'abc'
>>> s.group(2)
'b'
>>> s.groups()
('abc', 'b')

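As the three examples show, group() returns the text matched by a particular group. Groups are numbered from 0, where group 0 is the whole match and is the default when no argument is given. groups() returns a tuple containing the text captured by each pair of parentheses, that is, groups 1 and up.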

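Example 3 also shows that groups can be nested; they are numbered by the position of their opening parenthesis, from left to right.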
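Two further behaviors may be worth knowing: group() accepts several group numbers at once, and a group that did not take part in the match is reported as None unless a default is passed to groups(). A sketch with an invented date-like pattern:

```python
import re

# The third group (the "-DD" part) is optional.
p = re.compile(r'(\d{4})-(\d{2})(-\d{2})?')
m = p.search('released 2016-12')

year_month = m.group(1, 2)       # several groups at once, returned as a tuple
all_groups = m.groups()          # unmatched groups appear as None
with_default = m.groups('-01')   # or as the given default value

print(year_month)    # ('2016', '12')
print(all_groups)    # ('2016', '12', None)
print(with_default)  # ('2016', '12', '-01')
```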

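Using Non-Capturing Groups

The difference between capturing and non-capturing groups:
A capturing group is written with () and captures the text matched inside it.
A non-capturing group also uses parentheses, but is written as (?:pattern). It takes part in the match but does not capture the matched text.
The following two examples illustrate non-capturing groups:
Example 1: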

>>> import re
>>> m = re.match(r"([abc])+", "abc")
>>> m.groups()
('c',)
>>> m = re.match(r"(?:[abc])+", "abc")
>>> m.groups()
()

Example 2:

>>> import re
>>> p = re.compile(r'industr(y|ies)')
>>> s = p.search('industry')
>>> s.group()
'industry'
>>> s.groups()
('y',)
>>> 
>>> p = re.compile(r'industr(?:y|ies)')
>>> s = p.search('industry')
>>> s.group()
'industry'
>>> s.groups()
()

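Example 2 makes this clear: with a capturing group, s.groups() returns the captured text. Once the group is made non-capturing, it still takes part in the match, but groups() no longer contains the text it matched.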
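The distinction matters in practice with findall(), which returns group contents when the pattern has a capturing group. A brief sketch reusing the pattern from Example 2:

```python
import re

text = 'industry and industries'

# Capturing group: findall() returns only what the group captured.
captured = re.findall(r'industr(y|ies)', text)

# Non-capturing group: findall() returns the whole matches.
whole = re.findall(r'industr(?:y|ies)', text)

print(captured)  # ['y', 'ies']
print(whole)     # ['industry', 'industries']
```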

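Using Named Groups

A named group is defined as: (?P<name>pattern)
A named group behaves exactly like a capturing group, and additionally associates a name with the group.

Example 1: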

>>> import re
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search('Lots of punctuation')
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'

Example 1 defines a named group called word, uses search() to match a single word, and retrieves the result both by name and by number.

Example 2:

>>> p = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')
>>> p.search('Paris in the the spring').group()
'the the'
>>> p = re.compile(r'\b(?P<word>\w+)\s+(\1)\b')
>>> p.search('Paris in the the spring').group()
'the the'

Example 2 refers back to the contents of the first group in two ways: by name with (?P=word), and by number with \1.
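Named groups can also be read all at once with groupdict(), and the pattern's groupindex attribute (listed in the dir() output earlier) maps each name to its group number. A sketch with an invented date pattern:

```python
import re

p = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = p.search('Released on 2016-12-23.')

fields = m.groupdict()                 # name -> captured text
month_number = p.groupindex['month']   # name -> group number

print(fields)        # {'year': '2016', 'month': '12', 'day': '23'}
print(month_number)  # 2
print(m.group('year') == m.group(1))  # True
```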

Combining Groups with Substitution

Using re.sub() with numbered groups

>>> import re
>>> p = re.compile(r'(hello)\s(abc)\s(123)')
>>> p.sub('\\1\\2', 'hello abc 123')
'helloabc'
>>> p.sub('\\1', 'hello abc 123')
'hello'
>>> p.sub('\\2', 'hello abc 123')
'abc'
>>> p.sub('\\3', 'hello abc 123')
'123'
>>> p.sub('\\1\\3', 'hello abc 123')
'hello123'

Using re.sub() with named groups

>>> import re
>>> p = re.compile(r'(?P<g1>hello)\s(?P<g2>abc)\s(?P<g3>123)')
>>> p.sub(r'\g<g1>', 'hello abc 123')
'hello'
>>> p.sub(r'\g<g2>', 'hello abc 123')
'abc'
>>> p.sub(r'\g<g3>', 'hello abc 123')
'123'
>>> p.sub(r'\g<g1>\g<g2>', 'hello abc 123')
'helloabc'
>>> p.sub(r'\g<g2>\g<g3>', 'hello abc 123')
'abc123'
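The \g<...> syntax also accepts group numbers, which helps when a group reference is followed directly by a digit: \g<1>0 means group 1 followed by a literal 0, whereas \10 would be read as group 10. The sketch below shows this along with reordering named groups. The patterns and sample strings are invented for illustration.

```python
import re

# Reorder the parts of a date using named groups in the replacement.
date = re.compile(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})')
reordered = date.sub(r'\g<d>/\g<m>/\g<y>', 'Released on 2016-12-23')

# \g<1>0 is group 1 followed by a literal "0".
px = re.compile(r'(\d+)px')
scaled = px.sub(r'\g<1>0px', 'width: 12px')

print(reordered)  # Released on 23/12/2016
print(scaled)     # width: 120px
```

Raw strings (the r prefix) keep Python from treating \g as a string escape before the re module sees it.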

Common Regular Expressions

The commands above were run with Python 3.6.

