Building a Small Parsing Tool with the Python re Module

Original

跳跳投

2020-02-24 02:53

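Main features of Python re

Python's re regular expression module defines a set of functions, constants, and an exception. A regular expression can also be compiled into a 'RegexObject' instance (named re.Pattern in Python 3), which provides its own methods for the various operations. The functions and their usage are briefly introduced below.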

compile

re.compile(pattern[, flags])

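Compiles a regular expression pattern and its flags into a regular expression object, whose methods such as match() and search() can then be used.

The flags defined by re include:

re.I ignores case

re.L makes the special sequences \w, \W, \b, \B, \s, \S depend on the current locale (in Python 3 it is only allowed with bytes patterns)

re.M multi-line mode: '^' and '$' also match at the start and end of each line

re.S makes '.' match any character, including a newline (without this flag, '.' does not match a newline)

re.U makes the special sequences \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database (this is already the default for str patterns in Python 3)

re.X ignores whitespace in the pattern and any comment following '#', to improve readability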
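The effect of the most commonly used flags can be seen in a short sketch. The sample text and variable names here are only illustrative assumptions, not part of the original article.

```python
import re

text = "Hello\nworld"  # sample input, assumed for illustration

# re.I: case-insensitive matching
ignore_case = re.search("hello", text, re.I) is not None

# re.M: '^' also matches right after each newline
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: '.' also matches the newline character
without_dotall = re.search("Hello.world", text)
with_dotall = re.search("Hello.world", text, re.S)

# re.X: whitespace and '#' comments inside the pattern are ignored
number = re.compile(r"""
    \d+      # integer part
    \.       # decimal point
    \d+      # fractional part
""", re.X)
found = number.search("pi is about 3.14").group()

print(ignore_case, line_starts, without_dotall, with_dotall is not None, found)
```

Flags can be combined with the | operator, for example re.I | re.M.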

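Example: the following two usages give the same result:

import re

pattern, string = r"\d+", "42 apples"  # sample values, assumed for illustration
compiled_pattern = re.compile(pattern)
result = compiled_pattern.match(string)

result = re.match(pattern, string)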

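search

re.search(pattern, string[, flags])

Scans the string for a location where the regular expression pattern matches and returns a MatchObject instance (re.Match in Python 3). If no position matches, it returns None.

A compiled regular expression object (re.RegexObject) has the following search method: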

search (string[, pos[, endpos]])

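If regex is a compiled regular expression object, regex.search(string, 0, 50) is equivalent to regex.search(string[:50], 0).

A concrete example:

>>> import re
>>> pattern = re.compile("a")
>>> pattern.search("abcde")     # Match at index 0
>>> pattern.search("abcde", 1)  # No match; the search starts at index 1, past the "a"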
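The pos and endpos arguments can be checked with a small sketch. The pattern and sample string are assumptions chosen for illustration.

```python
import re

regex = re.compile(r"\d+")
string = "abc 123 def 456"  # sample input, assumed for illustration

# Start scanning at index 8, so the first number is skipped
m = regex.search(string, 8)
later = m.group()
later_start = m.start()  # the index is still relative to the full string

# endpos limits the search as if the string were only endpos characters long
limited = regex.search(string, 0, 5)
sliced = regex.search(string[:5], 0)

print(later, later_start, limited.group(), sliced.group())
```

Unlike slicing the string yourself, passing pos keeps the reported match positions relative to the original string.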

match

re.match(pattern, string[, flags])

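Checks whether pattern matches at the beginning of the string. For a RegexObject there is:

match(string[, pos[, endpos]])

The match() function only tries to match the regular expression at the start of the string, that is, it only reports a match beginning at position 0, whereas search() scans the whole string looking for a match. To find a match anywhere in the string, use search().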
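A minimal sketch of the difference between match() and search(), using an assumed sample string:

```python
import re

text = "error: disk full"  # sample input, assumed for illustration

at_start = re.match(r"error", text)      # matches, because 'error' is at position 0
not_at_start = re.match(r"disk", text)   # None: 'disk' is not at the start
anywhere = re.search(r"disk", text)      # found by scanning through the string

print(at_start is not None, not_at_start, anywhere.start())
```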

split

re.split(pattern, string[, maxsplit=0, flags=0])

This function is used very often. It splits the string at every part that matches the regular expression and returns the pieces as a list. For a RegexObject there is the method:

split(string[, maxsplit=0])

For example, using the syntax introduced in the section above:

>>> re.split(r'\W+', 'test, test, test.')
['test', 'test', 'test', '']
>>> re.split(r'(\W+)', 'test, test, test.')
['test', ', ', 'test', ', ', 'test', '.', '']
>>> re.split(r'\W+', 'test, test, test.', 1)
['test', 'test, test.']

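If the pattern matches nowhere in the string, split does not split it at all. Note that the pattern must not be able to match the empty string for this to hold: 'a*' matches the empty string, and since Python 3.7 split also splits at empty matches, so 'a+' is used here:

>>> re.split('a+', 'hello world')
['hello world']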
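In a small parsing tool, split is typically used to break a line into fields. The following is only a sketch: the "key = value" line format and the separator pattern are assumptions made for illustration.

```python
import re

line = "name = value ; other=thing"  # sample input, assumed for illustration

# Split on ';' or '=' together with any surrounding whitespace
fields = re.split(r"\s*[;=]\s*", line)

# maxsplit=1 stops after the first separator, leaving the rest untouched
first, rest = re.split(r"\s*=\s*", line, maxsplit=1)

print(fields, first, rest)
```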

findall

re.findall(pattern, string[, flags])

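Finds all substrings of the string that match the regular expression and returns them as a list. A RegexObject likewise has:

findall(string[, pos[, endpos]])

Example:

# Get all content enclosed in [] and return it as a list
>>> string = "[a] and [b]"  # sample input, assumed for illustration
>>> return_list = re.findall(r"(\[.*?\])", string)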
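What findall returns depends on the capturing groups in the pattern. A short sketch with assumed sample strings:

```python
import re

log = "[INFO] started [WARN] low disk [INFO] done"  # sample input, assumed

# One capturing group: findall returns the group contents, brackets excluded
levels = re.findall(r"\[(.*?)\]", log)

# No group: the whole match is returned
bracketed = re.findall(r"\[.*?\]", log)

# Several groups: a list of tuples, one item per group
pairs = re.findall(r"(\w+)=(\d+)", "a=1, b=22")

print(levels, bracketed, pairs)
```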

finditer

re.finditer(pattern, string[, flags])

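Similar to findall, it finds all substrings of the string that match the regular expression, but returns an iterator over the match objects instead of a list. A RegexObject likewise has: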

finditer(string[, pos[, endpos]])
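Because finditer yields match objects, each result carries its groups and its position. A minimal sketch, with an assumed sample string and variable names:

```python
import re

text = "x=10, y=20"  # sample input, assumed for illustration

positions = []
for m in re.finditer(r"(\w)=(\d+)", text):
    # Record the name, the numeric value, and where the match was found
    positions.append((m.group(1), int(m.group(2)), m.span()))

print(positions)
```

This is useful in a parser when the location of each match matters, for example when reporting errors.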

sub

re.sub(pattern, repl, string[, count, flags])

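Finds all substrings of string that match the regular expression pattern and replaces them with repl. If no substring matches pattern, string is returned unchanged. repl can be either a string or a function. For a RegexObject there is: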

sub(repl, string[, count=0])

An example of this syntax:

>>> p = re.compile('(one|two|three)')
>>> p.sub('num', 'one word two words three words')
'num word num words num words'

The same method can also be called with count set to 1, so that only the first match is replaced:

>>> p.sub('num', 'one word two words three words', count=1)
'num word two words three words'
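When repl is a function, it is called with each match object and its return value is used as the replacement. The following sketch uses a hypothetical helper named double and assumed sample strings; it also shows a backreference in a string repl.

```python
import re

def double(match):
    """Hypothetical helper: replace a number with twice its value."""
    return str(int(match.group()) * 2)

# repl as a function: called once per match
doubled = re.sub(r"\d+", double, "3 apples and 12 pears")

# repl as a string: \1 and \2 refer to the captured groups
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

print(doubled, swapped)
```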

subn

re.subn(pattern, repl, string[, count, flags])

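This function does the same thing as sub(), but returns a tuple containing the new string and the number of substitutions made. A RegexObject likewise has: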

subn(repl, string[, count=0])
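A minimal sketch of the returned tuple, using an assumed sample string:

```python
import re

# Collapse every run of whitespace into a single space and count the runs
new_text, count = re.subn(r"\s+", " ", "a   b \t c")

print(new_text, count)
```

The count is handy when a tool needs to report whether anything was actually changed.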

See the RE chapter of the Python documentation, The Python Standard Library - Regular expression operations, for details of RE syntax.
See the article Regular Expressions Primer for more examples of using Python RE.
See the developerWorks article Charming Python: Text processing in Python for examples of text parsing with Python RE.
