Python Regular Expressions: the regex Module

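Start-of-word, end-of-word, and word-boundary positions

regex uses \m to match the start of a word and \M to match the end of a word.

\b matches a word boundary, but it cannot tell whether the boundary is a start or an end.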
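As a minimal sketch of the difference (the sample text and variable names are illustrative only), the anchors can be compared by picking out the letter next to each one:

```python
import regex

text = "hello big world"

# \m anchors at the start of a word, so \m\w picks each word's first letter.
first_letters = regex.findall(r"\m\w", text)

# \M anchors at the end of a word, so \w\M picks each word's last letter.
last_letters = regex.findall(r"\w\M", text)

# \b works in either role, which is why it cannot tell the two apart.
first_via_b = regex.findall(r"\b\w", text)
last_via_b = regex.findall(r"\w\b", text)

print(first_letters)  # ['h', 'b', 'w']
print(last_letters)   # ['o', 'g', 'd']
print(first_via_b == first_letters, last_via_b == last_letters)  # True True
```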

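Scoped flags

(?flags-flags:...)

Flags can be applied to just part of a pattern. The re module gained the same scoped syntax in Python 3.6; before that, its flags could only apply to the whole pattern.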

>>> regex.search(r"<B>(?i:good)</B>", "<B>GOOD</B>")
<regex.Match object; span=(0, 11), match='<B>GOOD</B>'>

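In this example, case-insensitive matching applies only to the word between the tags.

(?i:) turns case-insensitivity on; (?-i:) turns it off.

Several flags can simply be written side by side, as in (?is-f:): flags to the left of the minus sign are turned on, and flags to its right are turned off.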
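The on and off forms can be checked side by side. This is a small sketch; the variable names are made up for illustration:

```python
import regex

# Case-insensitivity is on only inside (?i:...), so the tags stay case-sensitive.
scoped = regex.compile(r"<B>(?i:good)</B>")
upper_word = scoped.search("<B>GOOD</B>")   # matches: only the word differs in case
lower_tag = scoped.search("<b>good</b>")    # no match: the tags are case-sensitive

# (?-i:...) switches the flag off locally inside a case-insensitive pattern.
mixed = regex.compile(r"(?i)a(?-i:b)c")

print(bool(upper_word), lower_tag)                           # True None
print(bool(mixed.fullmatch("AbC")), mixed.fullmatch("ABC"))  # True None
```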

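Global flags

Besides scoped flags, there are global inline flags that apply to the whole pattern, for example (?si-f)<B>good</B>

The re module also supports global inline flags (for its own flag letters); see the Python documentation.

Writing flags inside the pattern, rather than passing them as function arguments, keeps them next to the pattern they affect and makes them harder to get wrong.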
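Because the standard re module accepts global inline flags too, the equivalence between the two styles can be sketched with the standard library alone (the sample text is an assumption chosen so that both the s and i flags matter):

```python
import re

text = "<b>GOOD\n</b>"

# Flags written inside the pattern...
inline = re.search(r"(?si)<B>good.</B>", text)

# ...behave like the same flags passed as a function argument.
by_argument = re.search(r"<B>good.</B>", text, flags=re.S | re.I)

print(inline.group() == by_argument.group())  # True
```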

Additional features

Added support for lookaround in conditional pattern (Hg issue 163)

The test of a conditional pattern can now be a lookaround.

>>> regex.match(r'(?(?=\d)\d+|\w+)', '123abc')
<regex.Match object; span=(0, 3), match='123'>
>>> regex.match(r'(?(?=\d)\d+|\w+)', 'abc123')
<regex.Match object; span=(0, 6), match='abc123'>

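This is not quite the same as putting a lookaround in the first branch of a pair of alternatives.

>>> print(regex.match(r'(?:(?=\d)\d+\b|\w+)', '123abc'))   # branch 1 fails, so branch 2 is tried
<regex.Match object; span=(0, 6), match='123abc'>
>>> print(regex.match(r'(?(?=\d)\d+\b|\w+)', '123abc'))    # branch 1 fails, but branch 2 is not tried
None

In the first example, the lookaround matched but the rest of the first branch failed, so the second branch was tried. In the second example, the lookaround matched and the first branch failed, but the second branch was not tried.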

Added POSIX matching (leftmost longest) (Hg issue 150)

POSIX matching (leftmost longest): (?p)

>>> # Normal matching.
>>> regex.search(r'Mr|Mrs', 'Mrs')
<regex.Match object; span=(0, 2), match='Mr'>
>>> regex.search(r'one(self)?(selfsufficient)?', 'oneselfsufficient')
<regex.Match object; span=(0, 7), match='oneself'>
>>> # POSIX matching.
>>> regex.search(r'(?p)Mr|Mrs', 'Mrs')
<regex.Match object; span=(0, 3), match='Mrs'>
>>> regex.search(r'(?p)one(self)?(selfsufficient)?', 'oneselfsufficient')
<regex.Match object; span=(0, 17), match='oneselfsufficient'>

Added (?(DEFINE)...) (Hg issue 152)

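(?(DEFINE)...) holds group definitions for later use. If there is no group called "DEFINE", the ... inside it is skipped during matching, except that any groups defined within it can still be called, and the normal rules for numbering groups still apply:

>>> regex.search(r'(?(DEFINE)(?P<quant>\d+)(?P<item>\w+))(?&quant) (?&item)', '5 elephants')
<regex.Match object; span=(0, 11), match='5 elephants'>

# The two ends use the defined groups, with one CJK character required in between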
>>> regex.search(r'(?(DEFINE)(?P<quant>\d+)(?P<item>\w+))(?&quant)[\u4E00-\u9FA5](?&item)', '123哈哈dog')
<regex.Match object; span=(0, 8), match='123哈哈dog'>
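A DEFINE block pays off when the same sub-pattern is needed several times. The sketch below is an illustration, not taken from the regex documentation: the `octet` group name and the `is_ipv4` helper are hypothetical.

```python
import regex

# The "octet" group is only defined inside (?(DEFINE)...); it matches nothing there.
# It is then called four times with (?&octet).
ipv4 = regex.compile(
    r"(?(DEFINE)(?P<octet>25[0-5]|2[0-4]\d|1?\d?\d))"
    r"(?&octet)(?:\.(?&octet)){3}"
)

def is_ipv4(text):
    """Return True if text is a dotted quad with every octet in 0..255."""
    return ipv4.fullmatch(text) is not None

print(is_ipv4("192.168.0.1"))  # True
print(is_ipv4("256.1.1.1"))    # False
print(is_ipv4("1.2.3"))        # False
```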

[[a-z]--[aeiou]]

V0: simple sets, compatible with the re module

V1: nested sets with extended functionality; the set above contains 'a'-'z' but excludes 'a', 'e', 'i', 'o', 'u'

Example:

regex.search(r'(?V1)[[a-z]--[aeiou]]+', 'abcde')

or

regex.search(r'[[a-z]--[aeiou]]+', 'abcde', flags=regex.V1)

Both return:

<regex.Match object; span=(1, 4), match='bcd'>

\K

Keeps the part of the match after the position where \K occurred and discards the part before it.

>>> m = regex.search(r'(\w\w\K\w\w\w)', 'abcdef')
>>> m    # keeps 'cde', discards 'ab'
<regex.Match object; span=(2, 5), match='cde'>
>>> m[0]
'cde'
>>> m[1]
'abcde'

>>> m = regex.search(r'(?r)(\w\w\K\w\w\w)', 'abcdef')
>>> m    # reverse search: keeps 'bc', discards 'def'
<regex.Match object; span=(1, 3), match='bc'>
>>> m[0]
'bc'
>>> m[1]
'bcdef'
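One common use of \K is to require a prefix without including it in the result. This is a minimal sketch with a made-up input string:

```python
import regex

# Everything before \K must match, but is left out of the reported match.
m = regex.search(r"price:\s*\K\d+", "price: 42 USD")

print(m.group())  # 42
print(m.span())   # (7, 9)
```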

expandf

Subscripts can be used to get the individual captures of a repeated capture group.

>>> m = regex.match(r"(\w)+", "abc")
>>> m.expandf("{1}")    # same as m.expandf("{1[-1]}")
'c'
>>> m.expandf("{1[0]} {1[1]} {1[2]}")
'a b c'
>>> m.expandf("{1[-1]} {1[-2]} {1[-3]}")
'c b a'

With named groups:
>>> m = regex.match(r"(?P<letter>\w)+", "abc")
>>> m.expandf("{letter}")
'c'
>>> m.expandf("{letter[0]} {letter[1]} {letter[2]}")
'a b c'
>>> m.expandf("{letter[-1]} {letter[-2]} {letter[-3]}")
'c b a'

>>> m = regex.match(r"(\w+) (\w+)", "foo bar")
>>> m.expandf("{0} => {2} {1}")
'foo bar => bar foo'

>>> m = regex.match(r"(?P<word1>\w+) (?P<word2>\w+)", "foo bar")
>>> m.expandf("{word2} {word1}")
'bar foo'

The same works on match objects returned by search().

subf

subfn

subf and subfn are alternatives to sub and subn respectively. When passed a replacement string, they treat it as a format string.

>>> regex.subf(r"(\w+) (\w+)", "{0} => {2} {1}", "foo bar")
'foo bar => bar foo'
>>> regex.subf(r"(?P<word1>\w+) (?P<word2>\w+)", "{word2} {word1}", "foo bar")
'bar foo'
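The examples above only exercise subf. As a short sketch of subfn (the input string is illustrative), it returns the new string together with the number of substitutions, just as subn does:

```python
import regex

# subfn works like subf but also reports how many substitutions were made.
result, count = regex.subfn(r"(\w+) (\w+)", "{2} {1}", "foo bar, spam eggs")

print(result)  # bar foo, eggs spam
print(count)   # 2
```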

partial

Partial matches: match, search, fullmatch, and finditer all support partial matching, enabled with the partial keyword argument. Match objects have a partial attribute, which is True for a partial match and False for a complete match.

>>> regex.search(r'\d{4}', '12', partial=True)
<regex.Match object; span=(0, 2), match='12', partial=True>
>>> regex.search(r'\d{4}', '123', partial=True)
<regex.Match object; span=(0, 3), match='123', partial=True>
>>> regex.search(r'\d{4}', '1234', partial=True)    # complete match: no partial=True in the repr
<regex.Match object; span=(0, 4), match='1234'>
>>> regex.search(r'\d{4}', '12345', partial=True)
<regex.Match object; span=(0, 4), match='1234'>
>>> regex.search(r'\d{4}', '12345', partial=True).partial    # complete match
False
>>> regex.search(r'\d{4}', '145', partial=True).partial    # partial match
True
>>> regex.search(r'\d{4}', '1245', partial=True).partial    # complete match
False
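Partial matching is useful for validating input while it is still being typed. The sketch below assumes a four-digit code; `PIN` and `classify` are hypothetical names introduced for illustration:

```python
import regex

PIN = regex.compile(r"\d{4}")

def classify(entered):
    """Classify what has been typed so far against the four-digit pattern."""
    m = PIN.fullmatch(entered, partial=True)
    if m is None:
        return "invalid"       # can never become a match
    return "incomplete" if m.partial else "complete"

for entered in ["12", "1234", "12a", "12345"]:
    print(entered, classify(entered))
# 12 incomplete
# 1234 complete
# 12a invalid
# 12345 invalid
```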

capturesdict()

groupdict()

captures()

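capturesdict() is a combination of groupdict() and captures():

groupdict(): returns a dict whose keys are group names and whose values are the last capture of each group

captures(): returns a list of all the captures of a group

capturesdict(): returns a dict whose keys are group names and whose values are lists of all the captures of each group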

>>> m = regex.match(r"(?:(?P<word>\w+) (?P<digits>\d+)\n)+", "one 1\ntwo 2\nthree 3\n")
>>> m.groupdict()
{'word': 'three', 'digits': '3'}
>>> m.captures("word")
['one', 'two', 'three']
>>> m.captures("digits")
['1', '2', '3']
>>> m.capturesdict()
{'word': ['one', 'two', 'three'], 'digits': ['1', '2', '3']}
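Since the capture lists line up position by position, they can be zipped back together. This is a sketch with an assumed `key=value;` input format and hypothetical group names:

```python
import regex

m = regex.match(r"(?:(?P<key>\w+)=(?P<value>\d+);)+", "a=1;b=2;c=3;")

# groupdict() only sees the last repetition...
print(m.groupdict())  # {'key': 'c', 'value': '3'}

# ...while capturesdict() keeps every repetition, so the pairs can be rebuilt.
caps = m.capturesdict()
pairs = dict(zip(caps["key"], caps["value"]))
print(pairs)  # {'a': '1', 'b': '2', 'c': '3'}
```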

(?P<name>)

Group names can be duplicated

When a group name is repeated, later captures overwrite earlier ones.

Optional groups:
>>> # Both groups capture, the second capture 'overwriting' the first.
>>> m = regex.match(r"(?P<item>\w+)? or (?P<item>\w+)?", "first or second")
>>> m.group("item")
'second'
>>> m.captures("item")
['first', 'second']

>>> m = regex.match(r"(?P<item>\w+)? or (?P<item>\w+)?", " or second")
>>> m.group("item")
'second'
>>> m.captures("item")
['second']

>>> m = regex.match(r"(?P<item>\w+)? or (?P<item>\w+)?", "first or ")
>>> m.group("item")
'first'
>>> m.captures("item")
['first']

Mandatory groups:
>>> m = regex.match(r"(?P<item>\w*) or (?P<item>\w*)", "first or second")
>>> m.group("item")
'second'
>>> m.captures("item")
['first', 'second']

>>> m = regex.match(r"(?P<item>\w*) or (?P<item>\w*)", " or second")
>>> m.group("item")
'second'
>>> m.captures("item")
['', 'second']

>>> m = regex.match(r"(?P<item>\w*) or (?P<item>\w*)", "first or ")
>>> m.group("item")
''
>>> m.captures("item")
['first', '']
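Duplicate names are handy when alternatives place the same field in different positions. The following is a sketch under that assumption; the two date formats and the `year_month` helper are illustrative, not from the regex documentation:

```python
import regex

# Both alternatives fill the same two names, so callers need not know which one matched.
date = regex.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})"   # e.g. 2024-05
    r"|"
    r"(?P<month>\d{2})/(?P<year>\d{4})"   # e.g. 05/2024
)

def year_month(text):
    m = date.fullmatch(text)
    return (m.group("year"), m.group("month")) if m else None

print(year_month("2024-05"))  # ('2024', '05')
print(year_month("05/2024"))  # ('2024', '05')
```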

detach_string

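A match object keeps a reference to the searched string through its string attribute. The detach_string method "detaches" that string, making it available for garbage collection, which can save valuable memory if the string is very large.

>>> m = regex.search(r"\w+", "Hello world")
>>> print(m.group())
Hello
>>> print(m.string)
Hello world
>>> m.detach_string()
>>> print(m.group())
Hello
>>> print(m.string)
None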

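(?0), (?1), (?2)

(?R) or (?0) tries to match the entire regex recursively.
(?1), (?2), etc., try to match the corresponding capture group (group 1, group 2, and so on). For example, (Tarzan|Jane) loves (?1) is equivalent to (Tarzan|Jane) loves (?:Tarzan|Jane).
(?&name) tries to match the named capture group.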

>>> regex.match(r"(Tarzan|Jane) loves (?1)", "Tarzan loves Jane").groups()
('Tarzan',)
>>> regex.match(r"(Tarzan|Jane) loves (?1)", "Jane loves Tarzan").groups()
('Jane',)

>>> m = regex.search(r"(\w)(?:(?R)|(\w?))\1", "kayak")
>>> m.group(0, 1, 2)
('kayak', 'k', None)
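Recursion is what makes nested structures matchable. As a minimal sketch (the pattern and sample text are illustrative), (?R) can be used to find balanced parenthesised groups:

```python
import regex

# An opening parenthesis, then any mix of non-parenthesis runs and nested
# balanced groups (matched by recursing into the whole pattern), then a closer.
balanced = regex.compile(r"\((?:[^()]+|(?R))*\)")

groups = balanced.findall("f(a(b)c) + g(x)")
print(groups)  # ['(a(b)c)', '(x)']
```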

Fuzzy matching

There are three types of error:

Insertion: "i"
Deletion: "d"
Substitution: "s"

In addition, "e" stands for any type of error.

Examples:

foo match “foo” exactly
(?:foo){i} match “foo”, permitting insertions
(?:foo){d} match “foo”, permitting deletions
(?:foo){s} match “foo”, permitting substitutions
(?:foo){i,s} match “foo”, permitting insertions and substitutions
(?:foo){e} match “foo”, permitting errors

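If a certain type of error is specified, then any type not specified is not permitted. In the following examples the item is omitted and only the fuzziness is written: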

{d<=3} permit at most 3 deletions, but no other types
{i<=1,s<=2} permit at most 1 insertion and at most 2 substitutions, but no deletions
{1<=e<=3} permit at least 1 and at most 3 errors
{i<=2,d<=2,e<=3} permit at most 2 insertions, at most 2 deletions, at most 3 errors in total, but no substitutions

It’s also possible to state the costs of each type of error and the maximum permitted total cost.

Examples:

{2i+2d+1s<=4} each insertion costs 2, each deletion costs 2, each substitution costs 1, the total cost must not exceed 4
{i<=1,d<=1,s<=1,2i+2d+1s<=4} at most 1 insertion, at most 1 deletion, at most 1 substitution; each insertion costs 2, each deletion costs 2, each substitution costs 1, the total cost must not exceed 4

A test can also be placed on the characters that are substituted or inserted.

Examples:

{s<=2:[a-z]} at most 2 substitutions, which must be in the character set [a-z].
{s<=2,i<=3:\d} at most 2 substitutions, at most 3 insertions, which must be digits.

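By default, fuzzy matching searches for the first match that meets the given constraints. The ENHANCEMATCH flag, (?e), makes it try to improve the fit of the match it has found (i.e. reduce the number of errors).

The BESTMATCH flag, (?b), makes it search for the best match instead.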

regex.search("(dog){e}", "cat and dog")[1] returns "cat" because that matches "dog" with 3 errors (an unlimited number of errors is permitted).
regex.search("(dog){e<=1}", "cat and dog")[1] returns " dog" (with a leading space) because that matches "dog" with 1 error, which is within the limit.
regex.search("(?e)(dog){e<=1}", "cat and dog")[1] returns "dog" (without a leading space) because the fuzzy search first matches " dog" with 1 error, which is within the limit, and the (?e) flag then makes it attempt a better fit.

Match objects have a fuzzy_counts attribute, which gives the total number of substitutions, insertions, and deletions:

>>> # A 'raw' fuzzy match:
>>> regex.fullmatch(r"(?:cats|cat){e<=1}", "cat").fuzzy_counts
(0, 0, 1)
>>> # 0 substitutions, 0 insertions, 1 deletion.

>>> # A better match might be possible if the ENHANCEMATCH flag used:
>>> regex.fullmatch(r"(?e)(?:cats|cat){e<=1}", "cat").fuzzy_counts
(0, 0, 0)
>>> # 0 substitutions, 0 insertions, 0 deletions.

Match objects also have a fuzzy_changes attribute, which gives a tuple of the positions of the substitutions, insertions, and deletions:

>>> m = regex.search('(fuu){i<=2,d<=2,e<=5}', 'anaconda foo bar')
>>> m
<regex.Match object; span=(7, 10), match='a f', fuzzy_counts=(0, 2, 2)>
>>> m.fuzzy_changes
([], [7, 8], [10, 11])
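To see how the three counters map onto real typos, here is a small sketch that allows at most one error of any kind. The word "hello" and the `edits` helper are assumptions made for illustration:

```python
import regex

pattern = regex.compile(r"(?:hello){e<=1}")

def edits(text):
    """Return (substitutions, insertions, deletions), or None if text is too far off."""
    m = pattern.fullmatch(text)
    return m.fuzzy_counts if m else None

print(edits("hello"))   # (0, 0, 0)
print(edits("hxllo"))   # (1, 0, 0)  one character substituted
print(edits("helllo"))  # (0, 1, 0)  one character inserted
print(edits("helo"))    # (0, 0, 1)  one character deleted
print(edits("hxxlo"))   # None       two errors exceed the limit
```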

\L<name>

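Named lists
The old way: p = regex.compile(r"first|second|third|fourth|fifth"). If the list is large, parsing the resulting regex can take considerable time, and care must also be taken that the strings are properly escaped and properly ordered, for example "cats" before "cat".

The new way: the order does not matter, because the items are treated as a set.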

>>> option_set = ["first", "second", "third", "fourth", "fifth"]
>>> p = regex.compile(r"\L<options>", options=option_set)

The named_lists attribute:

>>> print(p.named_lists)
# Python 3
{'options': frozenset({'fifth', 'first', 'fourth', 'second', 'third'})}
# Python 2
{'options': frozenset(['fifth', 'fourth', 'second', 'third', 'first'])}
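Because the items of a named list are matched as literal strings, entries containing regex metacharacters need no escaping. This is a brief sketch; the `tokens` list is made up:

```python
import regex

# "a.b" and "c++" contain metacharacters but are matched literally.
tokens = ["a.b", "c++", "first"]
p = regex.compile(r"\L<tokens>", tokens=tokens)

print(p.fullmatch("a.b") is not None)   # True
print(p.fullmatch("axb"))               # None
print(p.findall("first, c++ and a.b"))  # ['first', 'c++', 'a.b']
```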

Set operators

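Version 1 behaviour only

Set operators have been added, and a set can contain nested sets.

The operators, in order of increasing precedence, are:

|| for union ("x||y" means "x or y")
~~ (double tilde) for symmetric difference ("x~~y" means "x or y, but not both")
&& for intersection ("x&&y" means "x and y")
-- (double dash) for difference ("x--y" means "x but not y")

Implicit union, i.e. simple juxtaposition as in [ab], has the highest precedence. Thus [ab&&cd] is the same as [[a||b]&&[c||d]].

Examples: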

[ab] # Set containing ‘a’ and ‘b’
[a-z] # Set containing ‘a’ .. ‘z’
[[a-z]--[qw]] # Set containing ‘a’ .. ‘z’, but not ‘q’ or ‘w’
[a-z--qw] # Same as above
[\p{L}--QW] # Set containing all letters except ‘Q’ and ‘W’
[\p{N}--[0-9]] # Set containing all numbers except ‘0’ .. ‘9’
[\p{ASCII}&&\p{Letter}] # Set containing all characters which are ASCII and letter
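A few of these sets can be tried directly. The sketch below selects version 1 behaviour with the inline (?V1) flag; the sample strings are illustrative:

```python
import regex

# Difference: lowercase letters except 'q' and 'w'.
no_qw = regex.findall(r"(?V1)[[a-z]--[qw]]+", "qwerty")

# Intersection: characters that are both ASCII and letters.
ascii_letters = regex.findall(r"(?V1)[\p{ASCII}&&\p{Letter}]+", "abc123déf")

# Symmetric difference: in one set or the other, but not both.
either_not_both = regex.findall(r"(?V1)[[abc]~~[bcd]]", "abcd")

print(no_qw)            # ['erty']
print(ascii_letters)    # ['abc', 'd', 'f']
print(either_not_both)  # ['a', 'd']
```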

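Match objects have additional methods that return information about all the successful matches of a repeated capture group. These methods are: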

matchobject.captures([group1, ...])
matchobject.starts([group])
matchobject.ends([group])
matchobject.spans([group])

>>> m = regex.search(r"(\w{3})+", "123456789")
>>> m.group(1)
'789'
>>> m.captures(1)
['123', '456', '789']
>>> m.start(1)
6
>>> m.starts(1)
[0, 3, 6]
>>> m.end(1)
9
>>> m.ends(1)
[3, 6, 9]
>>> m.span(1)
(6, 9)
>>> m.spans(1)
[(0, 3), (3, 6), (6, 9)]

Ways of accessing groups

(1) By subscript or slice:
>>> m = regex.search(r"(?P<before>.*?)(?P<num>\d+)(?P<after>.*)", "pqr123stu")
>>> print(m["before"])
pqr
>>> print(len(m))
4
>>> print(m[:])
('pqr123stu', 'pqr', '123', 'stu')

(2) By name, with group("name"):
>>> m.group('num')
'123'

(3) By group number:
>>> m.group(0)
'pqr123stu'
>>> m.group(1)
'pqr'
