Python Regular Expressions Review 3

Original

2018-08-29 19:19

Positive lookahead

import re

# (?=pattern): positive lookahead
# The pattern below rejects input where one of the angle brackets is missing
address = re.compile(
    r'''
    ((?P<name>
    ([\w.,]+\s+)*[\w.,]+
    )
    \s+
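    ) # the name is required
    # Lookahead: the angle brackets must be paired or absent, never a single one
    (?= (<.*>$) # paired angle brackets
    |
    ([^<].*[^>]$) # no angle brackets
    )
    <? # optional opening angle bracket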
    (?P<email>
    [\w\d.+-]+ 
    @
    ([\w\d.]+\.)+ # domain labels, each followed by a dot
    (com|org|edu) # allowed top-level domains
    )
    >? # optional closing angle bracket
    ''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'First Last <[email protected]>',
    u'No Brackets [email protected]',
    u'Open Bracket <[email protected]',
    u'Close Bracket [email protected]>',
    ]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print(' Name :', match.groupdict()['name'])
        print(' Email:', match.groupdict()['email'])
    else:
        print(' No match')

Output
Candidate: First Last <[email protected]>
 Name : First Last
 Email: [email protected]
Candidate: No Brackets [email protected]
 Name : No Brackets
 Email: [email protected]
Candidate: Open Bracket <[email protected]
 No match
Candidate: Close Bracket [email protected]>
 No match
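
To see what the lookahead alone contributes, here is a minimal sketch that isolates the bracket check from the rest of the pattern. The name BRACKETS_OK, the helper brackets_balanced, and the example.com addresses are hypothetical and were chosen only for illustration.

```python
import re

# Hypothetical pattern: only the bracket check from the lookahead above.
# Either the text is wrapped in <...>, or it neither starts with "<"
# nor ends with ">".
BRACKETS_OK = re.compile(r'(?=<.*>$|[^<].*[^>]$)')


def brackets_balanced(text):
    """Return True if the lookahead succeeds at the start of text."""
    return BRACKETS_OK.match(text) is not None


# Hypothetical sample addresses
samples = [
    '<user@example.com>',
    'user@example.com',
    '<user@example.com',
    'user@example.com>',
]

for sample in samples:
    print(sample, brackets_balanced(sample))

# A lookahead is zero-width: a successful match consumes no characters.
zero_width_end = BRACKETS_OK.match('user@example.com').end()
print('end offset of the lookahead match:', zero_width_end)
```

Because the lookahead consumes nothing, the full pattern can go on to match the optional brackets and the address starting from the same position.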

About lookahead and lookbehind

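Given the string: foobarbarfoo
bar(?=bar) finds the first bar (the matched bar is followed by another bar).
bar(?!bar) finds the second bar (the matched bar is not followed by another bar).
(?<=foo)bar finds the first bar (the matched bar is preceded by foo).
(?<!foo)bar finds the second bar (the matched bar is not preceded by foo).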
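
The four cases can be checked directly. This is a small sketch using re.search and the match span; the variable names are only illustrative.

```python
import re

text = 'foobarbarfoo'

patterns = [
    r'bar(?=bar)',   # positive lookahead
    r'bar(?!bar)',   # negative lookahead
    r'(?<=foo)bar',  # positive lookbehind
    r'(?<!foo)bar',  # negative lookbehind
]

# Map each pattern to the (start, end) span of its first match.
spans = {pattern: re.search(pattern, text).span() for pattern in patterns}

for pattern, span in spans.items():
    print('%-12s matches at %s' % (pattern, span))
```

The first bar occupies offsets 3 to 6 and the second occupies 6 to 9, so the spans show which bar each pattern picked.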

Below is an explanation from Stack Overflow

Look ahead Positive(?=)
Find expression A where expression B follows
A(?=B)

Look ahead Negative(?!)
Find expression A where expression B does not follow
A(?!B)

Look behind Positive(?<=)
Find expression A where expression B precedes
(?<=B)A

Look behind Negative(?<!)
Find expression A where expression B does not precede it
(?<!B)A
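
In each of these forms only A ends up in the match; B is checked but never included. The sketch below illustrates this with re.findall on a made-up string, where A is a run of digits and B is a currency or percent sign. The sample text and variable names are assumptions for illustration only.

```python
import re

# Hypothetical sample text
text = 'price: $42, discount: 15%, count: 7'

# A(?=B): digits followed by a percent sign
followed_by_percent = re.findall(r'\d+(?=%)', text)

# A(?!B): digits not followed by a percent sign (or by another digit,
# which stops the engine from matching just the "1" of "15")
not_followed_by_percent = re.findall(r'\d+(?![\d%])', text)

# (?<=B)A: digits preceded by a dollar sign
preceded_by_dollar = re.findall(r'(?<=\$)\d+', text)

# (?<!B)A: digits not preceded by a dollar sign (or by another digit)
not_preceded_by_dollar = re.findall(r'(?<![\d$])\d+', text)

print(followed_by_percent)
print(not_followed_by_percent)
print(preceded_by_dollar)
print(not_preceded_by_dollar)
```

None of the results contain "$" or "%", which shows that the lookaround part only constrains the position and is not part of the matched text.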

Atomic groups

Note: an atomic group is a special non-capturing group that can be used to improve regular expression performance

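Ordinary group: /\b(engineer|engrave|end)\b/
If we match this against "engineering", the regex engine first matches "engineer", but the word boundary \b that follows fails because "ing" comes next. The engine then backtracks into the group and tries the next alternative, engrave: it matches "eng", the rest does not fit, and this attempt fails too. Finally it tries "end", which also fails. Look closely and you will see that once "engineer" has matched and the word boundary check has failed, "engrave" and "end" can no longer succeed.
The text at this position is already known to begin with "engineer", so neither of the other two alternatives can match there, and the engine should not waste effort trying them.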

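Atomic group: /\b(?>engineer|engrave|end)\b/
The group is tried only once: when the \b after engineer fails, the engine does not backtrack into the group, and the match at this position fails immediately.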
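
A rough sketch of the difference in Python follows. Note that Python's re module accepts the (?>...) syntax only from version 3.11 onward, so the atomic patterns are guarded by a version check. The second pair of patterns, with the alternatives eng|engineer, is a hypothetical example added here because it makes the missing backtracking visible in the result rather than only in the amount of work done.

```python
import re
import sys

# Ordinary group: after "engineer" matches and \b fails, the engine
# still backtracks into the group and tries "engrave" and "end".
plain = re.compile(r'\b(engineer|engrave|end)\b')
plain_result = plain.search('engineering')

# Hypothetical alternation: backtracking lets the ordinary group
# fall back from "eng" to "engineer", so the whole word matches.
plain_order = re.compile(r'\b(eng|engineer)\b')
plain_order_result = plain_order.search('engineer')

# Atomic groups need Python 3.11 or newer.
ATOMIC_SUPPORTED = sys.version_info >= (3, 11)
atomic_result = None
atomic_order_result = None
if ATOMIC_SUPPORTED:
    # Same outcome as the ordinary group, but without retrying
    # the remaining alternatives.
    atomic = re.compile(r'\b(?>engineer|engrave|end)\b')
    atomic_result = atomic.search('engineering')

    # The atomic group commits to "eng"; when \b fails it cannot
    # go back and try "engineer", so there is no match at all.
    atomic_order = re.compile(r'\b(?>eng|engineer)\b')
    atomic_order_result = atomic_order.search('engineer')

print('plain, engineering :', plain_result)
print('plain, eng|engineer:', plain_order_result.group())
if ATOMIC_SUPPORTED:
    print('atomic, engineering :', atomic_result)
    print('atomic, eng|engineer:', atomic_order_result)
else:
    print('atomic groups are not available before Python 3.11')
```

The second pair also shows why the order of alternatives matters inside an atomic group: a shorter alternative listed first can prevent a longer one from ever being tried.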

Practice code

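# (?:2|3) is a non-capturing group, so the digit is part of the match
look_ahead = re.compile(r'python(?:2|3)')
# (?=2) requires a following "2" without including it in the match
look_ahead_pattern = re.compile(r'python(?=2)')
# (?!2) requires that no "2" follows
look_ahead_not_pattern = re.compile(r'python(?!2)')
text = 'pythonic python2 python3'

def print_info(re_obj, text=text):
    # Print every match together with its start and end offsets
    for match in re_obj.finditer(text):
        print(match.group(), end=' ')
        print('start is %d, end is %d' % (match.start(), match.end()))
    print()

print_info(look_ahead)
print_info(look_ahead_pattern)
print_info(look_ahead_not_pattern)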

Output
python2 start is 9, end is 16
python3 start is 17, end is 24

python start is 9, end is 15

python start is 0, end is 6
python start is 17, end is 23
