Python Regular Expressions Series (3): Built-in Regex Attributes

Original

2018-09-01 18:26

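This article summarizes how to use some of the built-in attributes of Python regular expressions. The examples use Python 2 syntax (print statements).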

1. Compilation flags: flags

First, look at the signature of the re.findall function:

import re 
print('【Output】')
print help(re.findall)

【Output】
Help on function findall in module re:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

None

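As shown, the last parameter of re.findall is flags, and its default value is 0. This flags value is the compilation flag, i.e. a built-in attribute of the regex; different compilation flags make the regex match differently. So which values can flags take? Use help(re) to see which DATA entries the re module defines: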

print help(re)

# 【Output】
'''
...
DATA
    DOTALL = 16
    I = 2
    IGNORECASE = 2
    L = 4
    LOCALE = 4
    M = 8
    MULTILINE = 8
    S = 16
    U = 32
    UNICODE = 32
    VERBOSE = 64
    X = 64
...
'''
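As a quick check of the listing above, here is a minimal sketch that compares the short and long flag names and prints their numeric values. It assumes Python 3 (unlike the Python 2 snippets in this article), and the variable names are only illustrative.

```python
import re

# (short name, long name) pairs taken from the DATA listing above.
pairs = [("I", "IGNORECASE"), ("L", "LOCALE"), ("M", "MULTILINE"),
         ("S", "DOTALL"), ("U", "UNICODE"), ("X", "VERBOSE")]

# Each short name should compare equal to its long name.
same = all(getattr(re, short) == getattr(re, full) for short, full in pairs)
print(same)

# The flags are integer bit masks, so int() exposes the numeric value.
values = {full: int(getattr(re, full)) for _, full in pairs}
print(values)
```

Because each value is a distinct power of two, several flags can share one integer without colliding.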

The sections below try out each of the compilation flags listed above.

2. DOTALL, S

Makes "." match every character, including "\n" (by default "." does not match "\n"). Example:

p = r'me.com'
print '【Output】'
print re.findall(p,'me.com')
print re.findall(p,'me\ncom')
print re.findall(p,'me\ncom',re.DOTALL)
print re.findall(p,'me\ncom',re.S)

【Output】
['me.com']
[]
['me\ncom']
['me\ncom']
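A typical place where this matters is capturing text that spans several lines. The sketch below assumes Python 3, and the sample string is made up for illustration.

```python
import re

text = "<p>first\nsecond</p>"

# Without DOTALL, "." stops at the newline, so nothing is captured.
without_flag = re.findall(r"<p>(.*)</p>", text)

# With DOTALL, "." also consumes "\n".
with_flag = re.findall(r"<p>(.*)</p>", text, re.DOTALL)

print(without_flag)  # []
print(with_flag)     # ['first\nsecond']
```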

3. IGNORECASE, I

Makes matching case-insensitive. Example:

p = r'a'
print '【Output】'
print re.findall(p,'A')
print re.findall(p,'A',re.IGNORECASE)
print re.findall(p,'A',re.I)

【Output】
[]
['A']
['A']
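The flag applies to the whole pattern, character classes included. A small sketch, assuming Python 3 and an invented sample string:

```python
import re

text = "Python, PYTHON and python"

# All three spellings are found with a single lowercase pattern.
matches = re.findall(r"python", text, re.IGNORECASE)
print(matches)  # ['Python', 'PYTHON', 'python']

# Character classes are matched case-insensitively as well.
letters = re.findall(r"[a-c]", "aBc", re.I)
print(letters)  # ['a', 'B', 'c']
```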

4. LOCALE, L

Locale-aware matching: with this flag, the meaning of \w, \W, \b, \B, \s and \S depends on the current locale.
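The article gives no example for this flag, so the following is only a rough sketch. It assumes Python 3, where re.LOCALE is accepted for bytes patterns only, and it sticks to ASCII input so the result does not depend on the machine's locale.

```python
import re

# Assumption: Python 3. re.LOCALE works with a bytes pattern.
words = re.findall(br"\w+", b"abc def", re.LOCALE)
print(words)  # [b'abc', b'def']

# Combining re.LOCALE with a str pattern is rejected in Python 3.
try:
    re.compile(r"\w+", re.LOCALE)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```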

5. MULTILINE, M

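Enables multi-line matching, which affects "^" and "$": they match at the start and end of every line rather than only at the start and end of the whole string. Example: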

s = """
aa bb cc
bb aa
aa ccd
"""
p1 = r'^aa'
p2 = r'cc$'
print '【Output】'
print re.findall(p1,s)
print re.findall(p1,s,re.M)

print re.findall(p2,s)
print re.findall(p2,s,re.M)

【Output】
[]
['aa', 'aa']
[]
['cc']
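A common use is pulling out whole lines that begin with a given prefix. The sketch below assumes Python 3; the log text is invented for illustration.

```python
import re

log = "ERROR disk full\nINFO started\nERROR timeout"

# Without MULTILINE, "^" and "$" anchor to the whole string only,
# so a pattern describing a single line finds nothing here.
whole_string = re.findall(r"^ERROR (.*)$", log)

# With MULTILINE, "^" and "$" anchor to every line.
per_line = re.findall(r"^ERROR (.*)$", log, re.MULTILINE)

print(whole_string)  # []
print(per_line)      # ['disk full', 'timeout']
```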

6. VERBOSE, X

Enables writing a pattern across several lines for clarity; whitespace inside the pattern is ignored. Example:

p = r"""
\d{3,4}
-?
\d{7,8}
"""
tel = '010-12345678'
print '【Output】'
print re.findall(p,tel)
print re.findall(p,tel,re.X)

【Output】
[]
['010-12345678']
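Verbose mode also allows "#" comments inside the pattern, which is where it helps most. A minimal sketch, assuming Python 3; the name phone is illustrative.

```python
import re

phone = re.compile(r"""
    \d{3,4}   # area code: 3 or 4 digits
    -?        # optional hyphen
    \d{7,8}   # local number: 7 or 8 digits
""", re.VERBOSE)

numbers = phone.findall("010-12345678")
print(numbers)  # ['010-12345678']

# A literal space must be escaped, since bare whitespace is ignored.
spaced = re.findall(r"a\ b", "a b", re.X)
print(spaced)  # ['a b']
```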

7. UNICODE, U

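Matches according to Unicode character properties. For example, using '\s' to match the full-width (ideographic) space \u3000, the results without and with this flag compare as follows: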

s = u'\u3000'
p = r'\s'
print '【Output】'
print re.findall(p,s)
print re.findall(p,s,re.U)

【Output】
[]
[u'\u3000']
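The output above is what Python 2 prints. As a point of comparison, the sketch below assumes Python 3, where str patterns already match Unicode by default and re.ASCII restricts \s to ASCII whitespace.

```python
import re

s = "\u3000"  # IDEOGRAPHIC SPACE (full-width space)

# Python 3 str patterns are Unicode-aware without any flag.
default_match = re.findall(r"\s", s)

# re.ASCII limits \s to ASCII whitespace, so nothing matches.
ascii_match = re.findall(r"\s", s, re.ASCII)

print(default_match == ["\u3000"])  # True
print(ascii_match)                  # []
```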

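8. How to use several compilation flags at once?

Sometimes several compilation flags are needed together: for example, ignoring case while also letting "." match the newline "\n". The flags argument accepts flags joined with the bitwise OR operator, such as re.I|re.S. The flags can also be embedded in the pattern itself.

Method: add (?iLmsux) to the pattern, keeping only the letters you need. In Python 2 it may appear anywhere in the pattern; Python 3.11 and later accept it only at the start.

Here i corresponds to re.I, L to re.L, m to re.M, s to re.S, u to re.U, and x to re.X. Example: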

s = 'Abc\ncom'
p = r'abc.com(?is)'  # Note: in Python 2 the inline flags (?is) may appear anywhere in the pattern; here they are placed at the end
print '【Output】'
print re.findall(p,s)

【Output】
['Abc\ncom']
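For comparison, here is a minimal sketch of two ways to apply both flags. It assumes Python 3, so the inline flags sit at the start of the pattern.

```python
import re

s = "Abc\ncom"

# Option 1: join the flags with bitwise OR in the flags argument.
combined = re.findall(r"abc.com", s, re.I | re.S)

# Option 2: inline flags at the very start of the pattern.
inline = re.findall(r"(?is)abc.com", s)

print(combined)  # ['Abc\ncom']
print(inline)    # ['Abc\ncom']
```

The OR form keeps the pattern text clean, while the inline form is handy when only the pattern string can be passed along, for example through a configuration file.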
