Regular Expressions in Python

Original

2018-09-01 17:46

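The regular expression metacharacters are: . ^ $ * + ? { } [ ] \ | ( )
. matches any character except a newline
^ requires the pattern to match at the start of the target string, e.g. ^hell matches hello and hellboy
$ requires the pattern to match at the end of the target string, e.g. ar$ matches car and bar
* the preceding character must occur zero or more times in a row
+ the preceding character must occur one or more times in a row
? the preceding character must occur zero times or once
{n} matches exactly n times, e.g. o{2} matches oo
{n,} matches at least n times, e.g. o{2,} matches oo, ooo, oooo
{n,m} matches at least n and at most m times, e.g. o{2,3} matches oo, ooo
[A-Z] any one uppercase letter from A to Z
[a-z] any one lowercase letter from a to z
[0-9] any one digit from 0 to 9; equivalent to \d for ASCII text
[A-Za-z0-9] any one letter or digit; \w matches these plus the underscore, so for ASCII text \w is [A-Za-z0-9_]
\ escape character, e.g. \[ ==> [ , \\ ==> \
| alternation: if A and B are arbitrary REs, then A|B is a new RE that matches either A or B.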
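
To make the anchors, quantifiers, escaping and alternation above concrete, here is a small sketch. The word lists are made up for illustration, and re.fullmatch is used only so that each quantifier is tested against the whole string.

```python
import re

# Anchors: ^ ties the pattern to the start of the string, $ to the end.
starts_with_hell = [w for w in ["hello", "hellboy", "shell"] if re.search(r"^hell", w)]
ends_with_ar = [w for w in ["car", "bar", "art"] if re.search(r"ar$", w)]

# Quantifiers apply to the preceding item; fullmatch requires the whole string to match.
candidates = ["o", "oo", "ooo", "oooo"]
exactly_two = [s for s in candidates if re.fullmatch(r"o{2}", s)]
two_or_more = [s for s in candidates if re.fullmatch(r"o{2,}", s)]
two_to_three = [s for s in candidates if re.fullmatch(r"o{2,3}", s)]

# Alternation matches either side; \[ matches a literal opening bracket.
alternation = re.findall(r"cat|dog", "cat, dog, bird")
literal_bracket = re.findall(r"\[", "a[0] + b[1]")

print(starts_with_hell, ends_with_ar)
print(exactly_two, two_or_more, two_to_three)
print(alternation, literal_bracket)
```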

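\s matches a single whitespace character, including tab and newline
\S matches any single non-whitespace character
\d matches a digit 0-9
\w matches a letter, digit, or underscore
\W matches any character that \w does not match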
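
The shorthand classes can be checked against a short string. This is only a sketch: the sample string is invented, and it contains only ASCII characters so that the results do not depend on Unicode handling.

```python
import re

# Hypothetical sample: an identifier, a tab, a word, and a newline.
sample = "id_42\tok\n"

digits = re.findall(r"\d", sample)        # single digits
word_runs = re.findall(r"\w+", sample)    # runs of letters, digits, underscores
whitespace = re.findall(r"\s", sample)    # tab and newline count as whitespace
non_word = re.findall(r"\W", sample)      # everything \w rejects

# \w accepts the underscore, while the class [A-Za-z0-9] does not.
underscore_in_w = re.fullmatch(r"\w", "_") is not None
underscore_in_class = re.fullmatch(r"[A-Za-z0-9]", "_") is not None

print(digits, word_runs, whitespace, non_word)
print(underscore_in_w, underscore_in_class)
```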

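Using regular expressions
re.compile(pattern, flags=0)
Compiles a regular expression and returns a pattern object.

>>> prog = re.compile(pattern)
>>> result = prog.match(string)

is equivalent to

>>> result = re.match(pattern, string)

The first form lets the compiled regular expression be reused.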
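
As a rough illustration of that reuse, the sketch below compiles one pattern and applies it to several strings. The date pattern and the input lines are hypothetical, chosen only for the example.

```python
import re

# Hypothetical pattern and inputs, chosen only for illustration.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
lines = ["2018-09-01 released", "no date here", "2018-09-02 patched"]

# The same compiled object is reused for every line.
dated = [line for line in lines if date_pattern.match(line)]
print(dated)
```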

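re.match(pattern, string, flags=0)
If the beginning of the string matches the regular expression, returns the corresponding match object; otherwise returns None.

re.search(pattern, string, flags=0)
Scans the string for a location where the regular expression matches. If one is found, returns the corresponding match object; otherwise returns None.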
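
The difference between the two is easiest to see when the pattern occurs somewhere other than the start of the string. A minimal sketch, with an invented input string:

```python
import re

text = "say hello"

# match only looks at the beginning of the string, so this finds nothing.
at_start = re.match(r"hello", text)

# search scans forward and finds the first occurrence anywhere.
anywhere = re.search(r"hello", text)
span = anywhere.span()

print(at_start, span)
```

Because both functions return None on failure, it is usually safest to test the result before calling methods on it.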

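re.split(pattern, string, maxsplit=0, flags=0)
Splits the string at each match of the regular expression. If the pattern is wrapped in capturing parentheses, the matched separators are also included in the returned list. maxsplit is the maximum number of splits: maxsplit=1 splits once; the default, 0, places no limit.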

>>> p = re.compile(r'\W+')
>>> p2 = re.compile(r'(\W+)')
>>> p.split('This... is a test.')
['This', 'is', 'a', 'test', '']
>>> p2.split('This... is a test.')
['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
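
The maxsplit argument is not exercised by the session above, so here is a small sketch of it using the same sentence.

```python
import re

sentence = "This... is a test."

# Default maxsplit=0: split at every run of non-word characters.
parts_all = re.split(r"\W+", sentence)

# maxsplit=1: split once, and leave the rest of the string untouched.
parts_once = re.split(r"\W+", sentence, maxsplit=1)

print(parts_all)
print(parts_once)
```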

re.findall(pattern, string, flags=0)
Finds all substrings matched by the RE and returns them as a list. If there is no match, returns an empty list.

>>> p = re.compile(r'\d+')
>>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
['12', '11', '10']

re.finditer(pattern, string, flags=0)
Finds all substrings matched by the RE and returns them as an iterator of match objects.

>>> p = re.compile(r'\d+')
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator  
<callable_iterator object at 0x...>
>>> for match in iterator:
...     print(match.span())
...
(0, 2)
(22, 24)
(29, 31)

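re.sub(pattern, repl, string, count=0, flags=0)
Finds all substrings matched by the RE and replaces them with a different string. The optional count argument is the maximum number of replacements; count must be a non-negative integer. The default, 0, replaces every match. If there is no match, the string is returned unchanged.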
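
A short sketch of the three cases just described: replacing every match, limiting the number of replacements with count, and a string with no match. The input text and the replacement string "N" are made up for illustration.

```python
import re

text = "12 drummers, 11 pipers, 10 lords"

# count defaults to 0, which replaces every match.
all_replaced = re.sub(r"\d+", "N", text)

# count=1 stops after the first replacement.
first_only = re.sub(r"\d+", "N", text, count=1)

# With no match, the original string comes back unchanged.
unchanged = re.sub(r"\d+", "N", "no digits here")

print(all_replaced)
print(first_only)
print(unchanged)
```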

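group([group1, …])
A method of the match object that returns one or more subgroups of the match.

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0)       # the entire match
'Isaac Newton'
>>> m.group(1)       # the first subgroup
'Isaac'
>>> m.group(2)       # the second subgroup
'Newton'
>>> m.group(1, 2)    # a tuple of several subgroups
('Isaac', 'Newton')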
('Isaac', 'Newton')

If a group was named with the (?P<name>…) syntax, the corresponding groupN argument may also be the name string. For example:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group('first_name')
'Malcolm'
>>> m.group('last_name')
'Reynolds'
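
Named groups keep their positional numbers, so either form of lookup works. The sketch below reuses the pattern above; note that groupdict() is not covered in this post and is shown only as a related convenience on match objects.

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")

# A named group is still numbered, so both lookups return the same substring.
by_name = m.group("first_name")
by_index = m.group(1)

# groupdict() (not discussed above) maps each group name to its substring.
as_dict = m.groupdict()

print(by_name, by_index, as_dict)
```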

groups(default=None)
Returns a tuple containing all the subgroups of the match.

>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()
('24', '1632')

What default does:

>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups()      # the second group did not participate, so it defaults to None
('24', None)
>>> m.groups('0')   # now the default is '0'
('24', '0')
