Using Regular Expressions in Python

1. A Brief Introduction to Regular Expressions

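Regular expressions are a powerful tool for processing strings. They have their own syntax and a separate matching engine. They may be slower than the built-in str methods, but they are far more capable.
Across the programming languages that provide regular expressions, the core syntax is largely the same; the main difference is how much of that syntax each implementation supports.
Regular expressions are not part of the Python language itself; Python supports them through the re module.
Greedy and non-greedy quantifiers: in Python, quantifiers are greedy by default (a few languages may default to non-greedy) and always try to match as many characters as possible; non-greedy quantifiers do the opposite and match as few as possible. For example, searching "abbbc" with the regular expression "ab*" finds "abbb", whereas the non-greedy "ab*?" finds "a".
Non-greedy forms of the quantifiers: *?, +?, ??, {m,n}?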

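import re
# Non-greedy {2,4}? stops at the minimum of two repetitions
result0 = re.match('(ab){2,4}?','ababab')
# Greedy {2,4} takes as many repetitions as the string allows
result1 = re.match('(ab){2,4}','ababab')

print(result0.group())  # abab
print(result1.group())  # ababab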
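
The "ab*" versus "ab*?" case described above can be checked the same way. This is a minimal sketch; the variable names are only illustrative.

```python
import re

text = "abbbc"

# Greedy: b* consumes every 'b' it can reach
greedy = re.search(r"ab*", text)
# Non-greedy: b*? consumes as few 'b' characters as possible (here, none)
lazy = re.search(r"ab*?", text)

print(greedy.group())  # abbb
print(lazy.group())    # a
```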

[Figure: metacharacters and syntax supported by Python regular expressions]

2. The re Module

2.1 re.compile(strPattern[, flag]):

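This function is the factory for the Pattern class: it compiles a regular expression given as a string into a Pattern object. The second argument, flag, sets the matching mode; several flags can be combined with the bitwise OR operator '|', for example re.I | re.M.
You can also specify the mode inside the regex string itself: re.compile('pattern', re.I | re.M) is equivalent to re.compile('(?im)pattern').
Available flags: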

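re.I(re.IGNORECASE): ignore case (the full name is given in parentheses, same below)
re.M(re.MULTILINE): multiline mode; changes the behavior of '^' and '$' so that they also match at the start and end of each line
re.S(re.DOTALL): dot-matches-all mode; changes the behavior of '.' so that it also matches a newline
re.L(re.LOCALE): makes the predefined character classes \w \W \b \B \s \S depend on the current locale (in Python 3 this flag is only allowed with bytes patterns)
re.U(re.UNICODE): makes the predefined character classes \w \W \b \B \s \S \d \D depend on the character properties defined by Unicode (in Python 3 this is already the default for str patterns)
re.X(re.VERBOSE): verbose mode. In this mode the regular expression can span several lines, whitespace is ignored, and comments can be added. The following two regular expressions are equivalent: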

a = re.compile(r'''\d+  # the integer part
                   \.   # the decimal point
                   \d*  # some fractional digits''', re.X)
is equivalent to:
a = re.compile(r'\d+\.\d*')
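
To make the flag descriptions concrete, here is a small sketch that compares flags passed as an argument with the inline (?im) form and shows the effect of re.S on '.'. The sample text is made up for illustration.

```python
import re

text = "first line\nSECOND line"

# Flags passed as an argument, combined with |
by_argument = re.compile(r"^second", re.I | re.M)
# The same flags written inline at the start of the pattern
inline = re.compile(r"(?im)^second")

print(by_argument.search(text).group())  # SECOND
print(inline.search(text).group())       # SECOND

# Without re.S, '.' stops at the newline; with re.S it matches across it
without_dotall = re.match(r".+", text).group()
with_dotall = re.match(r".+", text, re.S).group()
print(repr(without_dotall))  # 'first line'
print(repr(with_dotall))     # 'first line\nSECOND line'
```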

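The re module provides many module-level functions that perform regular expression operations. Each of them can be replaced by the corresponding method of a Pattern instance. Their only advantage is saving one line of re.compile() code, and the price is that the compiled Pattern object cannot be reused. The sections below show this difference.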
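
As a rough illustration of that trade-off, the sketch below runs the same match once through the module-level function and once through a compiled Pattern. The input list and the names ids_a, ids_b and id_pattern are hypothetical.

```python
import re

lines = ["id=42", "name=bob", "id=7"]

# Module-level function: the pattern string is passed on every call
ids_a = []
for s in lines:
    m = re.match(r"id=(\d+)", s)
    if m:
        ids_a.append(m.group(1))

# Compiled Pattern: built once, then the same object is reused
id_pattern = re.compile(r"id=(\d+)")
ids_b = [m.group(1) for m in map(id_pattern.match, lines) if m]

print(ids_a)  # ['42', '7']
print(ids_b)  # ['42', '7']
```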

2.2 match(string[, pos[, endpos]]) | re.match(pattern, string[, flags]):

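This method tries to match pattern starting at index pos of string. If the whole pattern matches, it returns a Match object; if the pattern fails to match at some point, or endpos is reached before the pattern has been fully matched, it returns None.
pos and endpos default to 0 and len(string) respectively.
re.match() cannot take these two arguments; its flags argument sets the matching mode used when compiling pattern.
Note: this method does not require a full match. If string still has characters left when pattern ends, the match still counts as successful. To require a full match, append the boundary anchor '$' to the expression, or use fullmatch().
It can be used to check whether user names, passwords and similar input follow the required format.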

import re
# re.match matches the string starting from its first character
line = '8734521----90134zhudanian'
result = re.match(r'(.*)----(.*)',line)
print(f'result:{result}')
print(f'type of result:{type(result)}')
print(f'result.string:{result.string}')
print(f'result.group(0):{result.group(0)}')
print(f'result.group(1):{result.group(1)}')
print(f'result.group(2):{result.group(2)}')
print(f'result.group(1,2):{result.group(1,2)}')
print(f'result.groups():{result.groups()}')

'''
# re.compile precompiles the regular expression
regex = r'(.*)----(.*)'
pattern = re.compile(regex)
line = '8734521----90134zhudanian'
result = pattern.match(line)
print(f'result:{result}')
print(f'result.group():{result.group():}')
'''

-------------------------------------------------
Output:
result:<re.Match object; span=(0, 25), match='8734521----90134zhudanian'>
type of result:<class 're.Match'>
result.string:8734521----90134zhudanian
result.group(0):8734521----90134zhudanian
result.group(1):8734521
result.group(2):90134zhudanian
result.group(1,2):('8734521', '90134zhudanian')
result.groups():('8734521', '90134zhudanian')
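
The points about partial matches and about pos/endpos can be sketched as follows. The user-name rule (4 to 12 word characters) is a made-up example, not a rule taken from this article.

```python
import re

# Hypothetical rule: a user name is 4 to 12 letters, digits or underscores
username = re.compile(r"\w{4,12}")

prefix_ok = username.match("alice_01!!")            # matches the prefix 'alice_01'
anchored = re.match(r"\w{4,12}$", "alice_01!!")     # None: '$' rejects the trailing '!!'
whole = username.fullmatch("alice_01!!")            # None: the entire string must match

print(prefix_ok.group())  # alice_01
print(anchored, whole)    # None None

# pos and endpos are only available on a compiled Pattern
digits = re.compile(r"\d+")
print(digits.match("ab123cd"))              # None: index 0 is not a digit
print(digits.match("ab123cd", 2).group())   # 123
print(digits.match("ab123cd", 2, 4).group())  # 12
```

For validation, fullmatch() or a trailing '$' is usually what you want, since a plain match() accepts any string that merely starts with valid characters.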

2.3 search(string[, pos[, endpos]]) | re.search(pattern, string[, flags]):

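This method looks for a substring of string that matches. It tries to match pattern starting at index pos of string; if the whole pattern matches, it returns a Match object; otherwise it advances pos by 1 and tries again; if there is still no match when pos reaches endpos, it returns None.
pos and endpos default to 0 and len(string) respectively; re.search() cannot take these two arguments, and its flags argument sets the matching mode used when compiling pattern.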

import re
# re.search succeeds as soon as some substring of the string matches the pattern;
# it is a containment test, and it stops at the first match
text = 'a231456  7890653'
searchobj = re.search(r"[1-9]\d{4,10}", text)
print(searchobj)
print(searchobj.group())
---------------------------------------
Output:
<re.Match object; span=(1, 7), match='231456'>
231456
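
A short sketch of how search differs from match, and of pos/endpos on a compiled Pattern. The sample text is invented for illustration.

```python
import re

text = "order 231456 shipped"
pattern = re.compile(r"[1-9]\d{4,10}")

# match only tries at the start of the string, so it fails here
print(pattern.match(text))  # None

# search scans forward until some position matches
found = pattern.search(text)
print(found.group(), found.span())  # 231456 (6, 12)

# Starting the scan at index 7 skips the leading '2'
print(pattern.search(text, 7).group())  # 31456

# With endpos=9 only 'order 231' is visible, which has too few digits
print(pattern.search(text, 0, 9))  # None
```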

2.4 findall(string[, pos[, endpos]]) | re.findall(pattern, string[, flags]):

Searches string and returns all matching substrings as a list.

import re
text = 'a231456  7890653'
searchobj = re.findall(r"[1-9]\d{4,10}", text)
print(searchobj)
---------------------------------------
Output:
['231456', '7890653']

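Regular expression for a mobile phone number: r"^1[34578]\d{9}$"
Regular expression for a QQ number: r"^[1-9]\d{4,10}$"
Regular expression for an email address: r"([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})" (note: this pattern lists only uppercase letters, so compile it with re.I, otherwise lowercase addresses give no result)
Regular expression for an IP address: ^((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})(\.((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})){3}$ (2(5[0-5]|[0-4]\d) matches 200 ~ 255; [0-1]?\d{1,2} matches 0 ~ 199)
Regular expression for a date of birth: ^((18)|(19)|(20))\d{2}-((0[1-9])|(1[0-2]))-(0[1-9]|[1-2][0-9]|3[0-1])$
The patterns anchored with ^ and $ validate an entire string; drop the anchors to find such values inside longer text with findall.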
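
The patterns listed above could be wrapped in a small validation helper along these lines. This is a sketch: the function name is_valid, the dictionary layout and the sample values are assumptions made for illustration, and the email pattern is compiled with re.I so that lowercase addresses match.

```python
import re

# Patterns taken from the list above; the keys are illustrative names
PATTERNS = {
    "phone": re.compile(r"^1[34578]\d{9}$"),
    "qq": re.compile(r"^[1-9]\d{4,10}$"),
    "email": re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.I),
    "ip": re.compile(
        r"^((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})"
        r"(\.((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})){3}$"
    ),
}

def is_valid(kind, value):
    """Return True if value matches the pattern registered under kind."""
    return PATTERNS[kind].match(value) is not None

print(is_valid("phone", "13812345678"))       # True
print(is_valid("phone", "12812345678"))       # False: second digit not allowed
print(is_valid("qq", "10001"))                # True
print(is_valid("email", "user@example.com"))  # True
print(is_valid("ip", "192.168.1.1"))          # True
print(is_valid("ip", "256.1.1.1"))            # False: 256 is out of range
```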

2.5 finditer(string[, pos[, endpos]]) | re.finditer(pattern, string[, flags]):

Searches string and returns an iterator that yields each match result (a Match object) in order.

import re

get_number = re.finditer(r'\d+','aa12BB34cc56')
get_letter = re.finditer(r'[a-zA-Z]+','aa12BB34cc56')
get_str = re.finditer(r'[^你好壞呀]','你好壞呀，我好喜歡')  # [^...] negates the character class
for data in get_number:
    print(data.group())

print('----------------------------------')

for data in get_letter:
    print(data.group())

print('----------------------------------')

for data in get_str:
    print(data.group(),end='')
------------------------------------
Output:
12
34
56
----------------------------------
aa
BB
cc
----------------------------------
，我喜歡

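Notes on using findall and finditer:

1. The return types differ: findall returns a list that print can display directly, while finditer returns an iterator of Match objects whose content is read with the group method.
2. When the regular expression contains () groups, findall returns only the content matched by the groups, whereas finditer still yields a Match object for each whole match.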

import re

# Sample addresses, chosen to be consistent with the results shown below
content = '''email:12345678@163.com
email:2345678@163.com
email:345678@163.com
'''
result_finditer = re.finditer(r"(\d+)@(\w+)\.com", content)
# The pattern has two groups. Groups are numbered from 1; group() without an index defaults to 0, which is the match of the whole pattern
for i in result_finditer:
    phone_no = i.group(1)
    email_type = i.group(2)
    print(i.group())  # the whole match

result_findall = re.findall(r"(\d+)@(\w+)\.com", content)
# The result is not a plain list of strings but a list of tuples,
# e.g. [('12345678', '163'), ('2345678', '163'), ('345678', '163')]
for i in result_findall:
    phone_no = i[0]
    email_type = i[1]
print(result_findall)    # Output: [('12345678', '163'), ('2345678', '163'), ('345678', '163')]

So if we need both the whole match and the match of each group, with findall we have to wrap the whole pattern in a group:

re.findall(r"((\d+)@(\w+)\.com)", content)
[('12345678@163.com', '12345678', '163'), ('2345678@163.com', '2345678', '163'), ('345678@163.com', '345678', '163')]
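
An alternative to wrapping the whole pattern is to control what findall returns through the kind of group used. The sketch below assumes made-up sample addresses; it shows one capturing group, a non-capturing (?:...) group, and finditer giving access to both the whole match and the groups.

```python
import re

# Made-up sample data for illustration
content = "email:12345678@163.com\nemail:2345678@163.com\n"

# One capturing group: findall returns only that group's text
only_numbers = re.findall(r"(\d+)@\w+\.com", content)
print(only_numbers)  # ['12345678', '2345678']

# Non-capturing group (?:...): findall returns the whole matches
whole_matches = re.findall(r"(?:\d+)@\w+\.com", content)
print(whole_matches)  # ['12345678@163.com', '2345678@163.com']

# finditer exposes the whole match and every group at once
pairs = [(m.group(0), m.groups())
         for m in re.finditer(r"(\d+)@(\w+)\.com", content)]
print(pairs[0])  # ('12345678@163.com', ('12345678', '163'))
```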

2.6 split(string[, maxsplit]) | re.split(pattern, string[, maxsplit]):

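Splits string at every substring that matches and returns the pieces as a list. maxsplit sets the maximum number of splits; if it is omitted, the string is split at every match.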

'''
Plain string splitting --- only handles very simple cases
line="363316626----3633166268190xl0"
linelist=line.split("----")
print(linelist)
'''
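import re
# The gaps between fields vary in width, so splitting on a fixed separator such as line.split(" ") does not work; a regular expression handles it
line="127740 1小姐    22   166 本科  未婚   合肥 山羊座  編輯 普通話"
mylist = re.split(r"\s+",line)
print(mylist)

line1="a,b c;d"
mylist=re.split(r"[\s\,\;]",line1)  # split on any one of: whitespace, comma, semicolon
print(mylist)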
-----------------------------------------------
Output:
['127740', '1小姐', '22', '166', '本科', '未婚', '合肥', '山羊座', '編輯', '普通話']
['a', 'b', 'c', 'd']
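
Two details of re.split that the example above does not exercise are maxsplit and capturing groups in the pattern. A minimal sketch, reusing the same "a,b c;d" input:

```python
import re

line = "a,b c;d"

# maxsplit=1: split only at the first separator
first_only = re.split(r"[\s,;]", line, maxsplit=1)
print(first_only)  # ['a', 'b c;d']

# A capturing group in the pattern keeps the separators in the result
with_separators = re.split(r"([\s,;])", line)
print(with_separators)  # ['a', ',', 'b', ' ', 'c', ';', 'd']

# A separator at the very start or end produces empty strings
padded = re.split(r"\s+", "  x  y ")
print(padded)  # ['', 'x', 'y', '']
```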

2.7 sub(repl, string[, count]) | re.sub(pattern, repl, string[, count]):

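Replaces every matching substring in string with repl and returns the resulting string.
When repl is a string, it can refer to groups with \id, \g<id> or \g<name>; \0 is not allowed, but \g<0> stands for the whole match.
When repl is a function, it should accept exactly one argument (a Match object) and return the replacement string (group references inside the returned string are not expanded).
count sets the maximum number of replacements; if it is omitted, every match is replaced.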

import re

p = re.compile(r'(\w+) (\w+)')
s0 ='i say, hello world!'
s1 ='screw you,fuck away'
print(p.sub(r'\2 \1', s0))  # reference groups by number
print(re.sub('screw','sc**',s1))  # mask a sensitive word; '*' needs no escaping in repl

def func(m):
    return m.group(1).title() + ' ' + m.group(2).title()

print(p.sub(func, s0))
---------------------------------
Output:
say i, world hello!
sc** you,fuck away
I Say, Hello World!
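
The named-group form \g<name>, the count argument and \g<0> can be sketched as follows. The date pattern and the sample text are assumptions made for illustration.

```python
import re

# Hypothetical pattern with named groups for an ISO-style date
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
text = "2020-01-02 and 2021-03-04"

# Reference groups by name
by_name = date.sub(r"\g<day>/\g<month>/\g<year>", text)
print(by_name)  # 02/01/2020 and 04/03/2021

# Reference groups by number, replacing only the first match
first_only = date.sub(r"\3/\2/\1", text, count=1)
print(first_only)  # 02/01/2020 and 2021-03-04

# \g<0> stands for the whole match
bracketed = date.sub(r"[\g<0>]", text)
print(bracketed)  # [2020-01-02] and [2021-03-04]
```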

subn(repl, string[, count]) | re.subn(pattern, repl, string[, count]):
Returns the tuple (sub(repl, string[, count]), number of replacements).
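
A minimal sketch of subn, using an invented input string, shows the returned tuple:

```python
import re

text = "one  two   three"

# Collapse each run of whitespace into a single space and count the replacements
new_text, n = re.subn(r"\s+", " ", text)
print(new_text, n)  # one two three 2

# When nothing matches, the string is returned unchanged with a count of 0
unchanged, zero = re.subn(r"\d+", "#", text)
print(unchanged == text, zero)  # True 0
```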
