re
python中re模塊提供了正則表達式相關操作
字符:
. 匹配除換行符以外的任意字符
\w 匹配字母或數字或下劃線或漢字
\s 匹配任意的空白符
\d 匹配數字
\b 匹配單詞的開始或結束
^ 匹配字符串的開始
$ 匹配字符串的結束
次數:
* 重複零次或更多次
+ 重複一次或更多次
? 重複零次或一次
{n} 重複n次
{n,} 重複n次或更多次
{n,m} 重複n到m次
match
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | # match,從起始位置開始匹配,匹配成功返回一個對象,未匹配成功返回None match(pattern, string, flags = 0 ) # pattern: 正則模型 # string : 要匹配的字符串 # falgs : 匹配模式 X VERBOSE Ignore whitespace and comments for nicer looking RE's. I IGNORECASE Perform case - insensitive matching. M MULTILINE "^" matches the beginning of lines (after a newline) as well as the string. "$" matches the end of lines (before a newline) as well as the end of the string. S DOTALL "." matches any character at all , including the newline. A ASCII For string patterns, make \w, \W, \b, \B, \d, \D match the corresponding ASCII character categories (rather than the whole Unicode categories, which is the default). For bytes patterns, this flag is the only available behaviour and needn't be specified. L LOCALE Make \w, \W, \b, \B, dependent on the current locale. U UNICODE For compatibility only. Ignored for string patterns (it is the default), and forbidden for bytes patterns. |
# 無分組 r = re.match("h\w+", origin) print(r.group()) # 獲取匹配到的所有結果 print(r.groups()) # 獲取模型中匹配到的分組結果 print(r.groupdict()) # 獲取模型中匹配到的分組結果 # 有分組 # 爲何要有分組?提取匹配成功的指定內容(先匹配成功全部正則,再匹配成功的局部內容提取出來) r = re.match("h(\w+).*(?P<name>\d)$", origin) print(r.group()) # 獲取匹配到的所有結果 print(r.groups()) # 獲取模型中匹配到的分組結果 print(r.groupdict()) # 獲取模型中匹配到的分組中所有執行了key的組
search
1 2 | # search,瀏覽整個字符串去匹配第一個,未匹配成功返回None # search(pattern, string, flags=0) |
# 無分組 r = re.search("a\w+", origin) print(r.group()) # 獲取匹配到的所有結果 print(r.groups()) # 獲取模型中匹配到的分組結果 print(r.groupdict()) # 獲取模型中匹配到的分組結果 # 有分組 r = re.search("a(\w+).*(?P<name>\d)$", origin) print(r.group()) # 獲取匹配到的所有結果 print(r.groups()) # 獲取模型中匹配到的分組結果 print(r.groupdict()) # 獲取模型中匹配到的分組中所有執行了key的組
findall
1 2 3 | # findall,獲取非重複的匹配列表;如果有一個組則以列表形式返回,且每一個匹配均是字符串;如果模型中有多個組,則以列表形式返回,且每一個匹配均是元祖; # 空的匹配也會包含在結果中 #findall(pattern, string, flags=0) |
# 無分組 r = re.findall("a\w+",origin) print(r) # 有分組 origin = "hello alex bcd abcd lge acd 19" r = re.findall("a((\w*)c)(d)", origin) print(r)
sub
1 2 3 4 5 6 7 8 | # sub,替換匹配成功的指定位置字符串 sub(pattern, repl, string, count = 0 , flags = 0 ) # pattern: 正則模型 # repl : 要替換的字符串或可執行對象 # string : 要匹配的字符串 # count : 指定匹配個數 # flags : 匹配模式 |
# 與分組無關 origin = "hello alex bcd alex lge alex acd 19" r = re.sub("a\w+", "999", origin, 2) print(r)
split
1 2 3 4 5 6 7 | # split,根據正則匹配分割字符串 split(pattern, string, maxsplit = 0 , flags = 0 ) # pattern: 正則模型 # string : 要匹配的字符串 # maxsplit:指定分割個數 # flags : 匹配模式 |
# 無分組 origin = "hello alex bcd alex lge alex acd 19" r = re.split("alex", origin, 1) print(r) # 有分組 origin = "hello alex bcd alex lge alex acd 19" r1 = re.split("(alex)", origin, 1) print(r1) r2 = re.split("(al(ex))", origin, 1) print(r2)
IP:^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$ 手機號:^1[3|4|5|8][0-9]\d{8}$ 郵箱: [a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+