1. Regular expression syntax
2. Common regular expression functions
re.match tries to match a pattern at the beginning of the string.
import re
text = "Elaine is a beautiful girl, she is cool,clever, and so on..."
# match only tries the pattern at the start of the string
m = re.match(r"(\w+)\s", text)
if m:
    print(m.group())
else:
    print('no match')
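A minimal sketch of what the match object holds for this example: group() (the same as group(0)) is the whole match, including the trailing space consumed by \s, while group(1) is only the captured word. The second call, looking for "beautiful", is an extra case added for illustration; it shows that re.match returns None when the pattern does not match at position 0.

```python
import re

text = "Elaine is a beautiful girl, she is cool,clever, and so on..."

m = re.match(r"(\w+)\s", text)
whole = m.group()      # entire match: the word plus the trailing space
word = m.group(1)      # only the first capturing group
start, end = m.span()  # character offsets of the whole match

# re.match only tries position 0, so "beautiful" is not found here
no_match = re.match(r"beautiful", text)

print(repr(whole), repr(word), (start, end), no_match)
```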
re.search scans the string and matches the pattern at any position.
re.match("c","abcdef") # No Match
re.search("c", "abcdef") # Match
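The two calls above can be checked directly. In this small sketch, span() on the search result shows where "c" was found, and the last call (added for illustration) shows that a leading ^ makes re.search behave like re.match when re.MULTILINE is not set.

```python
import re

# match is anchored at position 0; "abcdef" starts with "a", so this fails
match_result = re.match("c", "abcdef")

# search scans left to right and stops at the first "c" (index 2)
search_result = re.search("c", "abcdef")

# a leading ^ anchors search at the start of the string as well
anchored = re.search("^c", "abcdef")

print(match_result, search_result.span(), anchored)
```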
re.sub replaces the matches in a string.
re.sub(r'\s+','-',string)
# Replace each run of whitespace in string with "-"
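Because the pattern is \s+, a run of several whitespace characters (spaces, tabs, newlines) collapses into a single "-". A runnable sketch applying it to the sample sentence used elsewhere in this section, plus the optional count argument, which limits how many replacements are made:

```python
import re

text = "Elaine is a beautiful girl, she is cool,clever, and so on..."

# every run of whitespace becomes one "-"
dashed = re.sub(r'\s+', '-', text)

# count=2 replaces only the first two runs
first_two = re.sub(r'\s+', '-', text, count=2)

# two spaces and two tabs still collapse into a single "-" each
collapsed = re.sub(r'\s+', '-', "a  b\t\tc")

print(dashed)
print(first_two)
print(collapsed)
```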
re.split splits a string.
# Split the string on whitespace into a list of words
re.split(r'\s+',text)
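Splitting only on whitespace keeps punctuation attached to the words ("girl," and "cool,clever,"). A sketch comparing it with splitting on \W+ (any run of non-word characters), which is an alternative chosen for illustration rather than part of the original example; the trailing "..." produces an empty string at the end of that list, filtered out here.

```python
import re

text = "Elaine is a beautiful girl, she is cool,clever, and so on..."

# whitespace-only split: punctuation stays attached to the words
by_space = re.split(r'\s+', text)

# split on runs of non-word characters; the trailing "..." yields a final ''
by_nonword = re.split(r'\W+', text)

# drop empty strings to get a clean word list
words = [w for w in by_nonword if w]

print(by_space)
print(words)
```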
re.findall returns every matching substring in the string as a list.
# Match every word that contains an "o"
re.findall(r'\w*o\w*', text)
# -> ['cool', 'so', 'on']
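What findall returns depends on capturing groups: with no groups it returns whole matches, as above, while with one group it returns only the text inside the parentheses. A sketch contrasting the two (the comma patterns are illustrative choices), plus re.finditer, which yields match objects and therefore also gives positions:

```python
import re

text = "Elaine is a beautiful girl, she is cool,clever, and so on..."

# no groups: findall returns whole matches
with_comma = re.findall(r'\w+,', text)

# one capturing group: findall returns only the group's text
before_comma = re.findall(r'(\w+),', text)

# finditer yields match objects, which also carry start offsets
positions = [(m.group(), m.start()) for m in re.finditer(r'\w*o\w*', text)]

print(with_comma)
print(before_comma)
print(positions)
```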
re.compile compiles a regular expression into a regular expression object, so the same pattern can be reused.
import re
text = "Elaine is a beautiful girl, she is cool,clever, and so on..."
pattern = re.compile(r'\w*o\w*')
newstring = lambda m: '[' + m.group(0) + ']'
print(pattern.findall(text))
print(pattern.sub(newstring, text))
# Wrap every word that contains an "o" in "[]"
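A compiled pattern offers the same operations as the module-level functions (findall, sub, search, and so on). A sketch of the same substitution written with a named function instead of a lambda (bracket is a hypothetical helper name), and with the \g<0> backreference in a replacement string, which inserts the whole match without needing a callable:

```python
import re

text = "Elaine is a beautiful girl, she is cool,clever, and so on..."
pattern = re.compile(r'\w*o\w*')

def bracket(m):
    # m is a match object; group(0) is the whole matched word
    return '[' + m.group(0) + ']'

with_function = pattern.sub(bracket, text)

# \g<0> in the replacement string stands for the entire match
with_template = pattern.sub(r'[\g<0>]', text)

print(with_function)
```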
4. Regular expression examples
import re
teststr1 = "800-123-4234"
pattern1 = re.compile(r'^\d{3}-\d{3}-\d{4}$')
print(pattern1.findall(teststr1))
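The ^ and $ anchors make the pattern describe the whole string, so findall returns either the entire number or an empty list. When only a yes/no check is needed, Pattern.fullmatch (Python 3.4+) expresses that intent without anchors. A sketch with a few made-up inputs and a hypothetical helper is_phone; note one subtle difference: $ also matches just before a trailing newline, while fullmatch does not.

```python
import re

phone = re.compile(r'\d{3}-\d{3}-\d{4}')
anchored = re.compile(r'^\d{3}-\d{3}-\d{4}$')

def is_phone(s):
    # hypothetical helper: fullmatch succeeds only if the pattern covers all of s
    return phone.fullmatch(s) is not None

samples = ["800-123-4234", "800-1234-234", "tel: 800-123-4234"]
results = [is_phone(s) for s in samples]

# $ accepts a trailing "\n"; fullmatch rejects it
newline_findall = anchored.findall("800-123-4234\n")
newline_full = is_phone("800-123-4234\n")

print(results, newline_findall, newline_full)
```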
teststr2 = "email: [email protected], email: [email protected]"
# "?:" inside (...) makes the group non-capturing, so findall returns whole matches
pattern2 = re.compile(r"\w+:\s+\w+@\w+\.(?:org|com|net)")
print(pattern2.findall(teststr2))
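The (?:...) matters because of how findall treats groups: with a plain capturing group it would return only the text inside the parentheses (the top-level domain) instead of the whole match. A sketch using hypothetical addresses (alice@example.com and bob@example.org are made up for illustration):

```python
import re

# hypothetical sample data, made up for illustration
s = "email: alice@example.com, email: bob@example.org"

# non-capturing group: findall returns the whole match
non_capturing = re.findall(r"\w+:\s+\w+@\w+\.(?:org|com|net)", s)

# capturing group: findall returns only the group's text
capturing = re.findall(r"\w+:\s+\w+@\w+\.(org|com|net)", s)

print(non_capturing)
print(capturing)
```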
Example 3.
teststring = ["HELLO world", "hello world!", "hello world"]
expressions = ["hello", "HELLO"]
# search is case-sensitive, so each expression is only found where the case agrees
for string in teststring:
    for expression in expressions:
        if re.search(expression, string):
            print(expression, "found in string", string)
        else:
            print(expression, "not found in string", string)
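re.search is case-sensitive by default, so in the loop above "hello" is found only in the two lowercase strings and "HELLO" only in the first. A sketch of the same loop that also passes the re.IGNORECASE flag, under which both expressions match all three strings; collecting pairs into lists (found_exact and found_ignorecase are illustrative names) instead of printing makes the difference easy to check.

```python
import re

teststring = ["HELLO world", "hello world!", "hello world"]
expressions = ["hello", "HELLO"]

found_exact = []
found_ignorecase = []
for string in teststring:
    for expression in expressions:
        # default search: letters must match in case
        if re.search(expression, string):
            found_exact.append((expression, string))
        # re.IGNORECASE makes letters match regardless of case
        if re.search(expression, string, re.IGNORECASE):
            found_ignorecase.append((expression, string))

print(found_exact)
print(len(found_ignorecase))
```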