Python Regular Expressions: A Walkthrough of the re Module

Regular Expressions

A regular expression (RE) is a small, highly specialized programming language embedded in Python and made available through the re module.

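Regular expressions consist of:

Character matching:
Ordinary characters: most letters and characters simply match themselves; for example, the regular expression test matches the string "test" exactly.
Metacharacters: . ^ $ * + ? [] {} \ | ()
The most important part of learning REs is learning to use the metacharacters, which make more expressive matching rules possible.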

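import re
s = r'abc'            # define the regular expression pattern
str1 = 'abc xabcx'    # sample string to search
re.findall(s, str1)   # ['abc', 'abc']
res = r"t[io]p"       # matches "tip" or "top"
res = r"t[^io]p"      # negation: t, then any character except i or o, then p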
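As a small sketch of how a character set and its negation differ, the snippet below runs both patterns over one string. The sample text and the variable names are made up for illustration.

```python
import re

sample = "tip top tap tup"  # hypothetical test string

# [io] accepts exactly one character that is either 'i' or 'o'
in_set = re.findall(r"t[io]p", sample)

# [^io] accepts exactly one character that is neither 'i' nor 'o'
not_in_set = re.findall(r"t[^io]p", sample)

print(in_set)      # ['tip', 'top']
print(not_in_set)  # ['tap', 'tup']
```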

Metacharacters Explained


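	[] specifies a character set: [abc], [a-z], [0-9], [a-zA-Z0-9]
	Most metacharacters lose their special meaning inside a set: in [akm$], the $ is a literal dollar sign
	A leading ^ negates the set, matching any character not listed: [^5]

	^ matches the start of the string; with re.MULTILINE it also matches right after each newline
	$ matches the end of the string, or just before a newline that ends the string; with re.MULTILINE it matches at any position followed by a newline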
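The anchor behaviour described above can be checked with a short sketch. The two-line sample string is hypothetical; re.MULTILINE is the standard flag that makes ^ and $ work per line.

```python
import re

text = "hello world\nhello boy"  # hypothetical two-line string

# Default mode: ^ anchors only at the very start of the string
default_starts = re.findall(r"^hello", text)

# re.MULTILINE: ^ also anchors right after each newline
multiline_starts = re.findall(r"^hello", text, re.MULTILINE)

# Default mode: $ anchors only at the end of the string here
default_ends = re.findall(r"\w+$", text)

# re.MULTILINE: $ also anchors just before each newline
multiline_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(default_starts)    # ['hello']
print(multiline_starts)  # ['hello', 'hello']
print(default_ends)      # ['boy']
print(multiline_ends)    # ['world', 'boy']
```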

case
>>> s="helloworld,hello_boy"
>>> r=r"hello"
>>> re.findall(r,s)
['hello', 'hello']
>>> r1=r"hello*"
>>> re.findall(r1,s)
['hello', 'hello']
>>> r2=r"^hello"
>>> re.findall(r2,s)
['hello']
>>> r3=r"boy$"
>>> re.findall(r3,s)
['boy']
>>>

Metacharacters in Practice


case
>>> import re
>>> r=r"^abc"
>>> re.findall(r,'abc')
['abc']
>>> re.findall(r,'aa abc')
[]
>>> re.findall(r,'aa ^abc')
[]
>>> re.findall(r,'^abc  ^abcd  ^abcde')
[]
>>> r1=r"\^a"
>>> re.findall(r1,'^abc  ^abcd  ^abcde')
['^a', '^a', '^a']

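\ The backslash can be followed by various characters to signal special sequences. It also escapes any metacharacter so that it matches literally: \[ or \\
\d matches any decimal digit; for ASCII text this is equivalent to the class [0-9]
\D matches any non-digit character; equivalent to [^0-9]
\w matches any alphanumeric character or underscore; for ASCII text this is equivalent to [a-zA-Z0-9_]
\W matches any character that is not alphanumeric or an underscore; equivalent to [^a-zA-Z0-9_]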
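A quick sketch of the shorthand classes follows. Note in particular that \w accepts the underscore. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

sample = "user_01 = 42%"  # hypothetical test string

digits = re.findall(r"\d", sample)      # decimal digits only
word_chars = re.findall(r"\w", sample)  # letters, digits and underscore
non_word = re.findall(r"\W", sample)    # everything \w rejects

# A backslash in front of a metacharacter makes it match literally
dollars = re.findall(r"\$", "cost: $5")

print(digits)      # ['0', '1', '4', '2']
print(word_chars)  # ['u', 's', 'e', 'r', '_', '0', '1', '4', '2']
print(non_word)    # [' ', '=', ' ', '%']
print(dollars)     # ['$']
```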

case
>>> r=r"[0-9]"
>>> re.findall(r,'123456789')
['1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> re.findall(r,'12345 6789')
['1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> r1=r"\d"
>>> re.findall(r1,'12345 6789')
['1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> r2=r"\w"
>>> re.findall(r2,'hsdkf 1234 ^jlk ab23')
['h', 's', 'd', 'k', 'f', '1', '2', '3', '4', 'j', 'l', 'k', 'a', 'b', '2', '3']
>>>

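Repetition: *
The first thing regular expressions can do is match varying sets of characters. Another capability is specifying that part of the RE must be repeated a certain number of times. * specifies that the preceding item can be matched zero or more times. For example, a[bcd]*b matches "abcb" within "abcbd".
Matching a phone number:
r=r"^010-\d{8}"		# {8} repeats the preceding \d exactly 8 times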

case
>>> r3=r"a[bcd]*b"
>>> re.findall(r3,'abcbd')
['abcb']
>>> r4=r"^010-\d{8}"
>>> re.findall(r3,'010-29818485')
[]
>>> re.findall(r4,'010-29818485')
['010-29818485']
>>> r5=r"^010-\d{6}"
>>> re.findall(r5,'010-29818485')
['010-298184']
>>> re.findall(r4,'010-298184')
[]
>>> r6=r"^a[bcd]*F$"
>>> re.findall(r6,'adbccdF')
['adbccdF']
>>> re.findall(r6,'aF')
['aF']
>>> r7=r"^a[bcd]{3}F$"
>>> re.findall(r7,'abbbF')
['abbbF']
>>> re.findall(r7,'abddF')
['abddF']
>>>
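To see why a[bcd]*b stops at "abcb", and how an exact count behaves, here is a minimal sketch. The input strings and the names star_hits, phone and candidates are hypothetical.

```python
import re

# Greedy star: [bcd]* first grabs "bcbd", then backs off one character at a
# time until the final b of the pattern can match, leaving "abcb".
star_hits = re.findall(r"a[bcd]*b", "abcbd ab a")

# {8} demands exactly eight digits; the $ anchor rejects anything extra.
phone = re.compile(r"^010-\d{8}$")
candidates = ["010-29818485", "010-298184", "010-298184851"]  # made-up inputs
valid = [c for c in candidates if phone.search(c)]

print(star_hits)  # ['abcb', 'ab']
print(valid)      # ['010-29818485']
```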

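+	matches one or more times. Note the difference between * and +: * matches zero or more times, while + requires at least one occurrence.
.	The dot matches any single character except a newline.
?	matches zero or one time, and is typically used to mark something as optional, such as the - in the middle of a phone number or an underscore _ inside a string.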
case
>>> 
>>> r8=r"^010-?\d{8}"
>>> re.findall(r8,'010-29818485')
['010-29818485']
>>> re.findall(r8,'01029818485')
['01029818485']
>>> r9=r"^010-?\d{8}$"
>>> re.findall(r8,'01029818485ab')
['01029818485']
>>> re.findall(r9,'01029818485ab')
[]
>>> 

>>> a1=r"hello.world"
>>> re.findall(a1,'hello world')
['hello world']
>>> re.findall(a1,'hello-world')
['hello-world']
>>> re.findall(a1,'helloworld')
[]
>>> re.findall(a1,'hello?world')
['hello?world']
>>>
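One detail the session above does not show is how the dot treats a newline. The sketch below assumes the standard re.DOTALL flag; the variable names are illustrative.

```python
import re

pattern = r"hello.world"

# By default the dot matches any character except a newline
same_line = re.findall(pattern, "hello world")
across_newline = re.findall(pattern, "hello\nworld")

# With re.DOTALL the dot matches a newline as well
with_dotall = re.findall(pattern, "hello\nworld", re.DOTALL)

print(same_line)       # ['hello world']
print(across_newline)  # []
print(with_dotall)     # ['hello\nworld']
```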

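{m,n}	Braces, where m and n are decimal integers. This qualifier means at least m and at most n repetitions. Omitting m gives a lower bound of 0; omitting n gives an upper bound of infinity.
{0,} is equivalent to *, {1,} is equivalent to +, and {0,1} is equivalent to ?, so the brace form is the most flexible.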
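The equivalences above can be verified with a small sketch. The helper matching() and the sample list are hypothetical; re.fullmatch requires the whole string to match.

```python
import re

samples = ["ac", "abc", "abbc", "abbbc"]  # hypothetical inputs


def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]


# Each shorthand quantifier selects the same strings as its brace form
star_same = matching(r"ab*c") == matching(r"ab{0,}c")
plus_same = matching(r"ab+c") == matching(r"ab{1,}c")
optional_same = matching(r"ab?c") == matching(r"ab{0,1}c")

print(star_same, plus_same, optional_same)  # True True True
print(matching(r"ab{1,2}c"))  # ['abc', 'abbc']
print(matching(r"ab{,1}c"))   # ['ac', 'abc']
print(matching(r"ab{2,}c"))   # ['abbc', 'abbbc']
```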

re compile

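If a regular expression such as r"regexp" is used frequently, a better approach is to compile it with the re module's compile() function. The compiled pattern object can then be reused directly, so the pattern string does not have to be processed again for each match.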

p_tel = re.compile(r8)

>>> p_tel.findall('01029818485ab')
['01029818485']
>>> 

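Reusing a compiled pattern avoids recompiling it on every call. The module-level functions also cache recently compiled patterns, so for programs that use only a few patterns the difference is usually small. In addition, re.compile() accepts an optional flags argument that enables special features, such as case-insensitive matching, which makes patterns more flexible.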
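As a minimal sketch of the optional flags argument, the snippet below compiles one pattern with re.IGNORECASE and reuses it. The name p_hello and the sample text are made up for illustration.

```python
import re

# Compile once and reuse; re.IGNORECASE lets letters match in either case
p_hello = re.compile(r"hello", re.IGNORECASE)

hits = p_hello.findall("Hello HELLO hello")
case_insensitive = bool(p_hello.flags & re.IGNORECASE)

print(hits)              # ['Hello', 'HELLO', 'hello']
print(p_hello.pattern)   # hello
print(case_insensitive)  # True
```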
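The backslash plague: with the r prefix in front of a string literal, Python does not process backslashes in any special way, so they reach the regex engine unchanged.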
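The effect of the r prefix is easiest to see when matching a literal backslash. This is only a sketch; the path string is a hypothetical example.

```python
import re

path = r"C:\temp\new"  # hypothetical text containing two literal backslashes

# In a normal literal "\n" is a single newline character; in a raw literal
# it is two characters, a backslash followed by the letter n.
normal_len = len("\n")
raw_len = len(r"\n")

# To match one literal backslash the regex engine must receive two of them.
# A normal literal needs four backslashes to deliver that; a raw one needs two.
same_pattern = "\\\\" == r"\\"
backslashes = re.findall(r"\\", path)

print(normal_len, raw_len)  # 1 2
print(same_pattern)         # True
print(len(backslashes))     # 2
```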

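A compiled pattern object offers methods for more precise and complex matching. Pattern objects (called RegexObject instances in older Python versions, re.Pattern in recent Python 3 releases) have several methods and attributes; the complete list is in the Python Library Reference.
match()		Determines whether the RE matches at the beginning of the string
search()	Scans through the string, looking for the first location where the RE matches
findall()	Finds all substrings where the RE matches and returns them as a list
finditer()	Finds all substrings where the RE matches and returns them as an iterator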

If no match is found, match() and search() return None; on success they return a match object (a MatchObject instance in older Python versions).
>>> p_tel.match('01029818485ab')
<_sre.SRE_Match object at 0x0139A838>
>>> p_tel.match('01029818485ab8')
<_sre.SRE_Match object at 0x0139A870>
>>> p_tel.match('0102981ab8485ab8')
>>> # no match, so None is returned
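The four methods listed above can be compared side by side in a short sketch. The pattern, the sample text and the variable names are assumptions chosen for illustration.

```python
import re

p = re.compile(r"\d+")
text = "abc 123 def 4567"  # hypothetical sample

# match() only tries at position 0, and the text starts with a letter
at_start = p.match(text)

# search() scans forward and stops at the first hit
first = p.search(text)

# findall() returns the matched substrings as a list
all_hits = p.findall(text)

# finditer() yields match objects, which also carry positions
spans = [hit.span() for hit in p.finditer(text)]

print(at_start)                      # None
print(first.group(), first.span())  # 123 (4, 7)
print(all_hits)                      # ['123', '4567']
print(spans)                         # [(4, 7), (12, 16)]
```

Checking the result against None before calling group() is the usual way to guard match() and search().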

RE Grouping: group


case

>>> email = r"\w{3}@\w+(\.com|\.cn)"
>>> re.match(email,'[email protected]')
<_sre.SRE_Match object at 0x01390560>
>>> re.match(email,'www.ilovepython.cn')
>>> re.match(email,'[email protected]')
<_sre.SRE_Match object at 0x013996E0>
>>> re.match(email,'[email protected]')
>>> re.findall(email,'[email protected]')
['.cn']
>>> 

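A group (re) lets you put alternatives joined by | (or) together, or apply other operations to part of a pattern. When a pattern contains a group, findall() returns what the group matched rather than the whole match. This property is useful for filtering results such as links and tags, as in the following case: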
>>> s="""hhsdj  dskj  hello  src=csvt yes jdjsds
   djhsjk  src=123 yes jdsa
   src=234 yes
   hello src=python yes ksa
   """

>>> s
'hhsdj  dskj  hello  src=csvt yes jdjsds\n   djhsjk  src=123 yes jdsa\n   src=234 yes\n   hello src=python yes ksa\n   '
>>> import re
>>> r1=r"hello src=.+ yes"
>>> re.findall(r1,s)
['hello src=python yes']
>>> r2=r"hello src=(.+) yes"
>>> re.findall(r2,s)
['python']
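When both the whole match and the group are needed, a match object gives access to each. The sketch below uses a hypothetical address and also shows the (?:...) non-capturing form, which this walkthrough does not otherwise cover, as one way to make findall() return the full match.

```python
import re

address = "tom@example.cn"  # hypothetical address, for illustration only

capturing = r"\w{3}@\w+(\.com|\.cn)"
group_only = re.findall(capturing, address)

# group(0) is the whole match, group(1) is the first parenthesized group
m = re.match(capturing, address)
whole = m.group(0)
suffix = m.group(1)

# (?:...) groups without capturing, so findall() returns the whole match
non_capturing = r"\w{3}@\w+(?:\.com|\.cn)"
full_matches = re.findall(non_capturing, address)

print(group_only)     # ['.cn']
print(whole, suffix)  # tom@example.cn .cn
print(full_matches)   # ['tom@example.cn']
```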
