L2.3.0 The re regular-expression module (regex)

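Regular expressions (re, regex) are a specialised tool for searching and filtering strings. They are far more powerful than plain string methods such as find(), and they have their own dedicated syntax. Advantage: the most powerful option available. Disadvantage: a steep learning curve.
Typical uses: web crawlers, web-page parsing and pattern matching. URL routing in frameworks such as Flask and Django is also built on regular expressions.
regex is a third-party package that offers more features than the built-in re package.
The r prefix marks a raw string: backslashes are kept literally instead of being processed as escape sequences.
print('abc\nabc') prints abc twice on separate lines, while print(r'abc\nabc') → abc\nabc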
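To make the difference concrete, here is a small sketch comparing a normal and a raw string literal. The variable names and the sample sentence are illustrative assumptions, not part of the original notes.

```python
import re

normal = 'abc\nabc'    # \n is processed into a single newline character
raw = r'abc\nabc'      # the backslash and the n stay as two separate characters

print(len(normal))     # 7
print(len(raw))        # 8

# With the r prefix the backslash reaches the regex engine unchanged,
# so r'\d+' is read by re as "one or more digits".
digits = re.findall(r'\d+', 'room 12, floor 3')
print(digits)          # ['12', '3']
```

Writing every pattern as a raw string avoids having to double each backslash.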

1. Summary of re functions

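Function	Call format	Description
findall	re.findall(pattern, string)	Returns every non-overlapping match as a list
search	re.search(pattern, string).group()	Scans the string and stops at the first match, returning a match object; call group() on it to get the matched text. Returns None if nothing matches
match	re.match(pattern, string).group()	Same as search, but only matches at the beginning of the string
split	re.split(pattern, string)	Splits the string wherever the pattern matches and returns a list
sub	re.sub(pattern, replacement, string, count)	Replaces matches of the pattern with the replacement and returns the new string
subn	re.subn(pattern, replacement, string, count)	Same as sub, but returns a tuple holding the new string and the number of replacements
compile	re.compile(pattern)	Compiles the pattern into a regular-expression object
finditer	re.finditer(pattern, string)	Returns an iterator that yields one match object per match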
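As a rough illustration of two rows in the table, the sketch below shows re.split and the difference between match and search when the match is not at the start of the string. The sample strings and variable names are assumptions chosen for the example.

```python
import re

# split: cut the string wherever one or more commas, semicolons or spaces occur
parts = re.split(r'[,; ]+', 'a,b; c  d')
print(parts)                      # ['a', 'b', 'c', 'd']

# match only tries the start of the string; search scans the whole string
at_start = re.match(r'\d+', 'abc123')
anywhere = re.search(r'\d+', 'abc123')
print(at_start)                   # None
print(anywhere.group())           # 123

# Guard against None before calling group(), otherwise AttributeError is raised
found = at_start.group() if at_start else 'no match'
print(found)                      # no match
```

Checking the result for None first is usually safer than chaining .group() directly onto search() or match().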

2. Greedy matching

Greedy matching: when a match is possible, a quantifier matches the longest string it can. Greedy mode is the default.

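Pattern	String to match	Result	Description
<.*>	<script>…</script>	<script>…</script>	Greedy by default, so the match is as long as possible
<.*?>	<script>…</script>	<script> </script>	Adding ? switches the quantifier to non-greedy mode, so each match is as short as possible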
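The two rows of the table can be reproduced with a short sketch. The sample string here is an assumption picked for illustration.

```python
import re

html = '<script>alert(1)</script>'   # sample input chosen for illustration

greedy = re.findall(r'<.*>', html)   # .* runs to the last > in the string
lazy = re.findall(r'<.*?>', html)    # .*? stops at the first > it can reach

print(greedy)   # ['<script>alert(1)</script>']
print(lazy)     # ['<script>', '</script>']
```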

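Common non-greedy quantifiers:

*? repeat any number of times, but as few as possible
+? repeat one or more times, but as few as possible
?? repeat zero or one time, but as few as possible
{n,m}? repeat n to m times, but as few as possible
{n,}? repeat n or more times, but as few as possible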
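A quick way to see what each quantifier in the list does is to run the greedy and lazy forms against the same input. The following is a minimal sketch; the test string 'aaaa' and the names text, patterns and results are assumptions made for the example.

```python
import re

text = 'aaaa'
patterns = [r'a*', r'a*?', r'a+', r'a+?', r'a?', r'a??',
            r'a{2,3}', r'a{2,3}?', r'a{2,}', r'a{2,}?']

# re.match anchors at the start, so each result shows how much the quantifier takes
results = {p: re.match(p, text).group() for p in patterns}
for p, matched in results.items():
    print(f'{p:10} -> {matched!r}')

# a*         -> 'aaaa'
# a*?        -> ''
# a+         -> 'aaaa'
# a+?        -> 'a'
# a?         -> 'a'
# a??        -> ''
# a{2,3}     -> 'aaa'
# a{2,3}?    -> 'aa'
# a{2,}      -> 'aaaa'
# a{2,}?     -> 'aa'
```

Each lazy form returns the minimum its quantifier allows because nothing after it in the pattern forces it to take more.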

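How .*? works:

. matches any character (except a newline, unless re.S is set)

* repeats it 0 to unlimited times
? makes the repetition non-greedy.
Together they match as few arbitrary characters as possible. The combination is rarely written on its own; it mostly appears as .*?x
which consumes characters of any length up to the first x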
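The .*?x idiom can be sketched as follows, with ; playing the role of x. The input string is an assumption made for the example.

```python
import re

text = 'key=1;other=2;'

lazy = re.match(r'.*?;', text).group()    # stops at the first ;
greedy = re.match(r'.*;', text).group()   # runs on to the last ;

print(lazy)     # key=1;
print(greedy)   # key=1;other=2;
```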

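# (1) findall
import re
ret = re.findall(r'\d', 'adsf123456we7we')
# Find every digit in the string and return the matches as a list
print(ret)
# ['1', '2', '3', '4', '5', '6', '7']

# (2) search
ret = re.search(r'\d', 'adsf123456we7we').group()
# Returns as soon as the first match is found
print(ret)  # 1

# (3) match
ret = re.match(r'\w', 'adsf123456we7we').group()
# Matches only at the start of the string; if the start does not match, match() returns None and .group() raises AttributeError
print(ret)  # a

# (4) sub
ret = re.sub(r'\d', '*', 'adsf123456we7we', count=0)
# Replace every digit with *; count=0 means replace all occurrences
print(ret)   # adsf******we*we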


te = 'Tina is a good girl, she is cool, clever, and so on...'
pat = re.compile(r'\w*oo\w*')
print(pat.findall(te))
# ['good', 'cool']   finds every word containing 'oo'
# compile() compiles a pattern string and returns a pattern object

# (5) subn
ret = re.subn(r'\d', '*', 'adsf123456we7we', count=0)
# Replace every digit with * and return a tuple holding the new string and the number of replacements
print(ret)   # ('adsf******we*we', 7)


# (6) compile
obj = re.compile(r'\d')  # compile the pattern into a regular-expression object
ret = obj.search('ads123asd456').group()
print(ret)   # 1


# (7) finditer
ret = re.finditer(r'\d', 'adsf451we15615adf16')
# finditer returns an iterator that yields one match object per match
print(ret)
for i in ret:
    print(i.group())

# <callable_iterator object at 0x000002035EEEE0B8>
# 4
# 5
# 1
# 1
# 5
# 6
# 1
# 5
# 1
# 6
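Because finditer yields match objects rather than plain strings, the position of each match is available as well. The sketch below reuses the string from the example above; grouping the digits with \d+ and the name runs are assumptions added for illustration.

```python
import re

text = 'adsf451we15615adf16'

# Each item is a match object, so start() and end() come along with the text
runs = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', text)]
print(runs)
# [('451', 4, 7), ('15615', 9, 14), ('16', 17, 19)]
```

This may be preferable to findall when the offsets matter, or when the input is large and the matches should be processed one at a time.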

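import re
# str.find(): simple, but limited and awkward to use
html = r'<html><body><h1>hello world</h1></body></html>'
start_index = html.find('<h1>')
end_index = html.find('</h1>')
print(html[start_index: end_index+1])
# <h1>hello world<   (the slice ends one character into the closing tag; including all of it needs end_index + len('</h1>'))
# 1> Match a fixed string once
key = r'javapythonUIVRpythonjava'
patten = re.compile(r'python')
matcher1 = re.search(patten, key)
print(matcher1)
print(matcher1[0])  # python
# re.compile(pattern) returns a compiled pattern object holding the rule.
# re.search(pattern object, string to search)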

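# 2> Arbitrary text: . matches any character, + repeats the preceding item one or more times, so .+ matches one or more arbitrary characters
key2 = r'<h1>hello world</h1>'
patten2 = re.compile(r'<h1>.+</h1>')
matcher2 = re.search(patten2, key2)
print(matcher2[0])   # <h1>hello world</h1>

# 3> Matching a literal dot or plus sign: escape it with \
key3 = r'someone@qq.com'  # placeholder address (assumption); the original address was obscured in the source
patten3 = re.compile(r'.+\.com')  # meant to check whether user input is a QQ mailbox; .+ is not very precise
matcher3 = re.search(patten3, key3)
print(matcher3[0])   # someone@qq.com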

# 4> * means the preceding character appears zero or more times
key4 = r'http://www.baidu.com  https://www.baidu.com '
p4 = re.compile(r'https*://')
m4 = re.search(p4, key4)
matcher4 = p4.findall(key4)
print(matcher4)    # ['http://', 'https://']

# pattern_object.findall(string to search) returns a list

# 5> [Aa] matches one character: any single character listed inside the brackets counts as a match
key = r'SelectSELECT'   # SQL keywords are case-insensitive
pa5 = re.compile(r'[Ss][Ee][Ll][Ee][Cc][Tt]')
print(pa5.findall(key))   # ['Select', 'SELECT']
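The bracket approach above works, but it gets long quickly. As an alternative that the original notes do not cover, the re module offers the re.I (re.IGNORECASE) flag. A minimal sketch, with the sample string extended by an assumed lowercase variant:

```python
import re

key = 'SelectSELECTselect'
pattern = re.compile(r'select', re.I)   # re.I is short for re.IGNORECASE

print(pattern.findall(key))   # ['Select', 'SELECT', 'select']
```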

# 6> Exclusion: [^p] matches any single character except p
key6 = r'mat cat hat pat'
pa6 = re.compile(r'[^p]at')
print(pa6.findall(key6)) # ['mat', 'cat', 'hat']

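# 7> Greedy mode: by default a quantifier matches as many characters as it can
key7 = r'1968608841@163.com.cn'  # address obscured in the source; the part after 163. is an assumption
p7 = re.compile(r'.+@.+\.')
print(p7.findall(key7))  # ['1968608841@163.com.']

# 8> Lazy matching: +? still needs at least one character but takes as few as possible
p8 = re.compile(r'.+@.+?\.')
print(p8.findall(key7))  # ['1968608841@163.']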

# 9> Bounded repetition: {1,2} means the preceding character appears 1 to 2 times
key9 = r'saas and sas and saaas'
p9 = re.compile(r'sa{1,2}s')
print(p9.findall(key9))  # ['saas', 'sas']

# 10> Matching across line breaks: re.S lets . match newline characters too
key10 = r"""
aaahelloaaa
bbb
world
aaa
"""
p10 = re.compile(r'hello.*?world', re.S)
print(p10.findall(key10))
# ['helloaaa\nbbb\nworld']

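# 11> Groups: (sub-pattern). With several groups, findall returns a list of tuples, one item per group
key11 = r""" hello小明worldaaa """
p11 = re.compile(r'hello(.*?)world(.*?)')
print(p11.findall(key11))
# [('小明', '')]   the trailing lazy (.*?) matches the empty string

p11 = re.compile(r'hello(.*?)world')
print(p11.findall(key11))
# ['小明']   with a single group, findall returns a list of plain strings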
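Groups can also be read from a single match object instead of through findall. The sketch below reuses the sample string from the example above. Swapping the second group to (\w*) so that it captures something is an assumption made for illustration, as are the names m, one and two.

```python
import re

key11 = ' hello小明worldaaa '

m = re.search(r'hello(.*?)world(\w*)', key11)
print(m.group(0))    # hello小明worldaaa   (the whole match)
print(m.group(1))    # 小明
print(m.group(2))    # aaa
print(m.groups())    # ('小明', 'aaa')

# findall: one group gives a list of strings, several groups give a list of tuples
one = re.findall(r'hello(.*?)world', key11)
two = re.findall(r'hello(.*?)world(\w*)', key11)
print(one)           # ['小明']
print(two)           # [('小明', 'aaa')]
```

group(0) is always the full match, while groups() returns only the parenthesised parts.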
