正則表達式常用方法

正則切割

import re
c =re.compile(r'\d')
s='hello52bthg235gfre'
ret=c.split(s，count=2)
print(ret)

正則替換

import re

r='sd7fB2dv2B1cfB;'
d=re.sub('\d','&',r,count=2)
print(d)#替換，及指定替換次數，不指定就全部替換

s = "1abc23def45"
#用？將s中符合r"\w"匹配的替換，打印替換後的結果及替換個數在一個元組內。
print(str(re.subn(r"\w", "?", s)[1]))#可以在中括號內輸入下標選擇打印


#替換
c =re.compile(r'\d')
s='hello52bthg235gfre'
ret=c.sub('&',s,count=2)
print(ret)
#封裝調用函數人爲干預替換過程
def double(s):
    print(s)
    return str(int(s.group())*2)
##先用正則公式對字符串進行匹配，匹配到後對應給要替換的位置參數，一看是函數把正則結果返回到函數進行處理獲取返回值進行替換
d=re.sub(r'\d', double, 'dg2hfgh4fh')

應用場景

特定規律字符串的查找切割替換等
郵箱格式、URL、IP等的校驗
爬蟲項目中，特定內容的提取

使用原則

只要使用字符串等函數解決的問題，就不要使用正則
正則的效率比較低，同時會降低代碼的可讀性。
世界上最難理解的三樣東西：醫生的處方、道士的神符、碼農的正則。
提醒：正則是用來寫的，不是用來讀的；在不指定功能的情況下，不要試圖閱讀別人的正則。

基本使用match/search/findall/compile

說明：正則不是我們寫的，python中正則的解析通過re模塊完成
相關函數：
- match：從開頭進行匹配，匹配到就返回正則結果對象，沒有返回None。
- search：從任意位置匹配，功能同上。
  #從開頭查找，找到返回正則匹配結果對象，沒有找到返回None
  #m = re.match(‘abc’, ‘abcdask;jabcsdajl’)
  #從任意位置查找，功能同上
  m = re.search(‘abc’, ‘shdalsjdabchaoeor’)
  
  if m:
  # 返回匹配的內容
  print(m.group())
  # 返回匹配內容位置
  print(m.span())
- findall：全部匹配，返回匹配到的內容組成的列表，沒有返回空列表。
  #先生成正則表達式對象
  c =re.compile(‘hello’)
  #從開頭匹配
  x =c.match(‘scdvdshello;csd55’)
  #從任意位置查找
  z =c.search(‘scdvdshello;csd55’)
  #返回所有匹配內容的列表，若沒找到返回空列表
  #f =c.findall(‘scdvdshello;csd55’)
  f =re.findall(‘ab’,‘scabvdshelablo;csabd55’)
  #與上邊等同
  f =re.findall(’(ab)’,‘scdvabdshabello;csabd5’)
  #按照正則公式整體查找但是隻打印括號內的，有多個括號時匹配到打印各個括號內的內容到一個元組內
  f =re.findall(‘sd(ab)hn’,‘ssdabhnllsdabhn5’)
  [‘ab’, ‘ab’]
  f =re.findall(‘sd(ab)h(ab)n’,‘ssdabhabn5’)
  [(‘ab’, ‘ab’)]
  #先按照最外邊括號匹配找到打印括號內內容，再去掉外邊括號再次匹配打印
  f =re.findall(’(sd(ab)h(ab)n)’,‘ssdabhabn5sdabhabn’)
  [(‘sdabhabn’, ‘ab’, ‘ab’), (‘sdabhabn’, ‘ab’, ‘ab’)]
  #按照括號裏的或語法依次匹配打印
  f =re.findall(‘a(hello|world)c’,‘sahellochg25aworldcn’)
  [‘hello’, ‘world’]
  #先匹配外部括號再去掉外部括號再次匹配分別打印到元組
  f =re.findall(’(a(hello|world)c)’,‘sahellochg25aworldcn’)
  [(‘ahelloc’, ‘hello’), (‘aworldc’, ‘world’)]
  print(f)
  
  #匹配所有內容，找到返回所有匹配的內容列表，沒有找到返回空列表
  f = re.findall(‘abcd’, ‘adsjkjdabcajsdlasabcjsdlaabc’)
  print(f)
- compile：生成正則表達式對象
  #先生成正則表達式對象
  c = re.compile(‘hello’)
  #print(type©)
  
  #從開頭匹配
  m = c.match(‘hellosadk;ask;kahellosadhlkas’)
  print(m)
  
  #從任意位置匹配
  s = c.search(‘hadlsjasdhellokjsdlks’)
  print(s)
  
  #匹配所有內容
  f = c.findall(‘hellosdhasdjahelloshdajldhello’)
  print(f)
  說明：此方式更加靈活，將正則匹配分爲兩步完成(先生成正則對象，然後需要時進行匹配)

正則規則

單個字符\d \B [ ]

普通字符：簡單來說就是一對一的完全匹配。
[]：中間的任意一個字符
	[a-z]：a~z之間的任意字符(所有的小寫字母)
	[a-zA-Z]：匹配所有的字母，多個連續的片段中間不能有任何多餘的字符
	[^0-9]：匹配除0-9以外的任意字符
. ：匹配除'\n'以外的任意字符
\d：數字字符，等價於[0-9]
\D：非數字字符，等價於[^0-9]
\w：匹配字(數字、字母、下劃線、漢字)
\W：匹配非字(\w相反的內容)
\s：所有的空白字符(\n、\t、\r、空格)
\S：非空白字符(\s相反的內容)
\b：詞邊界(開頭、結尾、空格、標點等)
\B：非詞邊界(\b相反的內容)

次數限定+ ? {m,n}

*：前面的字符出現任意次
+：至少一次
?：最多一次
{m,n}：m <= 次數 <= n
{m,}：至少m次
{,n}：至多n次
{m}：指定m次

邊界限定:^ $

^：以指定的內容開頭
$：以指定的內容結束
示例：

import re

#^：以指定內容開頭
#f = re.findall(’^hello’, ‘hellosjdhelloalskdhello’)

#$以指定內容結尾
f = re.findall(‘hello$’, ‘hellosjdhelloalskdhello’)

print(f)

優先級控制:’|’

|：表示或，它擁有最低的優先級
()：表示一個整體，可以明確的指定結合性/優先級
示例：

import re

f = re.findall(‘a(hello|world)c’, ‘sahdjsahellocaworldcjsdhs’)

print(f)

分組匹配:\1

示例1:

import re

c = re.compile(r’(\d+)([a-z]+)(\d+)’)

s = c.search(‘sdsdj235sadhasjd36432sjdha’)

#完整匹配結果
print(s.group(), s.span())
#等價於上式
print(s.group(0), s.span(0))
#第幾個()匹配的結果
print(s.group(1), s.span(1))
print(s.group(2), s.span(2))
print(s.group(3), s.span(3))
示例2:

import re

#固定匹配
#c = re.compile(r’\w+</\1>’)

#\1表示前面第一個()匹配的內容
#c = re.compile(r’<([a-z]+)><([a-z]+)>\w+</\2></\1>’)

#給分組()起名字
c = re.compile(r’<(?P[a-z]+)><(?P[a-z]+)>\w+</(?P=dahua)></(?P=goudan)>’)

s = c.search(’<div><a>百度一下</a></div>’)

if s:
print(s.group())

貪婪匹配

貪婪：最大限度的匹配。正則的匹配默認就是貪婪的。
非貪婪：只要滿足條件，能少匹配就少匹配。’?'經常用於取消貪婪

示例：

import re

# ?：取消任意多次的貪婪匹配
# c = re.compile(r'a.*?b')#可以沒有，從左側開始符合就返回儘可能的少匹配

# ?：取消至少一次的貪婪匹配
c = re.compile(r'a.+?b')#至少要有一個，從左側開始符合就返回儘可能的少匹配

s = c.search('ssjdsdabjdsd')

if s:
    print(s.group())

匹配模式re.I忽略大小寫/re.M多行匹配/re.S作爲單行處理，忽略\n

說明：所謂匹配模式就是對匹配的原則進行整體的修飾。
示例：

import re

#忽略大小寫re.I
s = re.search(r’hello’, ‘HELLO world’, re.I)

#正則默認是單行匹配，使用 re.M 可以進行多行匹配
s = re.search(r’^hello’, ‘sdhasdh\nhello world’, re.M)

#使.匹配任意字符，作爲單行處理，忽略\n
#string = ‘<div>hello</div>’
string = ‘’’<div>
hello
</div>’’’
s = re.search(r’<div>.*?</div>’, string, re.S)

if s:
print(s.group())

import re

m=re.match(‘abc’,‘vvvcxdabc52ff2d…’)#只從開頭查找
print(m)
n=re.search(‘abc’,‘vvvcxdabc52ff2d…’)#從任意位置查找
print(n)

if m:
print(m.group())#打印匹配內容
print(m.span())#打印匹配內容位置

f = re.findall(‘abc’,‘5abc1cdfdabc2abc15f.;’)
print(f)#返回所有匹配內容的列表，若沒找到返回空列表

#先生成正則表達式對象
c =re.compile(‘hello’)

#從開頭匹配
x =c.match(‘scdvdshello;csd55’)
#從任意位置查找
z =c.search(‘scdvdshello;csd55’)
#返回所有匹配內容的列表，若沒找到返回空列表
f =c.findall(‘scdvdshello;csd55’)

#單個字符
#匹配連續字符，[]中間的任意一個字符
#[a-z]:a~z之間的任意一個字母
c=re.compile(’[sdc]’)
d=re.compile(’[a-z]’)
#匹配所有的字母，多個連續的片段中間
#不能有任何多餘的字符
e=re.compile(’[a-zA-Z]’)
#匹配相反的字符（除指定的字符以外的）
#加上個’^‘箭頭
g=re.compile(’[^fas]’)
#’.‘匹配出’\n’以外的任意字符
o=re.compile(’.’)

#\d只匹配0~9的數字
k=re.compile(’\d’)
#\D匹配除數字以外的字符

#\w匹配字（數字，字母，下劃線，漢字）

#\W非字以外的字符

#\s所有的空白字符（\n，\t，\r空格）
j=re.compile(’\s’)
v=c.findall(‘scfvsa sfv\ndcs’)
#\S非空白字符

#\b詞邊界\b在python中有特殊意義需要加r
l=re.compile(r’\bhello’)
n=c.findall(‘helloscfvs,helloavdcshello’)
#\B非詞邊界

#用控制字符可以出現任意次
h=re.findall('abc’,‘savdfdbbabbcbbabcbbfvwe’)
#要求某字符至少出現一次
m=re.findall(‘ab+c’,‘savdfdbbabbcbbabcbbfvwe’)
#要求至多一次
q=re.findall(‘ab？c’,‘savdfdbbabbcbbabcbbfvse’)
#指定出現次數
p=re.findall(‘ab{3}c’,‘savdfdbbabbcbbabcbbfse’)
#指定出現次數的範圍，包括邊界[1,3]
r=re.findall(‘ab{1,3}c’,‘sdavdfdbbabfvswe’)
#指定至多多少次
s=re.findall(‘ab{0,3}c’,‘sdavdfdbbabbbcbswe’)
#指定至少多少次
t=re.findall(‘ab{3,}c’,‘sdavdfdbbabbbcbbbfvswe’)

#邊界限定
#以指定內容開頭
t=re.findall(’^hello’,‘hellosdafdbhellobfvswe’)
#以指定內容結尾
u=re.findall(‘hello$’,‘hellosdafdbhellobhello’)

#優先級控制
#指定內容有多個時，’|'它擁有最低優先級出現hello或world或兩個都有都可以
w=re.findall(‘hello|world’,‘hellosdafdbhworldellllo’)
#a和c之間出現hello或world，只能有兩個中的一個
y=re.findall(‘a(hello|world)c’,‘hellosdafdbhworldellllo’)
print(w)

#分組匹配

#固定匹配:\1代表前邊第一個括號內的內容
j=re.compile(r’<([a-z]+)>\w+</\1>’)
s=j.search(’<a>百度</a>’)

j=re.compile(r’<([a-z]+)><([a-z]+)>\w+</\2></\1>’)
s=j.search(’<div><a>百度</a></div>’)

#給分組（）起名字
j=re.compile(r’<(?P[a-z]+)><([a-z]+)>\w+</\2></(?P=hello)>’)
s=j.search(’<div><a>百度</a></div>’)

#貪婪匹配
n=re.compile(r’a.b’)
m=n.search(‘cdfadfdacvbdvd’) #儘可能多的匹配
print(m)
#非貪婪
#’?'取消任意次數貪婪匹配
n=re.compile(r’a.？b’)
m=n.search(‘cdfadfdacvbdvd’) #儘可能少的匹配
print(m)

#’?‘取消至少一次的貪婪匹配
n=re.compile(r’a.+？b’)
m=n.search(‘cdfadfdacvbdvd’) #儘可能少的匹配
print(m)

#忽略大小寫re.I
n=re.compile(r’hello’,‘hello world’,re.I)
#正則是默認單行匹配，使用re.M可以進行多行匹配
n=re.compile(r’hello’,‘hello\nworld’,re.M)

#使’.‘匹配任意字符，re.s作爲單行處理，忽略\n
string=’<div>hello</div>’

n=re.compile(r’’)

正則表達式常用方法

Kafka存儲機制

aws語音呼叫調用，告警電話

【轉】[C#] WebAPI 防止併發調用二（冥等性）

HTTP URL 詳解

項目上線阿里雲並配置ssl證書實現https訪問

無參裝飾器函數和帶參裝飾器函數

xpath取出某個標籤下多個標籤的所有文本信息幾種方法

Flask，Django項目收發郵件及python的email和smtplib模塊收發郵件

Github常用命令，管理Git倉庫文檔分享

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結