Using Regular Expressions in Python

The match function

import re
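# Match against a string
text = 'hello'
# match takes two arguments (the pattern, the string to match) and matches from the start of the string; if the pattern does not match there, it returns None
res = re.match('he', text)
# group() returns the matched text
print(res.group())
Output: he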
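
Because match only tries the pattern at the start of the string, a pattern that occurs later is not found. A minimal sketch contrasting it with re.search, using a made-up sample string:

```python
import re

text = 'say hello'  # illustrative sample string, not from the original

# match only looks at the start of the string, so 'he' is not found here
at_start = re.match('he', text)

# search scans the whole string and returns the first occurrence
anywhere = re.search('he', text)

print(at_start)          # None
print(anywhere.group())  # he
```

Checking for None before calling group() avoids an AttributeError when nothing matched.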

.: matches any single character except the newline \n; to match newlines as well, pass the re.DOTALL flag

import re
text = 'hello'
res = re.match('.', text)
print(res.group())
Output: h
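
The effect of re.DOTALL can be seen with a string that begins with a newline. This is a small sketch; the sample string is chosen only for illustration:

```python
import re

text = '\nabc'  # illustrative string that starts with a newline

# Without a flag, '.' refuses to match '\n'
no_flag = re.match('.', text)

# With re.DOTALL, '.' matches any character, including '\n'
with_flag = re.match('.', text, re.DOTALL)

print(no_flag)                  # None
print(repr(with_flag.group()))  # '\n'
```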

\d: matches any digit 0-9

import re
text = '1'
res = re.match(r'\d', text)
print(res.group())
Output: 1

\D: matches any non-digit character

import re
text = '+'
res = re.match(r'\D', text)
print(res.group())
Output: +

\s: matches whitespace characters, including \n, \t, \r and the space

import re
text = ' '
res = re.match(r'\s', text)
print(res.group())
Output: (a single space)

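\w: matches a-z, A-Z, digits and the underscore (for str patterns, Python 3 also matches other Unicode word characters unless re.ASCII is set)

import re
text = '_'
res = re.match(r'\w', text)
print(res.group())
Output: _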

\W: the exact opposite of lowercase \w; it matches every character that \w does not

import re
text = '+'
res = re.match(r'\W', text)
print(res.group())
Output: +

[]: a character class; any one of the characters listed inside the brackets is matched

import re
text = '0376-888888888adads'
res = re.match(r'[\d\-]+', text)
print(res.group())
Output: 0376-888888888

The rules described above can also be written with brackets:

\d : [0-9]
\D : [^0-9]
\w : [0-9a-zA-Z_]
\W : [^0-9a-zA-Z_]
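
These bracket equivalents hold exactly for ASCII-only matching. For str patterns in Python 3, \w and \d also match non-ASCII word characters and digits unless re.ASCII is passed. A small sketch, with a sample character chosen for illustration:

```python
import re

sample = 'é'  # non-ASCII letter, chosen only for illustration

# Default (Unicode) matching: 'é' counts as a word character
unicode_w = re.match(r'\w', sample)

# With re.ASCII, \w is restricted to [a-zA-Z0-9_]
ascii_w = re.match(r'\w', sample, re.ASCII)

# The explicit bracket form never matches 'é'
bracket = re.match(r'[0-9a-zA-Z_]', sample)

print(unicode_w is not None)  # True
print(ascii_w)                # None
print(bracket)                # None
```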

Matching multiple characters

*: matches the preceding element zero or more times

import re
text = '0376'
res = re.match(r'\d*', text)
print(res.group())
Output: 0376

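+: matches the preceding element one or more times

import re
text = 'ab+cd'
res = re.match(r'\w+', text)
print(res.group())
Output: ab
Note: if nothing matches, re.match returns None, so res.group() raises an AttributeError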
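
When + cannot match even one character, re.match returns None, and calling group() on None raises an AttributeError. A minimal sketch of guarding against that; the sample string and the result variable are illustrative:

```python
import re

text = '+abcd'  # starts with a character that \w does not match

res = re.match(r'\w+', text)

# Check the match object before using it
if res is None:
    result = 'no match'
else:
    result = res.group()

print(result)  # no match
```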

?: matches the preceding element zero or one time

import re
text = 'ab+cd'
res = re.match(r'\w?', text)
print(res.group())
Output: a

{m}: matches the preceding element exactly m times

import re
text = 'abcd'
res = re.match(r'\w{2}', text)
print(res.group())
Output: ab

{m,n}: matches the preceding element between m and n times

import re
text = 'abcd'
res = re.match(r'\w{1,3}', text)
print(res.group())
Output: abc
The match takes as many characters as possible, i.e. greedy mode

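Examples

Validate a mobile phone number: the first digit must be 1, the second must be one of 3, 4, 5, 7, 8, and the remaining 9 digits can be any digits

import re

text = '15517672121'
res = re.match(r'1[34578]\d{9}', text)
print(res.group())
Output: 15517672121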
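
Note that match only anchors the start of the string, so a number with extra trailing digits would still pass the check above. For validation, one option is re.fullmatch, which requires the whole string to fit the pattern. A sketch under that assumption; the helper name is_valid_phone is made up for illustration:

```python
import re

# Same rule as above: 1, then one of 3/4/5/7/8, then nine digits
PHONE = re.compile(r'1[34578]\d{9}')

def is_valid_phone(number):
    """Return True only if the entire string matches the phone rule."""
    return PHONE.fullmatch(number) is not None

print(is_valid_phone('15517672121'))      # True
print(is_valid_phone('155176721219999'))  # False: extra trailing digits

# match alone accepts the over-long string because it ignores the tail
print(PHONE.match('155176721219999') is not None)  # True
```

The same idea applies to the other validation examples in this section.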

Match an email address: the local part consists of digits, letters and underscores, followed by the @ sign and then the domain name

import re

text = '155176@qq.com'
res = re.match(r'\w+@[a-zA-Z0-9]+\.[a-z]+', text)
print(res.group())
Output: 155176@qq.com

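Validate a URL: it starts with http, https or ftp, followed by a colon and two slashes, and then any non-whitespace characters.

import re

url = 'https://www.baidu.com/'
# [^\s] means "any non-whitespace character" ([^s] would only exclude the letter s)
res = re.match(r'(http|https|ftp)://[^\s]+', url)
print(res.group())
Output: https://www.baidu.com/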
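
"Any non-whitespace character" is written \S (or [^\s]); the class [^s] instead means "any character except the letter s". The difference shows up as soon as the URL contains an s after the scheme. A small sketch with an illustrative input string:

```python
import re

url = 'https://www.baidu.com/s?wd=regex more text'  # illustrative input

# \S+ consumes everything up to the first whitespace character
non_space = re.match(r'(http|https|ftp)://\S+', url)

# [^s]+ stops at the first letter 's' after the scheme
not_letter_s = re.match(r'(http|https|ftp)://[^s]+', url)

print(non_space.group())     # https://www.baidu.com/s?wd=regex
print(not_letter_s.group())  # https://www.baidu.com/
```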

Validate an ID card number: the first 17 characters are digits, and the 18th can be a digit, x or X.

import re

id_number = '53012119760427732X'
res = re.match(r'\d{17}[\dxX]', id_number)
print(res.group())
Output: 53012119760427732X

Some commonly used symbols

^ (caret): matches at the start of the string

import re

text = 'hello'
res = re.match('^h', text)
print(res.group())
Output: h

Inside square brackets, ^ negates the character class.

$: matches at the end of the string

import re
text = 'hello@qq.com'
res = re.match(r'\w+@qq\.com$', text)
print(res.group())
Output: hello@qq.com

|: matches any one of several expressions or strings

import re
text = 'ftp'
res = re.match('http|https|ftp', text)
print(res.group())
Output: ftp
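
Alternatives are tried from left to right and the first one that succeeds wins, so the order matters when one alternative is a prefix of another. A short sketch using the same pattern on a different input:

```python
import re

text = 'https'

# 'http' is tried first and succeeds, so 'https' is never attempted
first_wins = re.match('http|https|ftp', text)

# Listing the longer alternative first matches the whole word
longer_first = re.match('https|http|ftp', text)

# An optional 's' expresses the same thing more compactly
compact = re.match('https?|ftp', text)

print(first_wins.group())    # http
print(longer_first.group())  # https
print(compact.group())       # https
```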

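Greedy and non-greedy mode

Greedy mode matches as many characters as possible
import re
text = '2121212'
res = re.match(r'\d+', text)
print(res.group())
Output: 2121212
Appending ? to a quantifier turns on non-greedy mode, which matches only the shortest string that satisfies the pattern
import re
text = '2121212'
res = re.match(r'\d+?', text)
print(res.group())
Output: 2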
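
The difference matters most when the pattern has something after the quantifier. A common illustration is pulling the first tag out of a piece of markup; the sample string here is made up:

```python
import re

text = '<h1>title</h1>'  # illustrative markup

# Greedy: .+ runs to the last '>' in the string
greedy = re.match(r'<.+>', text)

# Non-greedy: .+? stops at the first '>' that lets the pattern succeed
lazy = re.match(r'<.+?>', text)

print(greedy.group())  # <h1>title</h1>
print(lazy.group())    # <h1>
```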

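Escape characters and raw strings

In a regular expression, some characters have a special meaning. To match those characters literally, escape them with a backslash.

import re

text = 'macbookpro price is $3000'
res = re.search(r'\$\d+', text)
print(res.group())
Output: $3000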

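In both Python and regular expressions, the backslash \ is used for escaping; with a raw string, Python no longer processes the escapes itself

import re

# raw string: a literal backslash followed by c
text = r'\c'
res = re.search(r'\\c', text)
print(res.group())
Output: \c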
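
To see what the raw-string prefix saves, compare the same pattern written both ways. This is a small sketch; the path string is an illustrative value:

```python
import re

# A normal string needs every backslash doubled; a raw string does not
escaped = '\\d+'
raw = r'\d+'
print(escaped == raw)  # True
print(len(raw))        # 3

# Matching one literal backslash takes two in the regex,
# which becomes four in a normal Python string
path = r'C:\temp'  # illustrative value
m_raw = re.search(r'\\temp', path)
m_plain = re.search('\\\\temp', path)

print(m_raw.group())                     # \temp
print(m_raw.group() == m_plain.group())  # True
```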

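Common regular expression functions

group()

In a regular expression, parts of the matched string can be captured as groups, which are written with parentheses.
group: equivalent to group(0); returns the whole matched string
groups: returns a tuple of all subgroups, which are numbered from 1
group(1): returns the first subgroup; several indices can be passed at once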

import re
text = "apple's price $100,orange's price is $30"
res = re.search(r'.*(\$\d+).*(\$\d+)', text)
print(res.group(0))
print(res.group(1))
print(res.group(2))
print(res.group(1,2))
print(res.groups())
Output:
apple's price $100,orange's price is $30
$100
$30
('$100', '$30')
('$100', '$30')
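
Groups can also be given names with the (?P<name>...) syntax, which can be easier to read than numeric indices. A sketch on the same text; the group names first and second are chosen for illustration:

```python
import re

text = "apple's price $100,orange's price is $30"

# Named groups are still numbered, but can also be looked up by name
res = re.search(r'(?P<first>\$\d+).*(?P<second>\$\d+)', text)

print(res.group('first'))   # $100
print(res.group('second'))  # $30
print(res.groupdict())      # {'first': '$100', 'second': '$30'}
```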

findall

Finds all non-overlapping matches and returns them as a list

import re

text = "apple's price $100,orange's price is $30"
res = re.findall(r'\$\d+', text)
print(res)
Output: ['$100', '$30']
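
When the pattern contains groups, findall returns the group contents rather than the whole match: a list of strings for one group, a list of tuples for several. A sketch on the same text; the second pattern is an illustrative way to pair each item with its price:

```python
import re

text = "apple's price $100,orange's price is $30"

# One group: only the captured digits are returned
amounts = re.findall(r'\$(\d+)', text)

# Two groups: each result is a (name, amount) tuple
pairs = re.findall(r"(\w+)'s price (?:is )?\$(\d+)", text)

print(amounts)  # ['100', '30']
print(pairs)    # [('apple', '100'), ('orange', '30')]
```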

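sub replaces strings: each matched substring is replaced with another string

import re
text = "apple's price $100,orange's price is $30"
# The first argument is the pattern, the second is the replacement string, the third is the source string
res = re.sub(r'\$\d+', '$200', text)
print(res)
Output: apple's price $200,orange's price is $200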
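
The replacement can also be a function that receives each match object, and the count argument limits how many replacements are made. A sketch on the same text; the helper name double is made up for illustration:

```python
import re

text = "apple's price $100,orange's price is $30"

def double(match):
    """Build the replacement from the captured digits."""
    return '$' + str(int(match.group(1)) * 2)

# A function as the replacement is called once per match
doubled = re.sub(r'\$(\d+)', double, text)

# count=1 replaces only the first match
first_only = re.sub(r'\$\d+', '$200', text, count=1)

print(doubled)     # apple's price $200,orange's price is $60
print(first_only)  # apple's price $200,orange's price is $30
```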

split splits a string and returns a list

import re

text = "hello python"
res = re.split(' ', text)
print(res)
Output: ['hello', 'python']
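
For a single fixed separator, str.split does the same job; re.split becomes useful when several different separators may appear. A small sketch with an illustrative input:

```python
import re

text = 'hello, python;regex  world'  # illustrative input with mixed separators

# Split on any run of commas, semicolons or whitespace
parts = re.split(r'[,;\s]+', text)

print(parts)  # ['hello', 'python', 'regex', 'world']
```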

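compile

A regular expression that is used often can be compiled once with compile and then reused directly, which can make repeated use faster.

import re

text = "the number is 20.50"
r = re.compile(r'\d+\.?\d*')
# the compiled pattern has its own search method
res = r.search(text)
print(res.group())
Output: 20.50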
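
A compiled pattern exposes the same operations as methods (match, search, findall, sub, split), so it can be defined once and applied to many strings. A sketch with illustrative sample lines; since the re module also caches recently used patterns, the speed gain is often modest and the main benefit is readability:

```python
import re

# Compile once, reuse for every line
number = re.compile(r'\d+\.?\d*')

lines = ['the number is 20.50', 'no digits here', 'total: 7']  # illustrative data

found = []
for line in lines:
    m = number.search(line)
    if m is not None:  # skip lines without a number
        found.append(m.group())

print(found)  # ['20.50', '7']
```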
