Before getting into the main text, I'd like to recommend a very handy online tool for testing regular expressions: https://regex101.com/
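This post briefly covers regular expressions in shell programming. A regular expression is a pattern for matching strings. It can be used to check whether a string contains a certain substring, to replace the matched substring, or to extract the substrings that satisfy some condition. Regular expression syntax is largely the same across programming languages and tools, but the details differ slightly.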
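Before going into detail, it is important to be clear about how regular expressions differ from wildcards:
- Regular expressions are used to match strings inside files, and the match is a containment match: a line is selected whenever it contains a string that matches the expression. The more specific the expression, the narrower the set of lines selected. Commands such as grep, awk, and sed support regular expressions.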
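- Wildcards are generally used to match file names, and the match is a full match: the whole name has to fit the pattern. Commands such as ls and cp, and find with its -name test, work with wildcard patterns rather than regular expressions, so only shell-style wildcard matching is available there.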
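- As a rule of thumb, regular expressions match strings, that is, file contents, while wildcards match file names. The same symbol can mean different things in the two.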
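To make the contrast concrete, here is a minimal sketch in which the same pattern a* is read once as a wildcard and once as a regular expression. The scratch directory and the file names ab, aab, and b are hypothetical, made up for this illustration, and the snippet assumes mktemp is available.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, used only for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/ab" "$demo_dir/aab" "$demo_dir/b"

# Wildcard: the shell expands a* to the names that start with "a";
# the whole name has to fit the pattern.
glob_matches=$(cd "$demo_dir" && echo a*)

# Regular expression: a* means "zero or more a", and grep selects any line
# that contains a match, so even the name "b" is selected.
regex_matches=$(ls "$demo_dir" | grep "a*")

# An anchored regex gives the same selection as the wildcard a*.
anchored_matches=$(ls "$demo_dir" | grep "^a")

echo "wildcard a*: $glob_matches"
echo "regex a*:" $regex_matches
echo "regex ^a:" $anchored_matches

rm -r "$demo_dir"
```

The wildcard and the anchored regex both pick aab and ab, while the bare regex a* picks all three names.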
With that difference in mind, we can move on to regular expressions themselves. Some of the basic rules (not all of them) are listed below.
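| Metacharacter | Meaning |
| --- | --- |
| * | Matches the preceding character zero or more times |
| . | Matches any single character except a newline |
| ^ | Matches the start of a line. For example, ^hello matches lines that begin with hello |
| $ | Matches the end of a line. For example, hello$ matches lines that end with hello |
| [] | Matches any one of the characters listed inside the brackets, and only one character. For example, [aoeiu] matches any one vowel, [0-9] matches any one digit, and [a-z][0-9] matches a two-character string made of a lowercase letter followed by a digit |
| [^] | Matches any one character not listed inside the brackets. For example, [^0-9] matches any one non-digit character, and [^a-z] matches any one character that is not a lowercase letter |
| \ | Escape character. Removes the special meaning of the symbol that follows |
| \{n\} | The preceding character appears exactly n times. For example, [0-9]\{4\} matches four digits |
| \{n,\} | The preceding character appears at least n times. For example, [0-9]\{2,\} matches two or more digits |
| \{n,m\} | The preceding character appears at least n and at most m times. For example, [a-z]\{6,8\} matches 6 to 8 lowercase letters |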
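Some of these rules can be tried without any file by piping sample text into grep. This is only a sketch: the sample strings are made up, and LC_ALL=C is set as an assumption so that [a-z] means exactly the ASCII lowercase letters regardless of the system locale.

```shell
#!/bin/sh
# Made-up sample lines, one per line.
samples=$(printf '%s\n' "a1" "A1" "3.14" "3x14")

# [a-z][0-9]: a lowercase letter followed directly by a digit.
# "3x14" is selected too, because it contains "x1".
letter_digit=$(printf '%s\n' "$samples" | LC_ALL=C grep "[a-z][0-9]")

# An unescaped . stands for any character, so 3.14 also matches "3x14".
any_char=$(printf '%s\n' "$samples" | LC_ALL=C grep "3.14")

# \. removes the special meaning, so only a literal dot matches.
literal_dot=$(printf '%s\n' "$samples" | LC_ALL=C grep "3\.14")

echo "letter then digit:" $letter_digit
echo "unescaped dot:" $any_char
echo "escaped dot:" $literal_dot
```

The difference between the last two commands is the reason the escape character matters whenever a literal dot is meant.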
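To illustrate the matching rules more concretely, I wrote a test file, regex_test.txt, with the following contents (judging from the grep -n "^$" output further down, the file also has a blank line after every line except the last; those blank lines are not shown here):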
This is a txt file about regex rule.
For test regex rule,i try to write this file.
Maybe there are some wrong words because of testing.
said
sold
saaaid
555nice
nic55e
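To follow along, the test file can be recreated with a here-document. This is a sketch under two assumptions: the file is written to a scratch directory created with mktemp rather than the tmp directory seen in the prompts below, and every line except the last is followed by a blank line, which is inferred from the grep -n "^$" output later in the post rather than from the listing itself.

```shell
#!/bin/sh
# Scratch location for the file (assumes mktemp is available).
work_dir=$(mktemp -d)
test_file="$work_dir/regex_test.txt"

# Quoted 'EOF' keeps the shell from expanding anything inside the text.
# Assumption: the blank lines are inferred, not part of the listing.
cat > "$test_file" <<'EOF'
This is a txt file about regex rule.

For test regex rule,i try to write this file.

Maybe there are some wrong words because of testing.

said

sold

saaaid

555nice

nic55e
EOF

# Sanity check: count the non-empty and the empty lines.
text_lines=$(grep -c "." "$test_file")
blank_lines=$(grep -c "^$" "$test_file")
echo "text lines: $text_lines, blank lines: $blank_lines"
echo "file written to: $test_file"
```

After running it, cd into the printed directory to try the grep commands from the rest of the post.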
Using the file above, the regular expression rules are tested as follows.
- "*"
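  - grep "a*" regex_test.txt
    - Matches every line, including blank lines, because a* is also satisfied by zero occurrences of a
  - grep "aa*" regex_test.txt
    - Matches lines that contain at least one a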
[root@localhost tmp]# grep "a*" regex_test.txt
This is a txt file about regex rule.
For test regex rule,i try to write this file.
Maybe there are some wrong words because of testing.
said
sold
saaaid
555nice
nic55e
[root@localhost tmp]# grep "aa*" regex_test.txt
This is a txt file about regex rule.
Maybe there are some wrong words because of testing.
said
saaaid
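The reason a* selects every line is that it is also satisfied by the empty string. The sketch below shows this on three words from the test file. It assumes a grep that supports -o (print only the matched text), which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
words=$(printf '%s\n' "said" "sold" "saaaid")

# a* is satisfied by the empty string, so all three lines are counted.
count_star=$(printf '%s\n' "$words" | grep -c "a*")

# aa* needs one literal a before the optional repeats, so "sold" drops out.
count_one_or_more=$(printf '%s\n' "$words" | grep -c "aa*")

# -o shows what aa* actually matched: the longest run of a on each line.
matched_runs=$(printf '%s\n' "$words" | grep -o "aa*")

echo "a*  selects $count_star lines"
echo "aa* selects $count_one_or_more lines"
echo "aa* matched:" $matched_runs
```

In practice this means a pattern made only of starred items is rarely useful as a filter; at least one mandatory character is needed.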
- "."
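  - grep "s..d" regex_test.txt
    - Matches strings with exactly two characters between s and d
  - grep "s.*d" regex_test.txt
    - Matches strings with any number of characters (including none) between s and d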
[root@localhost tmp]# grep "s..d" regex_test.txt
said
sold
[root@localhost tmp]# grep "s.*d" regex_test.txt
Maybe there are some wrong words because of testing.
said
sold
saaaid
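One detail worth seeing is how much text s.*d consumes. grep picks the leftmost match and makes it as long as possible, so on the long line it runs from the first s to the last d. A small sketch, again assuming grep supports the -o extension:

```shell
#!/bin/sh
line="Maybe there are some wrong words because of testing."

# .* is greedy: the match starts at the first s and ends at the last d.
longest=$(printf '%s\n' "$line" | grep -o "s.*d")

# s..d needs exactly two characters between s and d, which this line
# never has; grep -c prints 0 and exits non-zero, hence the || true.
two_between=$(printf '%s\n' "$line" | grep -c "s..d" || true)

echo "s.*d matched: $longest"
echo "s..d matching lines: $two_between"
```

This is why the long sentence appears in the s.*d output above but not in the s..d output.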
- "^" "$" "\"
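  - grep "^M" regex_test.txt
    - Matches lines that start with an uppercase M
  - grep "\.$" regex_test.txt
    - Matches lines that end with a literal .
  - grep -n "^$" regex_test.txt
    - Matches empty lines and prints their line numbers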
[root@localhost tmp]# grep "^M" regex_test.txt
Maybe there are some wrong words because of testing.
[root@localhost tmp]# grep "\.$" regex_test.txt
This is a txt file about regex rule.
For test regex rule,i try to write this file.
Maybe there are some wrong words because of testing.
[root@localhost tmp]# grep -n "^$" regex_test.txt
2:
4:
6:
8:
10:
12:
14:
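A common use of ^$ is counting or dropping empty lines. Note that ^$ only matches lines with no characters at all; a line holding just spaces is not empty. The sample text below is made up for this sketch.

```shell
#!/bin/sh
# Made-up input: a text line, an empty line, a line with one space, a text line.
sample=$(printf '%s\n' "first" "" " " "last.")

# ^$ matches only the truly empty line.
empty_count=$(printf '%s\n' "$sample" | grep -c "^$")

# ^ *$ also accepts lines made only of spaces.
blank_count=$(printf '%s\n' "$sample" | grep -c "^ *$")

# -v inverts the selection, which drops the empty and space-only lines.
kept=$(printf '%s\n' "$sample" | grep -v "^ *$")

echo "empty: $empty_count, empty or spaces only: $blank_count"
echo "kept:" $kept
```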
- "[]"
  - grep "s[ao]id" regex_test.txt
    - Matches s, then either a or o, then id
  - grep "[0-9]" regex_test.txt
    - Matches lines that contain any digit
  - grep "^[0-9]" regex_test.txt
    - Matches lines that start with a digit
[root@localhost tmp]# grep "s[ao]id" regex_test.txt
said
[root@localhost tmp]# grep "[0-9]" regex_test.txt
555nice
nic55e
[root@localhost tmp]# grep "^[0-9]" regex_test.txt
555nice
- "[^]"
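  - grep "^[^a-z]" regex_test.txt
    - Matches lines whose first character is not a lowercase letter (empty lines are not matched, since [^a-z] still needs one character)
  - grep "^[^a-zA-Z]" regex_test.txt
    - Matches lines whose first character is not a letter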
[root@localhost tmp]# grep "^[^a-z]" regex_test.txt
This is a txt file about regex rule.
For test regex rule,i try to write this file.
Maybe there are some wrong words because of testing.
555nice
[root@localhost tmp]# grep "^[^a-zA-Z]" regex_test.txt
555nice
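The ^ character means two different things here: outside brackets it anchors the match to the start of the line, and as the first character inside brackets it negates the set. The sketch below separates the two roles. The sample words are made up, and LC_ALL=C is an assumption to keep [a-z] limited to ASCII lowercase letters.

```shell
#!/bin/sh
samples=$(printf '%s\n' "said" "saId" "Said" "5aid")

# ^[a-z]: the first character is a lowercase letter.
starts_lower=$(printf '%s\n' "$samples" | LC_ALL=C grep "^[a-z]")

# ^[^a-z]: the first character is anything except a lowercase letter.
starts_other=$(printf '%s\n' "$samples" | LC_ALL=C grep "^[^a-z]")

# [^a-z] without the anchor: a non-lowercase character anywhere in the line.
has_other=$(printf '%s\n' "$samples" | LC_ALL=C grep "[^a-z]")

# ^[^a-zA-Z]: the first character is not a letter at all.
starts_nonletter=$(printf '%s\n' "$samples" | LC_ALL=C grep "^[^a-zA-Z]")

echo "^[a-z]:" $starts_lower
echo "^[^a-z]:" $starts_other
echo "[^a-z]:" $has_other
echo "^[^a-zA-Z]:" $starts_nonletter
```

The word saId is the interesting case: it starts with a lowercase letter, so only the unanchored [^a-z] selects it.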
- "\{n\}" "\{n,\}" "\{n,m\}"
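  - grep "[0-9]\{3\}" regex_test.txt
    - Matches strings that contain three consecutive digits
  - grep "[0-9]\{2,\}" regex_test.txt
    - Matches strings that contain at least two consecutive digits
  - grep "sa\{1,3\}id" regex_test.txt
    - Matches strings with at least one and at most three a between s and id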
[root@localhost tmp]# grep "[0-9]\{3\}" regex_test.txt
555nice
[root@localhost tmp]# grep "[0-9]\{2,\}" regex_test.txt
555nice
nic55e
[root@localhost tmp]# grep "sa\{1,3\}id" regex_test.txt
said
saaaid
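Two points about intervals are easy to miss: the count applies only to the item right before the braces, and the match is still a containment match unless anchors are added. The sketch below uses made-up words and numbers; the last command uses grep -E, where extended regular expressions write the braces without backslashes.

```shell
#!/bin/sh
words=$(printf '%s\n' "sid" "said" "saaaid" "saaaaid")
numbers=$(printf '%s\n' "55" "555" "5555")

# One to three a between s and id: "sid" has none, "saaaaid" has four.
one_to_three=$(printf '%s\n' "$words" | grep "sa\{1,3\}id")

# Containment: "5555" contains three consecutive digits, so it is selected too.
contains_three=$(printf '%s\n' "$numbers" | grep "[0-9]\{3\}")

# Anchors turn it into "the whole line is exactly three digits".
exactly_three=$(printf '%s\n' "$numbers" | grep "^[0-9]\{3\}$")

# Same pattern in extended syntax (grep -E): no backslashes before the braces.
exactly_three_ere=$(printf '%s\n' "$numbers" | grep -E "^[0-9]{3}$")

echo "sa\{1,3\}id:" $one_to_three
echo "contains three digits:" $contains_three
echo "exactly three digits:" $exactly_three
echo "exactly three digits (grep -E):" $exactly_three_ere
```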
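This post covers only part of the regular expression rules used in shell programming. Others, such as "?" and "+", will be covered in later posts.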