Regular Expression Syntax Explained (Part 1)

Syntax of Regular Expressions (1)


Important note
Below is a description of the regular expressions implemented in the freeware library TRegExpr. Please note that the library is widely used in many free and commercial software products. The author of the TRegExpr library cannot answer direct questions from those products' users. Please send your questions to the product's support first.
 
Introduction
Regular expressions are a widely used method of specifying patterns of text to search for. Special metacharacters allow you to specify, for instance, that a particular string you are looking for occurs at the beginning or end of a line, or contains n recurrences of a certain character.
 
Regular expressions look ugly to novices, but they are really a very simple (well, usually simple ;) ), handy and powerful tool.
 
I recommend that you play with regular expressions using RegExp Studio; it will help you understand the main concepts. Moreover, the repository of the R.e. visual debugger includes many predefined examples with comments.
 
Let's start our learning trip!
 
Simple matches
 
Any single character matches itself, unless it is a metacharacter with a special meaning described below.
 
A series of characters matches that series of characters in the target string, so the pattern "bluh" would match "bluh" in the target string. Quite simple, eh?
 
You can make characters that normally function as metacharacters or escape sequences be interpreted literally by 'escaping' them, that is, by preceding them with a backslash "\". For instance, the metacharacter "^" matches the beginning of a string, but "\^" matches the character "^", "\\" matches "\", and so on.
 
Examples:
 
 foobar          matches string 'foobar'
 \^FooBarPtr     matches '^FooBarPtr'
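
The escaping rule can be tried out with a short sketch. It uses Python's built-in `re` module as a stand-in (an assumption: TRegExpr is a Delphi library, and although this part of the syntax is shared, other details may differ). The variable names and sample strings are purely illustrative.

```python
import re

# Plain characters match themselves anywhere in the target string.
plain = re.search(r"foobar", "xx foobar yy")

# An escaped \^ is a literal caret; an unescaped ^ is an anchor.
literal_caret = re.search(r"\^FooBarPtr", "see ^FooBarPtr here")
anchor_only = re.search(r"^FooBarPtr", "see ^FooBarPtr here")

# A doubled backslash in the pattern matches one literal backslash.
backslash = re.search(r"\\", "C:\\temp")

print(plain.group())          # foobar
print(literal_caret.group())  # ^FooBarPtr
print(anchor_only)            # None: the text does not start with FooBarPtr
print(backslash.group())      # a single backslash character
```

The patterns are written as raw strings (`r"..."`) so that Python itself does not consume the backslashes before the regex engine sees them.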
 
Note for C++ Builder users
Please read the FAQ answer to the question "Why many r.e. work wrong in Borland C++ Builder?"
 
Escape sequences
 
Characters may be specified using an escape sequence syntax much like that used in C and Perl: "\n" matches a newline, "\t" a tab, etc. More generally, \xnn, where nn is a string of hexadecimal digits, matches the character whose ASCII value is nn. If you need a wide (Unicode) character code, you can use '\x{nnnn}', where 'nnnn' is one or more hexadecimal digits.
 
 
 \xnn     char with hex code nn
 \x{nnnn} char with hex code nnnn (one byte for plain text and two bytes for Unicode)
 \t       tab (HT/TAB), same as \x09
 \n       newline (NL), same as \x0a
 \r       carriage return (CR), same as \x0d
 \f       form feed (FF), same as \x0c
 \a       alarm (bell) (BEL), same as \x07
 \e       escape (ESC), same as \x1b
 
 
Examples:
 
 foo\x20bar   matches 'foo bar' (note the space in the middle)
 \tfoobar     matches 'foobar' preceded by a tab
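
A minimal sketch of these escapes, again assuming Python's `re` module as a stand-in for TRegExpr. Only the escapes that Python also accepts are used; `\e` and `\x{nnnn}` are left out because Python's `re` does not support them.

```python
import re

# \x20 is the space character given by its hex code.
space_hex = re.search(r"foo\x20bar", "a foo bar b")

# \t in the pattern matches a tab in the target string.
tab_prefixed = re.search(r"\tfoobar", "x\tfoobar")

# Each named escape is equivalent to its hex form from the table.
pairs = [(r"\t", "\x09"), (r"\n", "\x0a"), (r"\r", "\x0d"),
         (r"\f", "\x0c"), (r"\a", "\x07")]
all_equivalent = all(re.fullmatch(pat, ch) for pat, ch in pairs)

print(space_hex.group())           # foo bar
print(repr(tab_prefixed.group()))  # '\tfoobar'
print(all_equivalent)              # True
```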
 
Character classes
You can specify a character class by enclosing a list of characters in [], which will match any one character from the list.
 
If the first character after the "[" is "^", the class matches any character not in the list.
 
Examples:
 foob[aeiou]r   finds strings 'foobar', 'foober' etc. but not 'foobbr', 'foobcr' etc.
 foob[^aeiou]r  finds strings 'foobbr', 'foobcr' etc. but not 'foobar', 'foober' etc.
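
The two classes can be compared side by side in a small sketch, assuming Python's `re` module behaves like TRegExpr for this syntax. The word list is made up for illustration.

```python
import re

words = ["foobar", "foober", "foobbr", "foobcr"]

# [aeiou] accepts exactly one vowel in that position.
vowel = [w for w in words if re.fullmatch(r"foob[aeiou]r", w)]

# [^aeiou] accepts exactly one character that is NOT a listed vowel.
non_vowel = [w for w in words if re.fullmatch(r"foob[^aeiou]r", w)]

print(vowel)      # ['foobar', 'foober']
print(non_vowel)  # ['foobbr', 'foobcr']
```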
 
 
Within a list, the "-" character is used to specify a range, so that a-z represents all characters between "a" and "z", inclusive.
 
If you want "-" itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. If you want "]", you may place it at the start of the list or escape it with a backslash.
 
 
Examples:
 [-az]      matches 'a', 'z' and '-'
 [az-]      matches 'a', 'z' and '-'
 [a\-z]     matches 'a', 'z' and '-'
 [a-z]      matches all twenty-six lowercase letters from 'a' to 'z'
 [\n-\x0D]  matches any of #10, #11, #12, #13
 [\d-t]     matches any digit, '-' or 't'
 []-a]      matches any char from ']'..'a'
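
The placement rules for "-" and "]" can be checked with a short sketch, assuming Python's `re` module as a stand-in for TRegExpr. The helper `members` and the candidate list are hypothetical names introduced for illustration. The `[\d-t]` case is omitted because Python's `re` rejects it as a bad character range, which is one place where the two engines differ.

```python
import re

def members(pattern, candidates):
    """Return the candidates that the single-character class accepts."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

candidates = ["a", "m", "z", "-", "]", "^", "\n", "\x0b", "\r"]

dash_first = members(r"[-az]", candidates)      # '-' at the start is literal
dash_last = members(r"[az-]", candidates)       # '-' at the end is literal
dash_escaped = members(r"[a\-z]", candidates)   # escaped '-' is literal
lower_range = members(r"[a-z]", candidates)     # a real range
control_range = members(r"[\n-\x0D]", candidates)  # codes 10 through 13
bracket_range = members(r"[]-a]", candidates)   # ']' first, then a range up to 'a'

print(dash_first, dash_last, dash_escaped)
print(lower_range)
print(control_range)
print(bracket_range)
```

The last class works because "]" (code 93) through "a" (code 97) is a valid ascending range, so it also includes "^", "_" and the backtick.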
 
 
Metacharacters
 
Metacharacters are special characters which are the essence of regular expressions. There are different types of metacharacters, described below.
 
Metacharacters - line separators
 
 ^      start of line
 $      end of line
 \A     start of text
 \Z     end of text
 .      any character in line
 
Examples:
 ^foobar     matches string 'foobar' only if it's at the beginning of the line
 foobar$     matches string 'foobar' only if it's at the end of the line
 ^foobar$    matches string 'foobar' only if it's the only string in the line
 foob.r      matches strings like 'foobar', 'foobbr', 'foob1r' and so on
 
By default, the "^" metacharacter is only guaranteed to match at the beginning of the input string/text, and the "$" metacharacter only at the end. Embedded line separators will not be matched by "^" or "$".
 
You may, however, wish to treat a string as a multi-line buffer, such that "^" will match after any line separator within the string, and "$" will match before any line separator. You can do this by switching on the modifier /m.
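
The difference between the default and multi-line behaviour can be sketched as follows. This assumes Python's `re` module, where the `re.MULTILINE` flag plays the role of the /m modifier. The sample text is made up, and TRegExpr may differ in details such as which characters count as line separators.

```python
import re

text = "foobar one\ntwo foobar\nfoobar"

# Default mode: ^ and $ only match at the start and end of the whole text.
starts_default = [m.start() for m in re.finditer(r"^foobar", text)]
ends_default = [m.start() for m in re.finditer(r"foobar$", text)]

# Multi-line mode: ^ also matches after, and $ before, each line separator.
starts_multiline = [m.start() for m in re.finditer(r"^foobar", text, re.MULTILINE)]
ends_multiline = [m.start() for m in re.finditer(r"foobar$", text, re.MULTILINE)]

# The dot matches any single character within a line.
dot_hits = re.findall(r"foob.r", "foobar foobbr foob1r")

print(starts_default, ends_default)      # [0] [22]
print(starts_multiline, ends_multiline)  # [0, 22] [15, 22]
print(dot_hits)                          # ['foobar', 'foobbr', 'foob1r']
```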