Regular Expression Syntax Explained (Part 2)

\A and \Z are just like "^" and "$", except that they do not match multiple times when the modifier /m is used, whereas "^" and "$" match at every internal line separator.
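The difference between the two kinds of anchors can be sketched with Python's `re` module, used here only as a stand-in for TRegExpr (an assumption: Python's `re.MULTILINE` plays the role of /m, and Python treats only `\n` as a line separator). The sample text is made up for illustration.

```python
import re

text = "one\ntwo\nthree"

# With the multiline flag, ^ and $ match at every line boundary...
caret_hits = re.findall(r"^\w+", text, re.MULTILINE)
dollar_hits = re.findall(r"\w+$", text, re.MULTILINE)

# ...while \A and \Z stay pinned to the very start and end of the string.
a_hits = re.findall(r"\A\w+", text, re.MULTILINE)
z_hits = re.findall(r"\w+\Z", text, re.MULTILINE)

print(caret_hits)   # ['one', 'two', 'three']
print(dollar_hits)  # ['one', 'two', 'three']
print(a_hits)       # ['one']
print(z_hits)       # ['three']
```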

By default the "." metacharacter matches any character, but if you switch off the modifier /s, "." will not match embedded line separators.

TRegExpr works with line separators as recommended at www.unicode.org (http://www.unicode.org/unicode/reports/tr18/):

"^" matches at the beginning of the input string and, if modifier /m is on, also immediately after any occurrence of \x0D\x0A, \x0A or \x0D (in the Unicode version of TRegExpr, also \x2028, \x2029, \x0B, \x0C or \x85). Note that there is no empty line within the sequence \x0D\x0A.

"$" matches at the end of the input string and, if modifier /m is on, also immediately before any occurrence of \x0D\x0A, \x0A or \x0D (in the Unicode version of TRegExpr, also \x2028, \x2029, \x0B, \x0C or \x85). Note that there is no empty line within the sequence \x0D\x0A.

"." matches any character, but if you switch off modifier /s, "." does not match \x0D\x0A, \x0A or \x0D (in the Unicode version of TRegExpr, also not \x2028, \x2029, \x0B, \x0C or \x85).
 
Note that "^.*$" (an empty line pattern) does not match the empty string within the sequence \x0D\x0A, but does match the empty string within the sequence \x0A\x0D.
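The effect of the /s modifier on "." can be tried out in Python, whose `re.DOTALL` flag is roughly the counterpart of /s. This is only a sketch under that assumption: with the flag off, Python's "." excludes just `\n`, whereas TRegExpr excludes the whole set of separators listed above.

```python
import re

text = "a\nb"

# Flag off: "." refuses to cross the line separator, so there is no match.
without_s = re.fullmatch(r"a.b", text)

# Flag on: "." matches the line separator like any other character.
with_s = re.fullmatch(r"a.b", text, re.DOTALL)

print(without_s)            # None
print(with_s is not None)   # True
```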

Multiline processing can easily be tuned for your own purposes with the TRegExpr properties LineSeparators and LinePairedSeparator. You can use only Unix-style separators (\n), only DOS/Windows-style separators (\r\n), a mix of both (as described above and used by default), or define your own line separators.

Metacharacters - predefined classes

  \w     an alphanumeric character (including "_")
  \W     a non-alphanumeric character
  \d     a numeric character
  \D     a non-numeric character
  \s     any space (same as [ \t\n\r\f])
  \S     a non-space character

 

You may use \w, \d and \s within custom character classes.

Examples:
  foob\dr     matches strings like 'foob1r', 'foob6r' and so on, but not 'foobar', 'foobbr' and so on
  foob[\w\s]r matches strings like 'foobar', 'foob r', 'foobbr', 'foob1r' and so on, but not 'foob=r' and so on (\w includes digits)
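These two patterns can be checked quickly in Python's `re` module (an assumption: the default \d, \w and \s classes behave the same on these ASCII samples as TRegExpr's). The helper `matches` is a hypothetical name introduced for this sketch. Because \w covers digits as well as letters and "_", 'foob1r' is accepted by the second pattern.

```python
import re

def matches(pattern, s):
    """Return True when the whole string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None

digit_samples = ["foob1r", "foob6r", "foobar", "foobbr"]
class_samples = ["foobar", "foob r", "foobbr", "foob1r", "foob=r"]

digit_results = {s: matches(r"foob\dr", s) for s in digit_samples}
class_results = {s: matches(r"foob[\w\s]r", s) for s in class_samples}

print(digit_results)  # only 'foob1r' and 'foob6r' are True
print(class_results)  # everything except 'foob=r' is True
```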

TRegExpr uses the properties SpaceChars and WordChars to define the character classes \w, \W, \s and \S, so you can easily redefine them.

Metacharacters - word boundaries

  \b     matches a word boundary
  \B     matches a non-(word boundary)

A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.
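A small illustration of \b and \B, written with Python's `re` module on the assumption that its word-boundary rule is the same as the one described here. The sample sentence is invented for the example.

```python
import re

text = "foo foobar barfoo foo!"

# \bfoo\b: "foo" must have a non-word character (or a string edge) on both sides.
whole_word_positions = [m.start() for m in re.finditer(r"\bfoo\b", text)]

# \Bfoo: "foo" must NOT start at a word boundary, i.e. it follows a word character.
inner_positions = [m.start() for m in re.finditer(r"\Bfoo", text)]

# The edges of the string count as \W, so a lone word is bounded on both sides.
lone_word = re.fullmatch(r"\bfoo\b", "foo") is not None

print(whole_word_positions)  # [0, 18]
print(inner_positions)       # [14]
print(lone_word)             # True
```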

Metacharacters - iterators

Any item of a regular expression may be followed by another type of metacharacter: an iterator. Using these metacharacters you can specify the number of occurrences of the preceding character, metacharacter or subexpression.

  *      zero or more ("greedy"), similar to {0,}
  +      one or more ("greedy"), similar to {1,}
  ?      zero or one ("greedy"), similar to {0,1}
  {n}    exactly n times ("greedy")
  {n,}   at least n times ("greedy")
  {n,m}  at least n but not more than m times ("greedy")
  *?     zero or more ("non-greedy"), similar to {0,}?
  +?     one or more ("non-greedy"), similar to {1,}?
  ??     zero or one ("non-greedy"), similar to {0,1}?
  {n}?   exactly n times ("non-greedy")
  {n,}?  at least n times ("non-greedy")
  {n,m}? at least n but not more than m times ("non-greedy")

So, digits in curly brackets of the form {n,m} specify the minimum number of times to match the item (n) and the maximum (m). The form {n} is equivalent to {n,n} and matches exactly n times. The form {n,} matches n or more times. There is no limit to the size of n or m, but large numbers will consume more memory and slow down regular expression execution.

If a curly bracket occurs in any other context, it is treated as a regular character.

Examples:
  foob.*r     matches strings like 'foobar', 'foobalkjdflkj9r' and 'foobr'
  foob.+r     matches strings like 'foobar', 'foobalkjdflkj9r' but not 'foobr'
  foob.?r     matches strings like 'foobar', 'foobbr' and 'foobr' but not 'foobalkj9r'
  fooba{2}r   matches the string 'foobaar'
  fooba{2,}r  matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc.
  fooba{2,3}r matches strings like 'foobaar' or 'foobaaar' but not 'foobaaaar'
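The iterator examples can be verified mechanically. The sketch below uses Python's `re.fullmatch` as a stand-in for TRegExpr (an assumption; the iterators shown behave the same way in both for these inputs). The names `cases` and `check` are hypothetical helpers for this illustration.

```python
import re

# Each entry: (pattern, strings that should match, strings that should not).
cases = [
    (r"foob.*r",     ["foobar", "foobalkjdflkj9r", "foobr"], []),
    (r"foob.+r",     ["foobar", "foobalkjdflkj9r"],          ["foobr"]),
    (r"foob.?r",     ["foobar", "foobbr", "foobr"],          ["foobalkj9r"]),
    (r"fooba{2}r",   ["foobaar"],                            ["foobar", "foobaaar"]),
    (r"fooba{2,}r",  ["foobaar", "foobaaar", "foobaaaar"],   ["foobar"]),
    (r"fooba{2,3}r", ["foobaar", "foobaaar"],                ["foobaaaar"]),
]

def check(pattern, s):
    """True when the whole string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None

results = [
    (pattern, [check(pattern, s) for s in good], [check(pattern, s) for s in bad])
    for pattern, good, bad in cases
]

for pattern, good_flags, bad_flags in results:
    print(pattern, good_flags, bad_flags)
```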

A little explanation about "greediness". "Greedy" takes as many as possible, "non-greedy" takes as few as possible. For example, applied to the string 'abbbbc' with the match starting at the first 'b', 'b+' and 'b*' return 'bbbb', 'b+?' returns 'b', 'b*?' returns an empty string, 'b{2,3}?' returns 'bb' and 'b{2,3}' returns 'bbb'.
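The greedy and non-greedy results can be reproduced in Python, assuming its `re` module follows the same greediness rules. The helper `match_at_first_b` is a hypothetical name; it starts every attempt at index 1, the first 'b', because a pattern that can match nothing (such as 'b*') would otherwise succeed with an empty match at index 0, in front of the 'a'.

```python
import re

text = "abbbbc"

def match_at_first_b(pattern):
    """Match the pattern starting exactly at index 1, the first 'b'."""
    return re.compile(pattern).match(text, 1).group()

patterns = [r"b+", r"b*", r"b+?", r"b*?", r"b{2,3}?", r"b{2,3}"]
greediness = {p: match_at_first_b(p) for p in patterns}

# An unanchored search finds the leftmost match, which for b* is empty.
leftmost_star = re.search(r"b*", text).group()

print(greediness)
# {'b+': 'bbbb', 'b*': 'bbbb', 'b+?': 'b', 'b*?': '', 'b{2,3}?': 'bb', 'b{2,3}': 'bbb'}
print(repr(leftmost_star))  # ''
```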

You can switch all iterators into "non-greedy" mode (see the modifier /g).

Metacharacters - alternatives

You can specify a series of alternatives for a pattern using "|" to separate them, so that fee|fie|foe will match any of "fee", "fie" or "foe" in the target string (as would f(e|i|o)e). The first alternative includes everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the last alternative contains everything from the last "|" to the next pattern delimiter. For this reason, it is common practice to enclose alternatives in parentheses, to minimize confusion about where they start and end.

Alternatives are tried from left to right, so the first alternative for which the entire expression matches is the one that is chosen. This means that alternatives are not necessarily greedy. For example, when matching foo|foot against "barefoot", only the "foo" part will match, as that is the first alternative tried, and it successfully matches the target string. (This might not seem important, but it is important when you are capturing matched text using parentheses.)

Also remember that "|" is interpreted as a literal within square brackets, so if you write [fee|fie|foe] you are really only matching [feio|].

Examples:
  foo(bar|foo)  matches the strings 'foobar' or 'foofoo'.
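The left-to-right rule and the behaviour of "|" inside square brackets can be seen in a short Python sketch (an assumption: Python's `re` orders alternatives the same way as TRegExpr; the test strings other than "barefoot" are invented).

```python
import re

# The first alternative that lets the whole expression match wins,
# even though a later alternative would match more text.
first_wins = re.search(r"foo|foot", "barefoot").group()
reordered = re.search(r"foot|foo", "barefoot").group()

# Inside [...] the "|" is an ordinary character: the class is just {f, e, i, o, |}.
class_hits = re.findall(r"[fee|fie|foe]", "fee|x")

# Parentheses make the scope of the alternation explicit.
grouped = [s for s in ["foobar", "foofoo", "foobaz"]
           if re.fullmatch(r"foo(bar|foo)", s)]

print(first_wins)  # foo
print(reordered)   # foot
print(class_hits)  # ['f', 'e', 'e', '|']
print(grouped)     # ['foobar', 'foofoo']
```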


Metacharacters - subexpressions

The bracketing construct ( ... ) may also be used to define regular expression subexpressions (after parsing you can find subexpression positions, lengths and actual values in the MatchPos, MatchLen and Match properties of TRegExpr, and substitute them into template strings with TRegExpr.Substitute).

Subexpressions are numbered based on the left-to-right order of their opening parentheses.
The first subexpression has number '1' (the whole regular expression match has number '0'; you can substitute it in TRegExpr.Substitute as '$0' or '$&').

Examples:
  (foobar){8,10}  matches strings which contain 8, 9 or 10 consecutive instances of 'foobar'
  foob([0-9]|a+)r matches 'foob0r', 'foob1r', 'foobar', 'foobaar', 'foobaaar' etc.
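For comparison, here is how subexpression values, positions, lengths and template substitution look in Python's `re` module. This is only an analogy, not TRegExpr code: `group`, `start` and `end` stand in for Match, MatchPos and MatchLen, and Python templates write `\1` and `\g<0>` where TRegExpr.Substitute uses '$1' and '$0'. The input strings are made up.

```python
import re

m = re.search(r"foob([0-9]|a+)r", "xx foobaar yy")

whole = m.group(0)                  # entire match (number 0)
inner = m.group(1)                  # first subexpression
inner_pos = m.start(1)              # where subexpression 1 begins
inner_len = m.end(1) - m.start(1)   # how long it is

# Numbering follows the order of the opening parentheses, outermost first.
numbering = re.fullmatch(r"((a)(b))(c)", "abc").groups()

# Template substitution: \1 and \2 are subexpressions, \g<0> is the whole match.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1 [\g<0>]", "user@example")

print(whole, inner, inner_pos, inner_len)  # foobaar aa 7 2
print(numbering)                           # ('ab', 'a', 'b', 'c')
print(swapped)                             # example at user [user@example]
```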
   

Metacharacters - backreferences

Metacharacters \1 through \9 are interpreted as backreferences. \<n> matches the previously matched subexpression #<n>.

Examples:
  (.)\1+         matches 'aaaa' and 'cc'
  (.+)\1+        also matches 'abab' and '123123'
  (['"]?)(\d+)\1 matches "13" (in double quotes), '4' (in single quotes), 77 (without quotes) etc.
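The backreference examples can be exercised in Python, on the assumption that \1 behaves the same way in its `re` module. The helper `full` and the non-matching samples ('ab', 'abc', and the string with mismatched quotes) are additions made for this sketch.

```python
import re

def full(pattern, s):
    """True when the whole string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None

# (.)\1+ : one character followed by one or more copies of itself.
repeated_char = [full(r"(.)\1+", s) for s in ["aaaa", "cc", "ab"]]

# (.+)\1+ : a run of characters followed by one or more copies of the run.
repeated_run = [full(r"(.+)\1+", s) for s in ["abab", "123123", "abc"]]

# (['"]?)(\d+)\1 : the closing quote must be the same as the opening one.
quoted = r"""(['"]?)(\d+)\1"""
numbers = [re.fullmatch(quoted, s).group(2) for s in ['"13"', "'4'", "77"]]
mismatched = full(quoted, "\"13'")

print(repeated_char)  # [True, True, False]
print(repeated_run)   # [True, True, False]
print(numbers)        # ['13', '4', '77']
print(mismatched)     # False
```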

 
