Java's regular-expression class Matcher provides the following methods:
- int groupCount()
- String group(int group)
- int start(int group)
- int end(int group)
- String group(String name)
- int start(String name)
- int end(String name)
The concept of a group
Let's start with a piece of code to get a feel for what a group is in a regular expression.
demo1
String text = "John writes about this, and John writes about that," + " and John writes about everything. ";
String patternString1 = "(John)";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
System.out.println("groupCount is -->" + matcher.groupCount());
while (matcher.find()) {
    System.out.println("found: " + matcher.group(1));
}
Output:
groupCount is -->1
found: John
found: John
found: John
demo2
String text = "John writes about this, and John writes about that," + " and John writes about everything. ";
String patternString1 = "John";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
System.out.println("groupCount is -->" + matcher.groupCount());
while (matcher.find()) {
    System.out.println("found: " + matcher.group(1));
}
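Output:
groupCount is -->0
Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 1
The only difference between the two examples is the value of patternString1: one regular expression has parentheses and the other does not. So, as a first approximation:
The text matched by a subexpression enclosed in '()' in a regular expression is a group.
Now let's look at one more example.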
demo3
String text = "John writes about this, and John writes about that," + " and John writes about everything. ";
String patternString1 = "(?:John)";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
System.out.println("groupCount is -->" + matcher.groupCount());
while (matcher.find()) {
    System.out.println("found: " + matcher.group(1));
}
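Output:
groupCount is -->0
Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 1
demo3 shows that a subexpression of the form (?:pattern) does not count as a group. It is a non-capturing group: it groups the pattern but does not record what it matched.
The concept of a group can therefore be summarized as follows:
1. The text matched by a subexpression enclosed in '()' in a regular expression is a group.
2. A subexpression of the form (?:pattern) does not count as a group.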
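The two rules above can be checked directly with groupCount(). The following is a minimal sketch; the class name GroupCountDemo and the helper countGroups are made up for illustration and are not part of the original demos. It relies on the fact that groupCount() depends only on the pattern, so it can be called before any match is attempted.

```java
import java.util.regex.Pattern;

class GroupCountDemo {
    // Hypothetical helper: number of capturing groups declared in the regex.
    // groupCount() is a property of the pattern, so no find() is needed.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups("John"));             // 0: no parentheses
        System.out.println(countGroups("(John)"));           // 1: one capturing group
        System.out.println(countGroups("(?:John)"));         // 0: non-capturing only
        System.out.println(countGroups("(?:John) (\\w+)"));  // 1: only (\w+) is counted
    }
}
```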
Group index (group number)
Again, let's start with a demo.
demo4
String text = "John writes about this, and John Doe writes about that,"
+ " and John Wayne writes about everything.";
String patternString1 = "(John) (.+?) ";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
matcher.find(); // find the next match; it may start anywhere in the text
int start = matcher.start(); // index of the first character of the current match
int end = matcher.end(); // offset just past the last character of the current match
System.out.println("found group: group(0) is '" + matcher.group(0) + "'");
System.out.println("found group: group(1) is '" + matcher.group(1) + "',group(2) is '" + matcher.group(2)+"'");
Output:
found group: group(0) is 'John writes '
found group: group(1) is 'John',group(2) is 'writes'
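The output shows that when a regular expression contains several groups, that is, several subexpressions of the form '(pattern)', group numbers start at 1, and group(0) stands for the entire matched string. Note that group(0) here includes the trailing space, because the pattern itself ends with a space.
To make groups and group numbering easier to understand, refer to the figure below.
With the above, the use of group(int group) should be fully clear. To summarize:
- A subexpression of the form (pattern), with the exception of (?:pattern), represents one group
- Group numbers start at 1; 0 stands for the whole string matched by the regular expression, and group(i) is the text matched by the i-th group
- groupCount() returns the number of groups in the regular expression; group 0 is not included in this count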
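As a sketch of how these points fit together, the loop below walks from group 0 up to groupCount() for the first match. The class name GroupIndexDemo, the helper groupsOfFirstMatch and the shortened input text are illustrative assumptions, not taken from the demos above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Hypothetical helper: returns group(0)..group(groupCount()) of the first
    // match, or an empty list when the regex does not match at all.
    static List<String> groupsOfFirstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> groups = new ArrayList<>();
        if (m.find()) {
            // Index 0 is the whole match; 1..groupCount() are the capturing groups.
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> groups = groupsOfFirstMatch("(John) (.+?) ", "John Doe writes about that");
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("group(" + i + ") = '" + groups.get(i) + "'");
        }
    }
}
```

Note that the loop condition is `i <= m.groupCount()`, because groupCount() does not count group 0.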
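Now let's look at the two methods int start(int group) and int end(int group).
First, recall these two methods:
1. int start() returns the index in the target string of the first character of the current match
2. int end() returns the offset just past the last character of the current match, i.e. the index of its last character plus one.
With an int group argument added, they return the start and end offsets of the specified group instead, as the following example shows:
demo5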
String text = "John writes about this, and John Doe writes about that,"
+ " and John Wayne writes about everything.";
String patternString1 = "(John) (.+?) ";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
matcher.find(); // find the next match; it may start anywhere in the text
int start = matcher.start(); // index of the first character of the current match
System.out.println(start); // 0
int end = matcher.end(); // offset just past the last character of the current match
System.out.println(end); // 12
start = matcher.start(1); // start index of group 1, i.e. "John": 0
System.out.println(start); // 0
start = matcher.start(2); // start index of group 2, i.e. "writes": 5
System.out.println(start); // 5
end = matcher.end(1); // offset just past the end of group 1, i.e. "John": 4
System.out.println(end); // 4
end = matcher.end(2); // offset just past the end of group 2, i.e. "writes": 11
System.out.println(end); // 11
start = matcher.start(3); // throws java.lang.IndexOutOfBoundsException: No group 3
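Note the last statement: when the group index is larger than the number of groups that actually exist in the regular expression, that is, larger than the value returned by groupCount(), an exception is thrown, so remember to handle this case.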
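One possible way to handle this is to compare the index against groupCount() before calling group(int). The sketch below is only one option; the class SafeGroupDemo and the helper firstMatchGroup are hypothetical names, and returning null is a design choice made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafeGroupDemo {
    // Hypothetical helper: text of the given group in the first match.
    // Returns null when there is no match or the index is out of range.
    // (group(int) itself can also return null, when a valid group did not
    // take part in the match, so null here does not identify the cause.)
    static String firstMatchGroup(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        if (group < 0 || group > m.groupCount()) {
            return null; // would otherwise throw IndexOutOfBoundsException
        }
        return m.group(group);
    }

    public static void main(String[] args) {
        String text = "John writes about this";
        System.out.println(firstMatchGroup("(John) (.+?) ", text, 2)); // writes
        System.out.println(firstMatchGroup("(John) (.+?) ", text, 3)); // null
    }
}
```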
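In summary:
- int start(int group) returns the index in the target string of the first character matched by the given group
- int end(int group) returns the offset just past the last character matched by the given group, i.e. the index of its last character plus one.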
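Because end(int group) is exclusive, the pair of offsets can be passed straight to String.substring to recover the group's text. The sketch below illustrates this; the class GroupOffsetsDemo and the helper offsets are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsetsDemo {
    // Hypothetical helper: {start, end} of the given group in the first match.
    static int[] offsets(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("no match");
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String text = "John writes about this";
        int[] o = offsets("(John) (.+?) ", text, 2);
        System.out.println(o[0]);                       // 5
        System.out.println(o[1]);                       // 11
        // substring(start, end) yields the same text as group(2)
        System.out.println(text.substring(o[0], o[1])); // writes
    }
}
```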
Finally, there are the methods String group(String name), int start(String name) and int end(String name), which I have not fully figured out yet; more on them in a later update.
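As a starting point for the name-based methods, here is a minimal sketch that assumes Java 8 or later, since start(String) and end(String) were added in that release. A named group is written (?<name>pattern); it is still numbered like any other capturing group, and the name-based overloads look it up by name instead of by index. The class NamedGroupDemo, the group names first and second, and the helper describe are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Same pattern as demo4, with the two groups given illustrative names.
    static final Pattern NAMED = Pattern.compile("(?<first>John) (?<second>.+?) ");

    // Hypothetical helper: "<text>@<start>-<end>" for the group named "second".
    static String describe(String text) {
        Matcher m = NAMED.matcher(text);
        if (!m.find()) {
            return "no match";
        }
        return m.group("second") + "@" + m.start("second") + "-" + m.end("second");
    }

    public static void main(String[] args) {
        System.out.println(describe("John writes about this")); // writes@5-11
        Matcher m = NAMED.matcher("John writes about this");
        if (m.find()) {
            // A named group can still be reached by its number.
            System.out.println(m.group("second").equals(m.group(2))); // true
        }
    }
}
```

Passing a name that does not exist in the pattern throws IllegalArgumentException, unlike an out-of-range index, which throws IndexOutOfBoundsException.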
References:
http://tutorials.jenkov.com/java-regex/matcher.html#group-method
http://stackoverflow.com/questions/16517689/confused-about-matcher-group-in-java-regex