Java Regular Expressions

I. Regular expression support in java.lang.String

public boolean matches(String regex)
public String replaceFirst(String regex,String replacement)
public String replaceAll(String regex,String replacement)
public String[] split(String regex,int limit)
public String[] split(String regex)

Examples:

String regex = "abc";
String input = "abc";		
boolean b = input.matches(regex);
System.out.println(b);//true


String regex = "a";
String input = "ayatem";
System.out.println(input.replaceFirst(regex, "s"));//syatem
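System.out.println(input.replaceAll(regex, "s"));//system
//Note the difference from replace. replaceAll works with regular expressions, so both arguments are parsed:
//the first as a regex, the second as a replacement string in which "$" and "\" are special.
//For example, replaceAll("\\d", "*") replaces every digit. replace does no parsing:
//replace("\\d","*") replaces the literal two-character text \d and never treats it as a regex.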


String regex = ":";
String input = "1:2:3:4";
System.out.println(Arrays.toString(input.split(regex)));// [1, 2, 3, 4]
System.out.println(Arrays.toString(input.split(regex, 3)));//[1, 2, 3:4]
System.out.println(Arrays.toString(input.split(regex, 2)));//[1, 2:3:4]
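
The difference between replace and replaceAll described above can be checked with a small sketch. The class name StringRegexDemo and its helper methods are made up for illustration; the only library calls used are String.replace, String.replaceAll, String.split and Matcher.quoteReplacement.

```java
import java.util.Arrays;
import java.util.regex.Matcher;

public class StringRegexDemo {
    // replaceAll parses its first argument as a regex: every digit is masked.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "*");
    }

    // replace is literal: the target here is the two-character text \d.
    static String replaceLiteral(String s) {
        return s.replace("\\d", "*");
    }

    // The replacement of replaceAll is parsed too ("$" and "\" are special),
    // so arbitrary text is passed through Matcher.quoteReplacement first.
    static String toPrice(String s) {
        return s.replaceAll("PRICE", Matcher.quoteReplacement("$5"));
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22"));     // a*b**
        System.out.println(replaceLiteral("a1\\d")); // a1*
        System.out.println(toPrice("cost: PRICE"));  // cost: $5
        System.out.println(Arrays.toString("1:2:3:4".split(":", 2))); // [1, 2:3:4]
    }
}
```

Without quoteReplacement, a replacement such as "$5" would be read as a reference to capturing group 5 and fail for this pattern.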

II. Basic regular expression syntax

1. Characters (each construct matches a single character)
x 		The character x 
\\ 		The backslash character 
\0n 		The character with octal value 0n (0 <= n <= 7) 
\0nn 		The character with octal value 0nn (0 <= n <= 7) 
\0mnn 		The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) 
\xhh 		The character with hexadecimal value 0xhh 
\uhhhh 		The character with hexadecimal value 0xhhhh 
\t 		The tab character ('\u0009') 
\n 		The newline (line feed) character ('\u000A') 
\r 		The carriage-return character ('\u000D') 
\f 		The form-feed character ('\u000C') 
\a 		The alert (bell) character ('\u0007') 
\e 		The escape character ('\u001B') 
\cx 		The control character corresponding to x 
  
2. Character classes (each matches a single character from a set)
[abc] 		a, b, or c (simple class) 
[^abc] 		Any character except a, b, or c (negation) 
[a-zA-Z] 	a through z or A through Z, inclusive (range) 
[a-d[m-p]]	a through d, or m through p: [a-dm-p] (union) 
[a-z&&[def]] 	d, e, or f (intersection) 
[a-z&&[^bc]] 	a through z, except for b and c: [ad-z] (subtraction) 
[a-z&&[^m-p]] 	a through z, and not m through p: [a-lq-z](subtraction) 
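
The union, intersection and subtraction forms above can be tried on single characters. This is a minimal sketch; the class name CharClassDemo and the helper accepts are hypothetical, and String.matches is used because it requires the whole (one-character) input to match.

```java
public class CharClassDemo {
    // True when the one-character string ch belongs to the class written in regex.
    static boolean accepts(String regex, String ch) {
        return ch.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(accepts("[a-d[m-p]]", "n"));   // true  (union)
        System.out.println(accepts("[a-d[m-p]]", "e"));   // false
        System.out.println(accepts("[a-z&&[def]]", "e")); // true  (intersection)
        System.out.println(accepts("[a-z&&[def]]", "a")); // false
        System.out.println(accepts("[a-z&&[^bc]]", "b")); // false (subtraction)
        System.out.println(accepts("[a-z&&[^bc]]", "d")); // true
    }
}
```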
  
3. Predefined character classes (shorthand for common character classes)
. 		Any character (may or may not match line terminators) 
\d 		A digit: [0-9] 
\D 		A non-digit: [^0-9] 
\s 		A whitespace character: [ \t\n\x0B\f\r] 
\S 		A non-whitespace character: [^\s] 
\w 		A word character: [a-zA-Z_0-9] 
\W 		A non-word character: [^\w] 
  
4. Boundary matchers
^ 		The beginning of a line 
$ 		The end of a line 
\b 		A word boundary 
\B 		A non-word boundary 
\A 		The beginning of the input 
\G 		The end of the previous match 
\Z 		The end of the input but for the final terminator, if any 
\z 		The end of the input 
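
A short sketch of how some of these boundary matchers behave. Note that ^ and $ match at line boundaries only when Pattern.MULTILINE is set; by default they match at the beginning and end of the whole input. The class name BoundaryDemo and the helper count are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Counts how many times the pattern is found in the input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // \b: only the standalone word "cat" is counted.
        System.out.println(count(Pattern.compile("\\bcat\\b"), "cat concat cats")); // 1

        String lines = "one\ntwo\nthree";
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(count(Pattern.compile("^\\w+"), lines));                    // 1
        System.out.println(count(Pattern.compile("^\\w+", Pattern.MULTILINE), lines)); // 3

        // \Z tolerates one final line terminator, \z does not.
        System.out.println(Pattern.compile("abc\\Z").matcher("abc\n").find()); // true
        System.out.println(Pattern.compile("abc\\z").matcher("abc\n").find()); // false
    }
}
```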

5. Quantifiers
X? 		X, once or not at all 
X* 		X, zero or more times 
X+ 		X, one or more times 
X{n} 		X, exactly n times 
X{n,} 		X, at least n times 
X{n,m} 		X, at least n but not more than m times 

6. Logical operators
XY 		X followed by Y 
X|Y 		Either X or Y 
(X) 		X, as a capturing group 

7. Back references
\n 		Whatever the nth capturing group matched 
\k<name> 	Whatever the named-capturing group "name" matched 
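
Back references are easier to follow with an example. The sketch below uses \1 to collapse a word that is repeated twice in a row, and \k<ch> to find a doubled character. The class name BackrefDemo, the method names and the group name ch are hypothetical; named groups require Java 7 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefDemo {
    // "\\1" must repeat exactly what group 1 captured; "$1" in the
    // replacement keeps a single copy of the word.
    static String dedupe(String s) {
        return s.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    // "\\k<ch>" refers back to the named group "ch".
    static String firstDoubled(String s) {
        Matcher m = Pattern.compile("(?<ch>\\w)\\k<ch>").matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(dedupe("this is is a test")); // this is a test
        System.out.println(firstDoubled("hello"));       // ll
        System.out.println(firstDoubled("abc"));         // null
    }
}
```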

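III. Class Pattern

java.util.regex.Pattern   
    An instance represents a compiled regular expression and can be matched against any character sequence.  
Constructor  
    The constructor of Pattern is private; instances are obtained through the static factories below.  
Declaration  
    public final class Pattern extends Object implements Serializable  
Static factories for creating a Pattern  
    public static Pattern compile(String regex)  
        Compiles the given regular expression into a Pattern and returns it.  
    public static Pattern compile(String regex,int flags)  
        Compiles the given regular expression into a Pattern with the given flags and returns it.
    public static final int CASE_INSENSITIVE  
    	Enables case-insensitive matching; on its own it applies to US-ASCII characters only.  
    public static final int UNICODE_CASE  
    	Used together with CASE_INSENSITIVE, makes case-insensitive matching Unicode-aware.  
    public static final int DOTALL  
    	Enables dotall mode, in which "." matches any character, including line terminators. 
   Other flags include:
    public static final int COMMENTS 
    public static final int LITERAL  
    public static final int MULTILINE   
    public static final int CANON_EQ 
    public static final int UNIX_LINES 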

Examples:

//Obtain a Pattern object (requires import java.util.regex.Pattern)
String regex = "\\d+";
Pattern pattern = Pattern.compile(regex);

String regex = "\\d+";
//int flags = Pattern.CANON_EQ;
//Flags are bit masks and are combined with |
int flags = Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE;
Pattern pattern = Pattern.compile(regex, flags);
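
To make the effect of the flags concrete, here is a minimal sketch. The class name FlagDemo and the helper fullMatch are made up for illustration; the flags themselves are the Pattern constants listed above.

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // True when the whole input matches the regex compiled with the given flags.
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE covers ASCII letters.
        System.out.println(fullMatch("hello", 0, "HeLLo"));                        // false
        System.out.println(fullMatch("hello", Pattern.CASE_INSENSITIVE, "HeLLo")); // true

        // Non-ASCII letters additionally need UNICODE_CASE (\u00e9 is e-acute, \u00c9 its upper case).
        System.out.println(fullMatch("\u00e9", Pattern.CASE_INSENSITIVE, "\u00c9")); // false
        System.out.println(fullMatch("\u00e9",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, "\u00c9"));         // true

        // By default "." does not match a line terminator; DOTALL changes that.
        System.out.println(fullMatch("a.b", 0, "a\nb"));              // false
        System.out.println(fullMatch("a.b", Pattern.DOTALL, "a\nb")); // true
    }
}
```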

IV. Class Matcher

java.util.regex.Matcher: the matching engine   
Declaration    
    public final class Matcher extends Object implements MatchResult
Commonly used methods:
Study methods review the input string and return a boolean indicating whether or not the pattern is found.
    	public boolean lookingAt(): Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
    	public boolean find(): Attempts to find the next subsequence of the input sequence that matches the pattern.
    	public boolean find(int start): Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
    	public boolean matches(): Attempts to match the entire region against the pattern.
Index methods provide useful index values that show precisely where the match was found in the input string:
    	public int start(): Returns the start index of the previous match.
    	public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.
    	public int end(): Returns the offset after the last character matched.
    	public int end(int group): Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
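
The index methods can be sketched as follows: every match is reported together with its start index and its (exclusive) end index. The class name MatchSpans and both helper methods are hypothetical; only find, group, start and end from Matcher are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchSpans {
    // Collects every match as "text@start-end"; the end index is exclusive.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    // Span of one capturing group within the first match, or null if there is no match.
    static String groupSpan(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group(group) + "@" + m.start(group) + "-" + m.end(group);
    }

    public static void main(String[] args) {
        String input = "Ladies and Gentleman, welcome to China, welcome to Shangdong";
        System.out.println(spans("w(el)(come)", input));        // [welcome@22-29, welcome@40-47]
        System.out.println(groupSpan("w(el)(come)", input, 1)); // el@23-25
    }
}
```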

Example:

String regex = "w(el)(come)";
String input = "Ladies and Gentleman, welcome to China, welcome to Shangdong";

Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
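/**
 * matches: whole match; returns true only if the entire character sequence matches, otherwise false.
 *   Even when it fails, a leading part that matched can move the position where the next find() starts
 *   (observed in practice, not documented behavior), so call reset() before find().
 * lookingAt: prefix match; always starts at the first character and makes a single attempt,
 *   whether it succeeds or fails. It does not have to reach the end of the input.
 * find: partial match; searches from the current position for the next matching subsequence
 *   and, on success, moves the position for the next search.
 * reset: reset(CharSequence) gives this Matcher a new input, namely the argument;
 *   reset() without an argument moves the Matcher back to the start of the current input.
 */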
System.out.println(matcher.lookingAt());// false
System.out.println(matcher.matches());// false

System.out.println(matcher.find());//true
System.out.println("text: " + matcher.group() + " start index: " + matcher.start() + " end index:" + matcher.end());
// text: welcome start index: 22 end index:29
System.out.println(matcher.find());//true
System.out.println("text: " + matcher.group() + " start index: " + matcher.start() + " end index:" + matcher.end());
// text: welcome start index: 40 end index:47
matcher.reset();//reset the match position to the start of the input
System.out.println(matcher.find());//true
System.out.println("text: " + matcher.group() + " start index: " + matcher.start() + " end index:" + matcher.end());
// text: welcome start index: 22 end index:29

3. Groups
String regex = "w(el)(come)";
String input = "Ladies and Gentleman, welcome to China, welcome to Shangdong";

Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);

int groupCount = matcher.groupCount();
System.out.println(groupCount);// 2: each () is one group; group(0), the whole match, is not counted

while (matcher.find()) {
	System.out.println(matcher.group(0));// welcome
	System.out.println(matcher.group(1));// el
	System.out.println(matcher.group(2));// come
	System.out.println(matcher.group(3));// throws IndexOutOfBoundsException: there is no group 3
}

----------------------------------------Update 2017/02/11-------------------------------------

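Difference between matches and find

1. matches() must match the whole input; find() matches any part of it

String str = "I have an apple";
String regex = "\\w+";
Matcher matcher = Pattern.compile(regex).matcher(str);
matcher.matches();//false 
matcher.find();//true 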

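2. find() searches from the current position for the next matching subsequence and, on success, moves the position for the next search

String str = "I have an apple";
String regex = "\\w+";
Matcher matcher = Pattern.compile(regex).matcher(str);
while(matcher.find()){
	System.out.print(matcher.group() + " "); // I have an apple
}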

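matches() returns true only when the whole character sequence matches, and false otherwise. Even when it fails, a leading part that matched can move the position where the next find() starts. This is an effect observed in practice rather than documented behavior, so call reset() before find() whenever all matches are needed.

String str = "I have an apple";
String regex = "\\w+";
Matcher matcher = Pattern.compile(regex).matcher(str);
System.out.println(matcher.matches()); //false
while(matcher.find()){
	System.out.println(matcher.group());// have, an, apple: "I" is not printed because matches() already consumed it
}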
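
Since matches() and find() share the state of one Matcher, a safe way to use both is to call reset() in between. The following is a minimal sketch of that pattern; the class name ResetBeforeFind and the method words are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetBeforeFind {
    // Checks the whole input first, then collects every word from the start.
    static List<String> words(String input) {
        Matcher m = Pattern.compile("\\w+").matcher(input);
        boolean singleWord = m.matches(); // true only if the input is one word
        m.reset();                        // discard any state left behind by matches()
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("I have an apple")); // [I, have, an, apple]
    }
}
```

With reset() in place the result no longer depends on what the earlier matches() call did to the matcher.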

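group

The regex in the following example has 4 capturing groups; counting group 0 (the whole match), that makes 5.

String str = "date:2015-3-2 14:35";
String regex = "^.*(\\d)-(\\d+)-(\\d+) (\\d+).*$";
Matcher matcher = Pattern.compile(regex).matcher(str);
if(matcher.matches()){
	System.out.println(matcher.group(0));//date:2015-3-2 14:35
	System.out.println(matcher.group(1));//5  2015 was wanted: the greedy .* takes "date:201" and (\\d) captures a single digit
	System.out.println(matcher.group(2));//3
	System.out.println(matcher.group(3));//2
	System.out.println(matcher.group(4));//14
}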
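
The "5" captured by group(1) above is a consequence of the greedy leading .*, which keeps as much of the input as it can and leaves only the last digit of the year to the group. One possible way around it, shown here as a sketch and not as the only fix, is a reluctant prefix .*? combined with \d+. The class name GreedyPrefixDemo and the method firstGroup are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyPrefixDemo {
    // Returns what group 1 captured, or null when the input does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "date:2015-3-2 14:35";
        // Greedy prefix: .* swallows "date:201", group 1 is left with "5".
        System.out.println(firstGroup("^.*(\\d+)-(\\d+)-(\\d+) (\\d+).*$", str));  // 5
        // Reluctant prefix: .*? stops as early as possible, group 1 gets the full year.
        System.out.println(firstGroup("^.*?(\\d+)-(\\d+)-(\\d+) (\\d+).*$", str)); // 2015
    }
}
```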

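Named groups (new in Java 7)

The following example matches the same input as the previous one, but the captured substrings are easier to retrieve.

String str = "date:2015-3-2 14:35";
String regex = "^.*(?<year>\\d{4})-(?<month>\\d+)-(?<day>\\d+) (?<hour>\\d+).*$";
Matcher matcher = Pattern.compile(regex).matcher(str);
if(matcher.matches()){
	System.out.println(matcher.group(0));//date:2015-3-2 14:35
	System.out.println(matcher.group("year")); //2015  fixed: \\d{4} forces all four digits into the group
	System.out.println(matcher.group("month"));//3
	System.out.println(matcher.group("day"));//2
	System.out.println(matcher.group("hour"));//14
}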

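Matching modes

Largely adapted from: http://blog.csdn.net/chs_jdmdr/article/details/46885421

1. Greedy: maximal match

X?, X*, X+ and X{n,} are all maximal matches. For example, if you use "<.+>" to match "a<tr>aava </tr>abb", you may expect it to match "<tr>", but the actual match is "<tr>aava </tr>". Why? Let us trace how a maximal match proceeds.
(1) "<" matches the "<" in the string. (2) ".+" matches "tr>aava </tr>abb": being greedy, it runs past both ">" characters and consumes everything up to the final "b". (3) The ">" of the pattern now has nothing left to match, so the engine backtracks along the same path, one character at a time, trying ">" against "b", "b" and "a", until it reaches the ">" just before "abb", where the match succeeds.
That is maximal matching: to predict the result, look for the last position at which the match can end.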

Example:

String str = "a<tr>aava</tr>abb";
String regex = "<.+>";
Matcher matcher = Pattern.compile(regex).matcher(str);
if(matcher.find()){
	System.out.println(matcher.group(0));//<tr>aava</tr>
}

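2. Reluctant (lazy): minimal match 

X?, X*, X+ and X{n,} are maximal matches by default. Appending ? turns them into lazy matches: X??, X*?, X+?, X{n,}? and X{n,m}? are all minimal matches. X{n}? is legal but redundant, because an exact count leaves nothing to choose.
Minimal matching means that .+? matches one character and then immediately tries to match >; if that fails, .+? takes one more character and tries > again. The Java documentation names the modes Greedy and Reluctant and explains them with the metaphor of eating the input, so "greedy eater" and "reluctant eater" would be the most literal translations. I prefer the terms maximal match and minimal match.
Example:

String str = "a<tr>aava</tr>abb";
String regex = "<.+?>";
Matcher matcher = Pattern.compile(regex).matcher(str);
while(matcher.find()){
	System.out.print(matcher.group(0) + " ");//<tr> </tr>
}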

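3. Possessive: all-or-nothing match 

Besides maximal matching there is one more form: X?+, X*+, X++, X{n,}+ and so on, called possessive matching. Like a maximal match it consumes as many characters as it can, up to the end of the text if possible, but it never backtracks. In other words, it takes one gulp, and if the rest of the pattern cannot match after that, the attempt simply fails.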

String test = "a<tr>aava</tr>abb ";
String test2 = "<tr>";
String reg = "<.++>";
String reg2 = "<tr>";
System.out.println(test.replaceAll(reg, "###"));//a<tr>aava</tr>abb
System.out.println(test2.replaceAll(reg2, "###"));//###
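
The three modes can be compared side by side on the same input. This is a minimal sketch; the class name QuantifierModes and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierModes {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String str = "a<tr>aava</tr>abb";
        System.out.println(firstMatch("<.+>", str));  // <tr>aava</tr>  greedy: backtracks to the last ">"
        System.out.println(firstMatch("<.+?>", str)); // <tr>           reluctant: stops at the first ">"
        System.out.println(firstMatch("<.++>", str)); // null           possessive: .++ keeps the ">" and never gives it back

        // A possessive quantifier still matches when nothing after it needs the consumed text.
        System.out.println(firstMatch("a++b", "aaab"));  // aaab
        System.out.println(firstMatch("a+ab", "aaab"));  // aaab  greedy a+ gives one "a" back
        System.out.println(firstMatch("a++ab", "aaab")); // null  possessive a++ does not
    }
}
```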





