Implement wildcard pattern matching with support for '?' and '*'.
'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).
The matching should cover the entire input string (not partial).
The function prototype should be:
bool isMatch(const char *s, const char *p)
Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false
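The hard part of this problem is handling the star. To keep things clear, I will start with a recursive solution that is easy to follow. Let i and j be the current positions in s and p respectively.
1) If s[i] = p[j] or p[j] = '?', advance both pointers;
2) If p[j] = '*', check whether there is some k >= i such that the part of s starting at k matches the part of p after j.
To make runs of consecutive stars easy to handle, I preprocess p first and compress every run of consecutive stars into a single star.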
// compress the string by collapsing each run of consecutive '*' into one '*'
private String suppress(String s) {
    StringBuilder sb = new StringBuilder();
    char lastChar = ' ';
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        // keep every non-star, and keep a star only if the kept character
        // before it was not a star
        if (c != '*' || lastChar != '*') {
            sb.append(c);
            lastChar = c;
        }
    }
    return sb.toString();
}
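private boolean isMatch(String s, int start1, String p, int start2) {
    // exit: p is exhausted, so s must be exhausted as well
    if (start2 == p.length())
        return start1 == s.length();
    // exit: s is exhausted, so p may only have one trailing '*' left
    // (p has been compressed, so at most one star can remain here)
    if (start1 == s.length())
        return start2 == p.length() - 1 && p.charAt(start2) == '*';
    char a = s.charAt(start1);
    char b = p.charAt(start2);
    if (b != '*') {
        // single-character match: advance both pointers
        return (a == b || b == '?')
                && isMatch(s, start1 + 1, p, start2 + 1);
    } else {
        // try every split point: the star absorbs s[start1...i - 1]
        for (int i = start1; i <= s.length(); i++) {
            if (isMatch(s, i, p, start2 + 1))
                return true;
        }
        return false;
    }
}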
// not static, because it calls the instance methods suppress and isMatch
public boolean isMatch2(String s, String p) {
    // compress runs of '*' in p
    p = suppress(p);
    return isMatch(s, 0, p, 0);
}
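The recursive approach is too slow. I tried matching from both ends at once to shrink the region where flexible matching is allowed, but that still was not enough. Searching online, I found that the problem can be solved with DP. Unfortunately, my own attempts at writing the DP also timed out. Finally I wondered whether the recursion could be rewritten as an iterative version. It turns out this can be done without a stack, and the code is simpler than I expected.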
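For reference, here is a rough sketch of one common DP formulation. It is not the attempt described above, and the class name WildcardDp and the rolling-row layout are assumptions made for illustration. The idea is that row j records, for every prefix length i of s, whether s[0...i-1] matches p[0...j-1]; a star either matches nothing (take the value from the previous row) or absorbs one more character of s (take the value to the left in the same row).

```java
// Illustrative sketch only; WildcardDp is a hypothetical name, not from the post.
final class WildcardDp {
    /** Returns true if p ('?' = any one char, '*' = any run) matches all of s. */
    static boolean isMatch(String s, String p) {
        int n = s.length(), m = p.length();
        // prev[i]: does s[0...i-1] match the first j-1 pattern characters?
        boolean[] prev = new boolean[n + 1];
        prev[0] = true; // the empty pattern matches only the empty string
        for (int j = 1; j <= m; j++) {
            char c = p.charAt(j - 1);
            // cur[i]: does s[0...i-1] match the first j pattern characters?
            boolean[] cur = new boolean[n + 1];
            // The empty string is matched only by a pattern made of stars.
            cur[0] = c == '*' && prev[0];
            for (int i = 1; i <= n; i++) {
                if (c == '*') {
                    // Star matches nothing (prev[i]) or absorbs s[i-1] (cur[i-1]).
                    cur[i] = prev[i] || cur[i - 1];
                } else {
                    // Single-character match extends a match of both prefixes.
                    cur[i] = prev[i - 1] && (c == '?' || c == s.charAt(i - 1));
                }
            }
            prev = cur;
        }
        return prev[n];
    }

    public static void main(String[] args) {
        System.out.println(isMatch("aa", "a*"));     // true
        System.out.println(isMatch("aab", "c*a*b")); // false
    }
}
```

This sketch takes O(|s| * |p|) time and O(|s|) extra space, which helps explain why a solution that revisits far fewer states is attractive.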
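The key to backtracking here is to record the position of the most recent star, starPos, and the end of the substring of s that the star currently matches, starMatchEnd (this value is exclusive, since a star can match the empty string).
If no star is encountered, advance both pointers together, just as in the recursive version.
If a star is encountered, let s try to match the characters that follow the star in p.
If that fails, backtrack: move i (the pointer into s) to the position just past the saved starMatchEnd and update starMatchEnd to that position, and move j (the pointer into p) to just past the saved starPos.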
public boolean isMatch(String s, String p) {
    // compress runs of '*' in p
    p = suppress(p);
    int i = 0, j = 0;
    int starPos = -1; // position of the last star seen in p
    int starMatchEnd = 0; // end (exclusive) of the content matched by that star
    while (i < s.length()) {
        char a = s.charAt(i);
        if (j == p.length()) {
            // p is exhausted; a trailing star absorbs the rest of s
            if (p.length() > 0 && p.charAt(p.length() - 1) == '*') {
                return true;
            }
            // backtrack, then restart the loop so that a is re-read at the
            // new i (without this, a stale character would be compared)
            else if (starPos != -1) {
                j = starPos + 1;
                i = ++starMatchEnd;
                continue;
            } else {
                return false;
            }
        }
        char b = p.charAt(j);
        // a single-character match
        if (a == b || b == '?') {
            i++;
            j++;
        }
        // the star case
        else if (b == '*') {
            starMatchEnd = i; // initially assume the star matches an empty string
            starPos = j;
            j++;
        }
        // no single-character match
        else {
            // no star to fall back on, so the match can never succeed
            if (starPos == -1) {
                return false;
            }
            // let the star absorb one more character of s and retry
            else {
                j = starPos + 1;
                i = ++starMatchEnd;
            }
        }
    }
    // match if p has been fully scanned or only a trailing '*' remains
    return j == p.length() || j == p.length() - 1 && p.charAt(j) == '*';
}
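The second time I worked through this problem, I found that it can be understood in terms of checkpointing:
When a star is encountered, the star serves as a checkpoint in p, and the corresponding position in s is a checkpoint as well; together they save the state that has matched so far.
When we checkpoint i and j, we are guaranteed that s[0...i-1] and p[0...j-1] match, and every later attempt takes place at or after i and j. If a mismatch shows up later and a checkpoint exists, we can roll back to it and try something different.
The checkpoint here is a little unusual, though. For p, the next attempt always assumes that the star has matched some number of characters and resumes matching right after the star, so the checkpoint in p is fixed: it is always the most recent star. For s, every failed attempt must move the checkpoint forward. Otherwise every attempt would be identical, that is, the star would match the same content each time, which would be pointless.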
public static boolean isMatch(String s, String p) {
    int checkPoint = -1; // Checkpoint in s, which keeps moving forward
                         // after every rollback.
    int lastStarPos = -1; // Checkpoint in p, which stays fixed after
                          // every rollback.
    int i = 0, j = 0;
    while (i < s.length()) {
        // Normal match.
        if (j < p.length()
                && (p.charAt(j) == '?' || s.charAt(i) == p.charAt(j))) {
            i++;
            j++;
            continue;
        } else if (j < p.length() && p.charAt(j) == '*') { // Checkpoint.
            lastStarPos = j++;
            // s[0...i - 1] and p[0...lastStarPos - 1] (i.e., p[0...j - 2])
            // match at this point.
            checkPoint = i;
            continue;
        }
        // Mismatch found: roll back to the last checkpoint if there is one.
        if (lastStarPos != -1) {
            i = ++checkPoint; // The checkpoint in s also moves forward.
            j = lastStarPos + 1; // The checkpoint in p stays at the last
                                 // star position.
            continue;
        }
        // Mismatch (or p fully scanned) with no star to roll back to.
        return false;
    }
    // s has been completely scanned, but p may still have more to match.
    // Only trailing '*' characters are allowed in the rest of p.
    while (j < p.length()) {
        if (p.charAt(j) == '*')
            j++;
        else
            return false;
    }
    return true;
}
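As a quick sanity check, the examples from the problem statement can be run against the checkpoint matcher. The sketch below is only an illustration: the class name WildcardExamples and the helper countFailures are hypothetical, the matcher is a compact copy repeated so that the file compiles on its own, and the last test case ("aba" against "*a?") is an extra one I added because it forces a rollback after p has been fully scanned.

```java
// Illustrative harness; WildcardExamples and countFailures are hypothetical names.
final class WildcardExamples {
    // Compact copy of the checkpoint matcher, repeated to keep this file self-contained.
    static boolean isMatch(String s, String p) {
        int i = 0, j = 0, checkPoint = -1, lastStarPos = -1;
        while (i < s.length()) {
            if (j < p.length() && (p.charAt(j) == '?' || p.charAt(j) == s.charAt(i))) {
                i++;
                j++;
            } else if (j < p.length() && p.charAt(j) == '*') {
                lastStarPos = j++; // checkpoint in p
                checkPoint = i;    // checkpoint in s
            } else if (lastStarPos != -1) {
                i = ++checkPoint;  // let the star absorb one more character
                j = lastStarPos + 1;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain in p.
        while (j < p.length() && p.charAt(j) == '*') {
            j++;
        }
        return j == p.length();
    }

    /** Runs each {s, p, expected} case and returns how many disagree. */
    static int countFailures() {
        String[][] cases = {
            {"aa", "a", "false"},
            {"aa", "aa", "true"},
            {"aaa", "aa", "false"},
            {"aa", "*", "true"},
            {"aa", "a*", "true"},
            {"ab", "?*", "true"},
            {"aab", "c*a*b", "false"},
            {"aba", "*a?", "false"}, // extra case, not from the problem statement
        };
        int failures = 0;
        for (String[] c : cases) {
            boolean expected = Boolean.parseBoolean(c[2]);
            boolean actual = isMatch(c[0], c[1]);
            if (actual != expected) {
                failures++;
                System.out.println("FAIL: isMatch(\"" + c[0] + "\", \"" + c[1]
                        + "\") returned " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures());
    }
}
```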