LeetCode 28. Implement strStr(): Rabin-Karp Algorithm

Description:
Implement strStr().

Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:

Input: haystack = "hello", needle = "ll"
Output: 2

Clarification:

What should we return when needle is an empty string? This is a great question to ask during an interview.

For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent with C's strstr() and Java's indexOf().
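Since the problem points to Java's indexOf() as the reference behavior, a quick sketch of that baseline may be useful for checking the solutions below. `String.indexOf(String)` already returns 0 for an empty needle and -1 when there is no match; the class name `IndexOfDemo` is only for illustration.

```java
class IndexOfDemo {
    // Reference behavior from the JDK: index of the first occurrence, or -1.
    static int reference(String haystack, String needle) {
        return haystack.indexOf(needle);
    }

    public static void main(String[] args) {
        System.out.println(reference("hello", "ll"));  // 2
        System.out.println(reference("hello", ""));    // 0 (empty needle)
        System.out.println(reference("aaaaa", "bba")); // -1
    }
}
```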

Solution:

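Approach 1:
This problem is simply a character-by-character string comparison. Using String.startsWith() would probably make it even more convenient.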

public int strStr(String haystack, String needle) {
    int len1 = haystack.length();
    int len2 = needle.length();
    if (len2 == 0) return 0;      // empty needle: return 0, like indexOf()
    if (len2 > len1) return -1;   // needle longer than haystack: no match
    // Try every start position i where the needle still fits.
    for (int i = 0; i <= len1 - len2; ++i) {
        int j = 0;
        // Advance j while haystack[i + j] matches needle[j].
        while (j < len2 && haystack.charAt(i + j) == needle.charAt(j)) {
            ++j;
        }
        if (j == len2) return i;  // all len2 characters matched
    }
    return -1;
}
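As noted above, `String.startsWith(String prefix, int toffset)` makes the brute-force approach shorter. This is a minimal sketch wrapped in a hypothetical class `BruteForce`; it still tries every start position, so the worst case remains O(mn).

```java
class BruteForce {
    static int strStr(String haystack, String needle) {
        // Try every start position where the needle could still fit.
        for (int i = 0; i + needle.length() <= haystack.length(); ++i) {
            // startsWith(prefix, offset) compares needle against haystack starting at index i.
            if (haystack.startsWith(needle, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(strStr("hello", "ll")); // 2
        System.out.println(strStr("hello", ""));   // 0
        System.out.println(strStr("abc", "abcd")); // -1
    }
}
```

An empty needle is handled without a special case, because every string starts with "" at offset 0.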

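Approach 2:
This problem can also be solved with the Rabin-Karp algorithm (see the Rabin-Karp algorithm article on Wikipedia).
The algorithm's pseudocode is as follows: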

function RabinKarp(string s[1..n], string pattern[1..m])
    hpattern := hash(pattern[1..m]);
    for i from 1 to n-m+1
        hs := hash(s[i..i+m-1])
        if hs = hpattern
            if s[i..i+m-1] = pattern[1..m]
                return i
    return not found
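A direct Java translation of the pseudocode may make it more concrete. This sketch deliberately recomputes each window's hash from scratch (the naive O(mn) variant that the Wikipedia complexity discussion warns about). It assumes a polynomial hash with base 256 modulo 101, matching the worked examples in this post, uses 0-based indices instead of 1-based, and the class and helper names (`NaiveRabinKarp`, `hash`) are hypothetical.

```java
class NaiveRabinKarp {
    static final int BASE = 256;
    static final int MOD = 101;

    // Hash of s[from, from + m): O(m) work on every call.
    static int hash(String s, int from, int m) {
        int h = 0;
        for (int k = from; k < from + m; ++k) {
            h = (h * BASE + s.charAt(k)) % MOD;
        }
        return h;
    }

    static int rabinKarp(String s, String pattern) {
        int n = s.length(), m = pattern.length();
        int hPattern = hash(pattern, 0, m);            // hpattern := hash(pattern)
        for (int i = 0; i <= n - m; ++i) {             // for i from 0 to n-m
            int hs = hash(s, i, m);                    // hs := hash(window), O(m) each time
            if (hs == hPattern                         // if hs = hpattern
                    && s.regionMatches(i, pattern, 0, m)) { // confirm the real characters
                return i;
            }
        }
        return -1;                                     // not found
    }

    public static void main(String[] args) {
        System.out.println(rabinKarp("hello", "ll")); // 2
    }
}
```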

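Note in particular: the two strings are compared character by character only when their hash values are equal. (The same idea shows up in everyday Java development. Comparing two objects with equals directly can be slow, because many classes override equals with a field-by-field comparison; String is one of them and compares character by character. Comparing hashCode values only compares two numbers, which is very fast. If the hash codes differ, there is no need to call equals at all; if they are the same, call equals as well to confirm the objects really are equal, since different objects can share a hash code.)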
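The real comparison is still required because different strings can share a hash value. A small check with Java's built-in `String.hashCode()` (which computes s[0]·31^(n-1) + ... + s[n-1]) shows this: "Aa" and "BB" both hash to 2112, yet they are not equal. The helper `sameString` is a hypothetical illustration of the "hash first, equals second" pattern.

```java
class HashCollisionDemo {
    // Compare cheaply by hash first, and call equals only when the hashes match.
    static boolean sameString(String a, String b) {
        if (a.hashCode() != b.hashCode()) {
            return false;        // different hashes: definitely different strings
        }
        return a.equals(b);      // same hash: could still be a collision
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());         // 2112
        System.out.println("BB".hashCode());         // 2112
        System.out.println(sameString("Aa", "BB"));  // false
        System.out.println(sameString("Aa", "Aa"));  // true
    }
}
```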

On the time complexity, Wikipedia says:

Lines 2, 4, and 6 each require O(m) time. However, line 2 is only executed once, and line 6 is only executed if the hash values match, which is unlikely to happen more than a few times. Line 5 is executed O(n) times, but each comparison only requires constant time, so its impact is O(n). The issue is line 4.
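Counting the function header as line 1, line 2 is hpattern := hash(pattern[1..m]), line 4 is hs := hash(s[i..i+m-1]), line 5 is the hash comparison, and line 6 is the string comparison. So the bottleneck is line 4: hashing every window from scratch costs O(m) per iteration, but with a rolling hash it costs O(1) per iteration, and the whole scan runs in O(n).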

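Computing every window's hash from scratch would take O(mn) time in total, but with a rolling hash the total drops to O(n). Wikipedia explains why below (if it is hard to follow, you can just remember the conclusion):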

Naively computing the hash value for the substring s[i+1..i+m] requires O(m) time because each character is examined. Since the hash computation is done on each loop, the algorithm with a naïve hash computation requires O(mn) time, the same complexity as a straightforward string matching algorithm. For speed, the hash must be computed in constant time. The trick is the variable hs already contains the previous hash value of s[i..i+m-1]. If that value can be used to compute the next hash value in constant time, then computing successive hash values will be fast.
The trick can be exploited using a rolling hash. A rolling hash is a hash function specially designed to enable this operation. A trivial (but not very good) rolling hash function just adds the values of each character in the substring. This rolling hash formula can compute the next hash value from the previous value in constant time:

The idea behind the rolling hash is:

hash(s[i+1..i+m]) = hash(s[i..i+m-1]) - s[i] + s[i+m]

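For example, with a sliding window of size 2 over the string "abcde": "bc" = "ab" - "a" + "c".
The examples below instead use a polynomial hash with base 256 and prime modulus 101: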
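The trivial additive rolling hash from the quote can be sketched in a few lines. This hypothetical class `AdditiveRollingHash` sums the character codes of every window of size m, updating the sum in O(1) per step by subtracting the character that leaves and adding the one that enters. It collides easily (any two anagrams get the same sum), which is why a polynomial hash is normally used instead.

```java
import java.util.Arrays;

class AdditiveRollingHash {
    // Additive hash (sum of char codes) of every window of size m; assumes 1 <= m <= s.length().
    static int[] windowSums(String s, int m) {
        int[] sums = new int[s.length() - m + 1];
        int h = 0;
        for (int k = 0; k < m; ++k) {
            h += s.charAt(k);                              // hash of the first window
        }
        sums[0] = h;
        for (int i = 1; i + m <= s.length(); ++i) {
            h = h - s.charAt(i - 1) + s.charAt(i + m - 1); // drop the old char, add the new one
            sums[i] = h;
        }
        return sums;
    }

    public static void main(String[] args) {
        // "ab" = 97 + 98 = 195, "bc" = 195 - 'a' + 'c' = 197, ...
        System.out.println(Arrays.toString(windowSums("abcde", 2))); // [195, 197, 199, 201]
    }
}
```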

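 hash("hi") = [(104 × 256) % 101 + 105] % 101 = 65
 (ASCII of 'h' is 104 and of 'i' is 105)

 ASCII a = 97, b = 98, r = 114:
 hash("abr") = [ ( [ ( [ (97 × 256) % 101 + 98 ] % 101 ) × 256 ] % 101 ) + 114 ] % 101 = 4

 Rolling from "abr" to "bra":
 hash("bra") = [ ( 4 + 101 - 97 × [(256 % 101) × 256] % 101 ) × 256 + 97 ] % 101 = 30

 Term by term: 4 is the old hash; + 101 keeps the intermediate value from going negative; 97 is the old 'a' leaving on the left; [(256 % 101) × 256] % 101 is its left base offset (256² mod 101); × 256 is the base shift; + 97 adds the new 'a'; and % 101 reduces by the prime modulus.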
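The numbers above can be checked with a short sketch. It assumes the same parameters as the examples (base 256, prime modulus 101) and uses a hypothetical class `PolyRollingHash`: `hash` evaluates a string directly, `highPower` precomputes the left base offset 256^(m-1) % 101 once, and `roll` performs the "abr" to "bra" update in O(1).

```java
class PolyRollingHash {
    static final int BASE = 256;
    static final int MOD = 101;

    // Direct hash: Horner's method, reducing modulo MOD after each step.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); ++i) {
            h = (h * BASE % MOD + s.charAt(i)) % MOD;
        }
        return h;
    }

    // BASE^(m-1) % MOD: the weight of the leftmost character in a window of size m.
    static int highPower(int m) {
        int p = 1;
        for (int i = 0; i < m - 1; ++i) {
            p = p * BASE % MOD;
        }
        return p;
    }

    // Slide the window: remove oldChar on the left, shift by BASE, append newChar.
    static int roll(int oldHash, char oldChar, char newChar, int highPower) {
        int leaving = oldChar % MOD * highPower % MOD;          // old char times its base offset
        int withoutOld = (oldHash + MOD - leaving) % MOD;       // + MOD avoids a negative value
        return (withoutOld * BASE + newChar) % MOD;
    }

    public static void main(String[] args) {
        System.out.println(hash("hi"));                                  // 65
        System.out.println(hash("abr"));                                 // 4
        System.out.println(roll(hash("abr"), 'a', 'a', highPower(3)));   // 30
        System.out.println(hash("bra"));                                 // 30
    }
}
```

The rolled value and the directly computed hash("bra") agree, which is exactly what lets Rabin-Karp skip rehashing each window.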

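If this is hard to understand, think about how we compute the value of the digit string "101".
First take 1, then take 0: (1 × 10 + 0) = 10; then take 1: (10 × 10 + 1) = 101. The hash does the same thing with base 256 instead of 10, reducing modulo 101 at each step.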
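The "101" analogy is Horner's method. This tiny sketch (hypothetical helper `parseDecimal`) reads the digits from left to right, multiplying the running value by 10 before adding each new digit; the string hash follows the same pattern with base 256 and a `% 101` after each step.

```java
class HornerDemo {
    // Reads a decimal digit string left to right: value = value * 10 + digit.
    static int parseDecimal(String digits) {
        int value = 0;
        for (int i = 0; i < digits.length(); ++i) {
            value = value * 10 + (digits.charAt(i) - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseDecimal("101")); // 1 -> 10 -> 101
    }
}
```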

So the final solution code for this problem is as follows:

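Here the method hashString is overloaded: the version with fewer parameters computes the hash of a string prefix directly, and the version with more parameters performs the rolling-hash update.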

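class Solution {
    private static final int BASE = 256;  // each char is treated as a digit in base 256
    private static final int MOD = 101;   // prime modulus, as in the examples above

    public int strStr(String haystack, String needle) {
        if (needle.length() > haystack.length()) return -1;
        // Check the window at index 0 directly (this also handles an empty needle).
        if (needle.equals(haystack.substring(0, needle.length()))) return 0;
        int length = needle.length();
        // BASE^(length-1) % MOD, computed once so that every roll is O(1).
        int power = 1;
        for (int i = 0; i < length - 1; ++i) {
            power = power * BASE % MOD;
        }
        int res = hashString(haystack, length);    // hash of haystack[0..length-1]
        int ansHash = hashString(needle, length);  // hash of the needle
        for (int i = 0; i < haystack.length() - length; ++i) {
            // Slide the window one step: drop haystack[i], append haystack[i + length].
            res = hashString(haystack, i + length, length, res, power);
            // Equal hashes may be a collision, so confirm with a real comparison.
            if (res == ansHash && needle.equals(haystack.substring(i + 1, i + 1 + length))) {
                return i + 1;
            }
        }
        return -1;
    }

    // Rolling update: remove s[index - length] from the front and append s[index].
    public int hashString(String s, int index, int length, int res, int power) {
        int old = s.charAt(index - length) % MOD * power % MOD;  // weight of the leaving char
        res = (res + MOD - old) * BASE + s.charAt(index);        // + MOD keeps the value non-negative
        return res % MOD;
    }

    // Direct hash of s[0..length-1] (Horner's method, reducing modulo MOD at each step).
    public int hashString(String s, int length) {
        int result = 0;
        for (int i = 0; i < length; ++i) {
            result = (result * BASE % MOD + s.charAt(i)) % MOD;
        }
        return result;
    }
}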