LeetCode 28. Implement strStr(): String Matching

28. String Matching

28. Implement strStr()

Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:

Input: haystack = "hello", needle = "ll"
Output: 2

Example 2:

Input: haystack = "aaaaa", needle = "bba"
Output: -1

Clarification:

What should we return when needle is an empty string? This is a great question to ask during an interview.

For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent with C’s strstr() and Java’s indexOf().

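Brute Force

At its core, this problem is a string-matching problem, and it can be solved by brute force. Start by settling the special cases.

When both the pattern needle and the text haystack are empty, return 0
When the pattern is empty, return 0
When the pattern is longer than the text, return -1

These are the three special cases to settle first.

Next comes the brute-force search. A text of length n contains n-m+1 substrings whose length equals that of the pattern, where n is the length of the text and m is the length of the pattern. Compare the pattern against each of these substrings in turn. On a match, return the position in the text of the first character of the matching substring. The time complexity of this approach is O((n-m+1)m).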

class Solution {
    public int strStr(String haystack, String needle) {
        if(isNullOrEmpty(needle)) { // also covers the case where both strings are empty
            return 0;
        }
        if(haystack.length() < needle.length()){
            return -1;
        }
        
        int n = haystack.length();
        int m = needle.length();
        for(int i = 0; i < n-m +1 ;i++){
            if(compareTwoString(haystack.substring(i,i+m),needle)){
                return i;
            }
        }
        return -1;
    }
    public boolean isNullOrEmpty(String s){
        return s == null || s.length() == 0;
    }
    
    public boolean compareTwoString(String s1, String s2){
        int n = s2.length() ;
        for(int i = 0; i < n; i++){
            if(s1.charAt(i) != s2.charAt(i)){
                return false;
            }
        }
        return true;
    }
    
}
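
As a possible variation on the brute-force solution above, the sketch below compares each window in place instead of building a substring for every position. The class name RegionMatchSearch and the method name indexOf are made up for this illustration; String.regionMatches is a standard Java method. Treating a null haystack as "not found" is an assumption that the original code does not make.

```java
class RegionMatchSearch {
    // Hypothetical helper; same contract as the brute-force strStr above.
    static int indexOf(String haystack, String needle) {
        if (needle == null || needle.isEmpty()) {
            return 0; // an empty pattern matches at index 0
        }
        if (haystack == null || haystack.length() < needle.length()) {
            return -1; // assumption: a null haystack counts as "not found"
        }
        int n = haystack.length();
        int m = needle.length();
        for (int i = 0; i <= n - m; i++) {
            // Compare haystack[i, i+m) with needle without allocating a substring.
            if (haystack.regionMatches(i, needle, 0, m)) {
                return i;
            }
        }
        return -1;
    }
}
```

The number of character comparisons is still O((n-m+1)m) in the worst case; only the temporary substrings are avoided.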

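Rabin-Karp Algorithm

The Rabin-Karp (RK) algorithm improves on brute force. Its idea is to treat the pattern as a number and to compute, by the same rule, a number for each substring of the text that has the same length. If the two numbers differ, the two strings cannot be equal. If they are the same, the two strings may be equal, and only then are their characters compared one by one.

Because the pattern is preprocessed, the number of character-by-character comparisons drops. In the worst case, however, it is no better than brute force. Preprocessing the pattern takes O(m) time, and updating the value across the text windows takes O(n-m) time.

To represent a string as a number, the code uses Horner's method (known in Chinese as Qin Jiushao's algorithm). The parameter d passed to RKM is the radix; for example, 256 means base 256. Computed this way, the value can grow too large to handle conveniently, so it is reduced mod q. This is similar to how a hash is computed. Other ways of computing the value are of course possible.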

class Solution {
    public int strStr(String haystack, String needle) {
        if(isNullOrEmpty(needle)) { // also covers the case where both strings are empty
            return 0;
        }
        if(haystack.length() < needle.length()){
            return -1;
        }

        return RKM(haystack,needle,256,23);
    }
    
    public int RKM(String haystack,String needle,int d,int  q){
        int n = haystack.length();
        int m = needle.length();
        int h = 1;
        int p = 0;
        int t = 0;
        
        for(int i = 0; i < m -1;i++){
            h = (h*d)%q;
        }
        
        for(int i = 0; i < m;i++){
            p = (d*p + needle.charAt(i)) % q;
            t = (d*t + haystack.charAt(i)) % q;
        }
        
        for(int s = 0; s < n -m + 1; s++){
            if(p == t){
                if(compareTwoString(haystack.substring(s,s+m),needle)){
                    return s;
                }
            }
            
            if(s < n-m ){
                t = (d * (t-haystack.charAt(s)*h) + haystack.charAt(s+m))% q;
                if(t < 0){
                    t = t + q;
                }
            }
        }
        return -1;
    }
    
    public boolean isNullOrEmpty(String s){
        return s == null || s.length() == 0;
    }
    
    public boolean compareTwoString(String s1, String s2){
        int n = s2.length() ;
        for(int i = 0; i < n; i++){
            if(s1.charAt(i) != s2.charAt(i)){
                return false;
            }
        }
        return true;
    }
}
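
To check the hash update in isolation, here is a small sketch that computes each window's value from scratch with Horner's method and compares it with the value obtained by sliding the window one character at a time. The class RollingHashDemo and its method names are invented for this illustration; the formulas are the ones used in RKM above, with d = 256 and q = 23 assumed as in the solution.

```java
class RollingHashDemo {
    // Value of s[from, from+len) by Horner's method, reduced mod q at every step.
    static int hash(String s, int from, int len, int d, int q) {
        int value = 0;
        for (int i = 0; i < len; i++) {
            value = (d * value + s.charAt(from + i)) % q;
        }
        return value;
    }

    // Slide the window one position to the right: drop `out`, append `in`.
    // h must equal d^(len-1) mod q. Plain int arithmetic is enough for a small q
    // such as 23; a much larger modulus would need long to avoid overflow.
    static int roll(int t, char out, char in, int h, int d, int q) {
        int next = (d * (t - out * h) + in) % q;
        return next < 0 ? next + q : next; // Java's % can yield a negative result
    }

    // Returns true when every rolled value equals the value computed from scratch.
    // Assumes 1 <= m <= text.length().
    static boolean rollingMatchesDirect(String text, int m, int d, int q) {
        int h = 1;
        for (int i = 0; i < m - 1; i++) {
            h = (h * d) % q;
        }
        int t = hash(text, 0, m, d, q);
        for (int s = 0; s + m < text.length(); s++) {
            t = roll(t, text.charAt(s), text.charAt(s + m), h, d, q);
            if (t != hash(text, s + 1, m, d, q)) {
                return false;
            }
        }
        return true;
    }
}
```

With such a small q, many different windows share the same value, which is why a match of the two numbers must still be confirmed character by character.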

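KMP Algorithm

KMP goes further in cutting down futile comparisons. For example, take the pattern "abab" and the text "abacabab". When the pattern is matched against abac, it is easy to see that the match fails. It is just as easy to see that restarting from the first b or the second a of the text is futile as well. KMP uses a next array to change the offset so that these attempts, already known to be futile, are skipped outright.

For abab the next array is [0,0,1,2]. An entry of next is the index of the pattern character to compare next. In the same example, matching fails at c, but we know the a just before it matched, so there is no need to fall all the way back to the start. Instead we return to the state reached by the last successful match of a: the next array takes us to state 1, that is, to comparing b. b does not match c, so in the same way the next array takes us to state 0. Only now are we back at the initial state. So KMP does not fall straight back to the initial state; it returns to an earlier state of successful matching, keeping the fallback as small as possible.

Computing the next array is also simple: compare the pattern against itself. Put another way, next[i] is the length of the longest proper suffix of P1, the prefix of the pattern ending at index i, that is also a prefix of the pattern. The value for the first character is necessarily 0. Take abab as an example.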

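Computing next[1]: P1 is "ab" and its proper suffix is "b", which is not a prefix of the pattern, so the length is 0. (Only the proper suffix that gives the longest prefix is shown in each step.)

Computing next[2]: P1 is "aba" and the proper suffix "a" is also a prefix, so the length is 1.

Computing next[3]: P1 is "abab" and the proper suffix "ab" is also a prefix, so the length is 2.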
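
The definition can also be checked mechanically. The sketch below computes the array directly from the definition by trying every proper suffix, longest first. It is deliberately naive and is not the linear-time construction used in the solution; the names NaiveNextDemo and naiveNext are made up for this illustration.

```java
class NaiveNextDemo {
    // Definition-based computation, O(m^3) in the worst case; for illustration only.
    static int[] naiveNext(String pattern) {
        int m = pattern.length();
        int[] next = new int[m];
        for (int i = 0; i < m; i++) {
            String p1 = pattern.substring(0, i + 1); // the prefix ending at index i
            // Try proper suffix lengths from longest to shortest.
            for (int len = i; len > 0; len--) {
                if (p1.endsWith(pattern.substring(0, len))) {
                    next[i] = len; // longest proper suffix that is also a prefix
                    break;
                }
            }
        }
        return next;
    }
}
```

For "abab", naiveNext returns [0, 0, 1, 2], which agrees with the three steps worked out by hand.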

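The core idea of KMP is similar to that of an automaton: the next state is decided from the current state. Seen this way, the next array says that when the current state fails to match, the matcher jumps back to the best state it has already reached. As described above, when c does not match b, it returns to an earlier matching state.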
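
To make the automaton view concrete, the following sketch records the state (the number of pattern characters currently matched) after each text character is consumed. KmpTrace, states, and buildNext are names invented for this illustration, and a non-empty pattern is assumed.

```java
import java.util.ArrayList;
import java.util.List;

class KmpTrace {
    // State q after each text character; stops once the whole pattern is matched.
    // Assumes the pattern is non-empty.
    static List<Integer> states(String text, String pattern) {
        int[] next = buildNext(pattern);
        List<Integer> trace = new ArrayList<>();
        int q = 0;
        for (int i = 0; i < text.length(); i++) {
            // On a mismatch, fall back through next instead of restarting from 0.
            while (q > 0 && pattern.charAt(q) != text.charAt(i)) {
                q = next[q - 1];
            }
            if (pattern.charAt(q) == text.charAt(i)) {
                q++;
            }
            trace.add(q);
            if (q == pattern.length()) {
                break; // full match ending at index i
            }
        }
        return trace;
    }

    // Standard linear-time construction of the next array.
    static int[] buildNext(String pattern) {
        int[] next = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(k) != pattern.charAt(i)) {
                k = next[k - 1];
            }
            if (pattern.charAt(k) == pattern.charAt(i)) {
                k++;
            }
            next[i] = k;
        }
        return next;
    }
}
```

For the text "abacabab" and the pattern "abab", the recorded states are [1, 2, 3, 0, 1, 2, 3, 4]. The drop from 3 to 0 at c passes through state 1 on the way, exactly as in the walk-through, and the final state 4 means a match ending at index 7, so the match starts at index 4.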

Computing the next array takes O(m) time and matching takes O(n) time, so the total time complexity of KMP is O(m+n). The space complexity is O(m).

class Solution {
    public int strStr(String haystack, String needle) {
        if(isNullOrEmpty(needle)) { // also covers the case where both strings are empty
            return 0;
        }
        if(haystack.length() < needle.length()){
            return -1;
        }
        int n = haystack.length();
        int m = needle.length();
        int[] next = getNext(needle);
        int q = 0;
        for(int i = 0; i < n ;i++){
            while(q > 0 && needle.charAt(q) != haystack.charAt(i)){
                q = next[q -1];
            }
            if(needle.charAt(q) == haystack.charAt(i)){
                q++;
            }
            if( q == m){
                return i - m +1;
            }
        }
        return -1;
    }
    
    public int[] getNext(String needle){
        int m = needle.length();
        int[] next = new int[m];
        next[0] = 0;
        int k = 0;
        for(int i = 1; i < m ; i++){
            while(k > 0 && needle.charAt(k) != needle.charAt(i)){
                k = next[k - 1];
            }
            if(needle.charAt(k)== needle.charAt(i)){
                k ++;
            }
            next[i] = k;
        }
        return next;
    }
    
    public boolean isNullOrEmpty(String s){
        return s == null || s.length() == 0;
    }
}

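BM Algorithm (Boyer-Moore)

BM can, to some extent, be seen as an improvement over KMP. On each failure, KMP falls back to the last successful state. But if the character being matched does not occur in the pattern at all, KMP needs several jumps to get back to the first character of the pattern and start matching again. The main feature of BM is that it obtains a larger shift in one step, without needing multiple jumps as KMP does.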
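
The article gives no code for BM, so the following is only a rough sketch of one ingredient, the bad-character rule, and not the full Boyer-Moore algorithm (the good-suffix rule is left out). The pattern is compared from right to left, and on a mismatch it is shifted so that the offending text character lines up with its rightmost occurrence in the pattern, or past it entirely if it does not occur. The names BadCharacterSearch and indexOf are made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class BadCharacterSearch {
    // Simplified search using only the bad-character rule.
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) {
            return 0;
        }
        // Rightmost index of each character in the pattern.
        Map<Character, Integer> last = new HashMap<>();
        for (int i = 0; i < m; i++) {
            last.put(pattern.charAt(i), i);
        }
        int s = 0; // current alignment of the pattern in the text
        while (s <= n - m) {
            int j = m - 1;
            // Compare from the right end of the pattern.
            while (j >= 0 && pattern.charAt(j) == text.charAt(s + j)) {
                j--;
            }
            if (j < 0) {
                return s; // every character matched
            }
            int lastIdx = last.getOrDefault(text.charAt(s + j), -1);
            // Always move forward by at least one position.
            s += Math.max(1, j - lastIdx);
        }
        return -1;
    }
}
```

With the text "abacabab" and the pattern "abab", the first comparison is b against c. Since c does not occur in the pattern, the pattern moves four positions in a single step and matches at index 4.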

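Sunday Algorithm

Like KMP, the Sunday algorithm reduces the number of comparisons by computing a shift. The difference lies in how the shift is chosen. The algorithm is easiest to follow by keeping the text fixed and sliding the pattern along it.

The Sunday algorithm looks at the character of the text just past the end of the current window, and uses that character to decide the shift. There are two cases:

If the character occurs in the pattern, align its rightmost occurrence in the pattern with that character of the text. The shift is the pattern length minus the (0-based) index of the rightmost occurrence.
If the character does not occur in the pattern, skip past it entirely. The shift is the pattern length + 1.

A next array is therefore needed to record, for each character, how far to shift.

The two cases correspond to the two loops that fill the next array in the code below: characters absent from the pattern get the pattern length + 1, and characters present get the pattern length minus the index of their rightmost occurrence.

Building the next array takes O(m + alphabet size) time. In the worst case the algorithm degenerates to brute force, with time complexity O(mn). The average time complexity is O(n), and the space complexity is O(alphabet size).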

class Solution {
    public int strStr(String haystack, String needle) {
        if(isNullOrEmpty(needle)) { // also covers the case where both strings are empty
            return 0;
        }
        if(haystack.length() < needle.length()){
            return -1;
        }
        int n = haystack.length();
        int m = needle.length();
        int[] next = new int[256];
        // calculate next array
        for(int i = 0; i < next.length;i++){
            next[i] = m + 1;
        }
        for(int i = 0; i < m; i++){
            next[needle.charAt(i)] = m - i;
        }
        
        int s = 0;//haystack position
        int j = 0;//needle position
        while( s <= n -m){
            j = 0;
            while( s + j < n && j < m && haystack.charAt(s+j) == needle.charAt(j)){
                j++;
                if(j == m){
                    return s;
                }
            }
            int max = s+m < n ? s+m : n - 1;
            s += next[haystack.charAt(max)];
        }
        return -1;
    }
    
    public boolean isNullOrEmpty(String s){
        return s == null || s.length() == 0;
    }
}
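
To see the shift rule on a concrete pattern, the sketch below builds only the shift table. The names SundayShiftDemo and buildShift are invented for this illustration, and a 256-entry table is assumed, as in the solution above, so every character code must be below 256.

```java
import java.util.Arrays;

class SundayShiftDemo {
    // Shift table over an assumed 256-character alphabet.
    static int[] buildShift(String pattern) {
        int m = pattern.length();
        int[] shift = new int[256];
        // Characters absent from the pattern: jump past them, m + 1 positions.
        Arrays.fill(shift, m + 1);
        // Characters present: m minus the index of the rightmost occurrence.
        // Later occurrences overwrite earlier ones, so the rightmost one wins.
        for (int i = 0; i < m; i++) {
            shift[pattern.charAt(i)] = m - i;
        }
        return shift;
    }
}
```

For the pattern "abab" the table holds 2 for a, 1 for b, and 5 for every other character. On the text "abacabab", the window at index 0 fails at c; the character just past the window is a, so the pattern shifts by 2, fails again at index 2, shifts by 2 once more, and matches at index 4.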
