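Understanding the KMP Algorithm, Starting from LeetCode's strStr()

I first ran into the KMP algorithm about a year ago, when I bought the BUPT textbook on data structures and the STL. Because it was written for students with a communications background, by a teacher who came from NLP, it covered KMP (which typical data-structure textbooks leave out entirely). At the time, though, I could not make sense of it from the few short lines of description.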

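Starting from a LeetCode Problem

Two-Pointer (Brute-Force) Solution

This is the approach that comes to mind first: try every starting position in the haystack and compare character by character.

It gets accepted (AC) without trouble: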

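#include <string>
using namespace std;

class Solution {
public:
    int strStr(string haystack, string needle) {
        // An empty needle matches at position 0
        if(needle.empty()){
            return 0;
        }
        // Cache the sizes as int so the subtraction below cannot wrap around
        int n = haystack.size();
        int m = needle.size();
        // A needle longer than the haystack (or an empty haystack) cannot match
        if(n<m){
            return -1;
        }
        for(int i = 0; i<=n-m; i++){
            // Only start comparing where the first character matches
            if(haystack[i]==needle[0]){
                bool flag = true;
                int j = 0;
                int _i = i;
                while(j<m){
                    if(haystack[_i]==needle[j]){
                        _i++;
                        j++;
                    }else{
                        flag = false;
                        break;
                    }
                }
                // The whole needle matched starting at i
                if(flag){
                    return i;
                }
            }
        }
        return -1;
    }
};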

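Enter KMP

Origin:

The Knuth-Morris-Pratt string-searching algorithm, "KMP" for short, is commonly used to find where a pattern string P occurs inside a text string S. It was jointly published in 1977 by Donald Knuth, Vaughan Pratt, and James H. Morris, and is named after the three authors' surnames.

Algorithm flow:

Suppose the text S has been matched up to position i and the pattern P up to position j.

If j = -1, or the current characters match (S[i] == P[j]), do i++ and j++ and go on to the next character;

If j != -1 and the current characters do not match (S[i] != P[j]), leave i unchanged and set j = next[j]. This means that on a mismatch, the pattern P moves right by j - next[j] positions relative to the text S.
In other words, when a match fails, the pattern shifts right by (position of the mismatched character) - (next value of that character), i.e. j - next[j], and this value is always at least 1.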
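To make the two rules above concrete, here is a minimal C++ sketch that runs the matching loop and records every shift j - next[j]. The names buildNext and kmpSearchTraced are hypothetical helpers introduced only for illustration, and the table is built with the usual fallback loop; treat it as one possible way to write this, not a definitive implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: shifted next table with next[0] = -1, where next[j] is the
// length of the longest proper prefix of p[0..j-1] that is also its suffix.
std::vector<int> buildNext(const std::string& p) {
    const int m = static_cast<int>(p.size());
    std::vector<int> next(m + 1, -1);  // m + 1 slots, so an empty pattern is safe
    int j = 0, k = -1;
    while (j < m) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// Hypothetical helper: returns the first match position of p in s (or -1) and
// appends the shift j - next[j] to `shifts` every time a mismatch occurs.
int kmpSearchTraced(const std::string& s, const std::string& p,
                    std::vector<int>& shifts) {
    const std::vector<int> next = buildNext(p);
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(p.size());
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (j == -1 || s[i] == p[j]) {
            ++i;  // rule 1: advance both pointers
            ++j;
        } else {
            shifts.push_back(j - next[j]);  // how far the pattern slides right
            j = next[j];                    // rule 2: i stays, j falls back
        }
    }
    return j == m ? i - j : -1;
}
```

Called with S = "BBC ABCDAB ABCDABCDABDE" and P = "ABCDABD", kmpSearchTraced returns 15, and every recorded shift is at least 1 because next[j] is always smaller than j.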

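The next array:

The next array means: next[j] is the length of the longest prefix of the substring before the current character, P[0..j-1], that is also a suffix of that substring (the whole substring itself does not count).

The next array used here is the prefix/suffix length table shifted right by one position as a whole, with next[0] = -1, and how it is computed is easily the slickest step in the whole KMP algorithm.

Take ABCDABD as an example: the longest common prefix/suffix lengths are 0 0 0 0 1 2 0, so after shifting right by one the next array is -1 0 0 0 0 1 2.

It looks easy to understand, and writing an N*N solution is easy, but getting it down to linear complexity takes a neat trick!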
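The easy version can be written straight from the definition. The sketch below uses hypothetical names (longestBorderLengths and shiftedNext) and simply tries every candidate length for every prefix, so it is far from linear (cubic in the worst case), but it is handy for checking a table by hand.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: len[i] is the length of the longest proper prefix of
// p[0..i] that is also a suffix of p[0..i], found by brute force.
std::vector<int> longestBorderLengths(const std::string& p) {
    std::vector<int> len(p.size(), 0);
    for (std::size_t i = 0; i < p.size(); ++i) {
        // Try candidate lengths from longest (i) down to 1; "proper" means
        // the prefix may not be the whole of p[0..i].
        for (std::size_t l = i; l >= 1; --l) {
            if (p.compare(0, l, p, i + 1 - l, l) == 0) {
                len[i] = static_cast<int>(l);
                break;
            }
        }
    }
    return len;
}

// Hypothetical helper: shift the table right by one and put -1 in front,
// which gives the next array in the convention used by this post.
std::vector<int> shiftedNext(const std::string& p) {
    const std::vector<int> len = longestBorderLengths(p);
    std::vector<int> next(p.size());
    for (std::size_t j = 0; j < p.size(); ++j) {
        next[j] = (j == 0) ? -1 : len[j - 1];
    }
    return next;
}
```

For "ABCDABD", longestBorderLengths gives 0 0 0 0 1 2 0 and shiftedNext gives -1 0 0 0 0 1 2.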

First, the code:

#include <string.h>

void GetNext(char* needle,int next[]) {
    int pLen = strlen(needle);
    next[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen - 1) {
        // needle[k] is the end of the prefix, needle[j] is the end of the suffix
        if (k == -1 || needle[j] == needle[k]) {
            ++k;
            ++j;
            next[j] = k;
        }else {
            k = next[k];  // the slickest step: fall back to a shorter prefix
        }
    }
}
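The fallback k = next[k] works because these prefix/suffix overlaps nest: the next-longest prefix that is also a suffix of the current string is itself the longest such overlap of the current overlap. A small sketch of that chain follows. The helpers buildNext and fallbackChain are hypothetical, and the table is extended by one slot to cover the whole pattern, which the GetNext above does not compute.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: same fallback loop as GetNext, but with one extra slot so
// that next[p.size()] holds the overlap length of the whole pattern.
std::vector<int> buildNext(const std::string& p) {
    const int m = static_cast<int>(p.size());
    std::vector<int> next(m + 1, -1);
    int j = 0, k = -1;
    while (j < m) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// Hypothetical helper: follow k = next[k] starting from the whole pattern and
// collect every overlap length visited, from longest to shortest.
std::vector<int> fallbackChain(const std::string& p) {
    const std::vector<int> next = buildNext(p);
    std::vector<int> chain;
    int k = next[p.size()];
    while (k != -1) {
        chain.push_back(k);
        k = next[k];  // jump to the next shorter candidate
    }
    return chain;
}
```

For "ABACABA" the chain is 3, 1, 0: the overlaps are "ABA", then "A", then the empty string. Each candidate is reached in one jump instead of being searched for again.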

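[GIF demonstration]

Doesn't it feel a little like DP? Look at it again, and then again, and it really does carry some dynamic-programming thinking; see: https://www.zhihu.com/search?type=content&q=KMP%20DP

(Source: https://www.zhihu.com/search?type=content&q=KMP%20DP)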

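C Version

Interestingly, the core code in the five most-upvoted Zhihu answers is exactly the same!

It probably all goes back to some classic textbook. Here is the code: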

#include <string.h>

void GetNext(char* needle,int next[]) {
    int pLen = strlen(needle);
    next[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen - 1) {
        // needle[k] is the end of the prefix, needle[j] is the end of the suffix
        if (k == -1 || needle[j] == needle[k]) {
            ++k;
            ++j;
            next[j] = k;
        }else {
            k = next[k];  // the slickest step: fall back to a shorter prefix
        }
    }
}
// Fixed-size table: this assumes the needle is at most 100000 characters long
int next[100000];
int strStr(char * haystack, char * needle){
    int i = 0;
    int j = 0;
    int sLen = strlen(haystack);
    int pLen = strlen(needle);
    GetNext(needle, next);    // compute the next array
    while (i < sLen && j < pLen) {
        // If j == -1, or the current characters match (S[i] == P[j]), do i++ and j++
        if (j == -1 || haystack[i] == needle[j]) {
            i++;
            j++;
        }else {
            // If j != -1 and the characters differ (S[i] != P[j]), keep i and set j = next[j]
            j = next[j];  // next[j] is the next value for position j
        }
    }
    if (j == pLen)
        return i - j;
    else
        return -1;
}

C++ Version

I changed it a little myself; the core is the same.

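#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int strStr(string haystack, string needle) {
        // Cache the sizes as int: j can be -1, and comparing -1 against the
        // unsigned result of size() would convert it to a huge value
        int haystackSize = haystack.size();
        int needleSize = needle.size();
        // Build the next array first (a vector rather than a variable-length
        // array, which is not standard C++)
        vector<int> next(needleSize+1);
        next[0] = -1;
        int i = 0, j = -1;
        while(i<needleSize){
            // The next array is shifted right by one relative to the plain
            // prefix/suffix table, so be careful with the indices!
            if(j==-1||needle[i]==needle[j]){
                ++i;
                ++j;
                next[i] = j;
            }else{
                j = next[j];    // the slickest step: fall back to a shorter prefix
            }
        }
        // Remember to reset both pointers first
        i = 0;
        j = 0;
        // Core KMP matching loop
        while(i<haystackSize && j<needleSize){
            if(j==-1||haystack[i]==needle[j]){
                // If j == -1, or the current characters match (S[i] == P[j]), do i++ and j++
                i++;
                j++;
            }else{
                // If j != -1 and the characters differ (S[i] != P[j]), keep i and
                // set j = next[j], the next value for position j
                j = next[j];
            }
        }
        if(j==needleSize)
            return i-j;
        return -1;
    }
};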

References

https://www.cnblogs.com/zhangtianq/p/5839909.html

https://www.zhihu.com/question/21923021
