The KMP Algorithm Explained (Java)

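KMP is, at its core, a string-matching algorithm. It does the same job as the indexOf method of Java's String class: it returns the position of one string (the N string below) inside another string (the M string below). In other words, it finds the part of the main string that equals the pattern. The difference is in the complexity: the brute-force approach needs time proportional to the product of the two lengths in the worst case, while KMP needs time proportional to their sum.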
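For reference, the behavior that KMP has to reproduce can be observed with String.indexOf itself. This is only a minimal sketch; the class name IndexOfDemo and the wrapper method find are illustrative and not part of the original code.

```java
class IndexOfDemo {
    // Thin wrapper (hypothetical name) so the call has the same shape
    // as the search methods used in the rest of the article.
    static int find(String src, String target) {
        return src.indexOf(target);
    }

    public static void main(String[] args) {
        System.out.println(find("asdfghM asdfghN", "asdfghN")); // prints 8
        System.out.println(find("asdfghM asdfghN", "xyz"));     // prints -1
    }
}
```

Positions are counted from 0 here, and -1 means the N string does not occur in the M string.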

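The brute-force matching algorithm

Put simply, a nested loop tries every alignment and compares character by character.
Anyone studying KMP has almost certainly seen this already, so I will not dwell on it and only give one implementation for reference.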

public class Test {

	public static void main(String[] args) {
		System.out.println(simple("asdfghM asdfghN", "asdfghN"));
	}
	
	// Returns the index of the first occurrence of target in src, or -1 if there is none.
	public static int simple(String src, String target){
		// i is the alignment: target[0] is placed under src[i]
		for(int i = 0; i < src.length() - target.length() + 1; i++){
			boolean flag = true;
			for(int j = 0; j < target.length(); j++){
				if(src.charAt(i + j) != target.charAt(j)){
					flag = false;
					break;
				}
			}
			if(flag){
				return i;
			}
		}
		return -1;
	}

}

The KMP algorithm

Looking back at the brute-force algorithm, its actual sequence of comparisons looks like this.

--------Comparison 1--------
          a==a
asdfghM asdfghN
asdfghN
--------Comparison 2--------
          s==s
asdfghM asdfghN
asdfghN
--------Comparison 3--------
          d==d
asdfghM asdfghN
asdfghN
--------Comparison 4--------
          f==f
asdfghM asdfghN
asdfghN
--------Comparison 5--------
          g==g
asdfghM asdfghN
asdfghN
--------Comparison 6--------
          h==h
asdfghM asdfghN
asdfghN
--------Comparison 7--------
          M!=N
asdfghM asdfghN
asdfghN
--------Comparison 8--------
          s!=a
asdfghM asdfghN
 asdfghN
--------Comparison 9--------
          d!=a
asdfghM asdfghN
  asdfghN
--------Comparison 10--------
          f!=a
asdfghM asdfghN
   asdfghN
--------Comparison 11--------
          g!=a
asdfghM asdfghN
    asdfghN
--------Comparison 12--------
          h!=a
asdfghM asdfghN
     asdfghN
--------Comparison 13--------
          M!=a
asdfghM asdfghN
      asdfghN
--------Comparison 14--------
           !=a
asdfghM asdfghN
       asdfghN
--------Comparison 15--------
          a==a
asdfghM asdfghN
        asdfghN
--------Comparison 16--------
          s==s
asdfghM asdfghN
        asdfghN
--------Comparison 17--------
          d==d
asdfghM asdfghN
        asdfghN
--------Comparison 18--------
          f==f
asdfghM asdfghN
        asdfghN
--------Comparison 19--------
          g==g
asdfghM asdfghN
        asdfghN
--------Comparison 20--------
          h==h
asdfghM asdfghN
        asdfghN
--------Comparison 21--------
          N==N
asdfghM asdfghN
        asdfghN

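Look closely: at comparison 7, M != N, so the N string is moved one position to the right and matching restarts from its first character. That is reasonable, but clumsy. In this N string, every character is different from every other. By comparison 7 we already know that its first 6 characters match the M string one for one; in other words, the first character of N (a) cannot equal characters 2 to 6 of M (sdfgh). So comparison 8 should not move right by just one position; it should move right by 6 positions and go straight to "comparison 13" in the trace above.

PS: why 6 positions and not 7? Because
N(1) != N(7) && M(7) != N(7) does not imply N(1) != M(7).

As the example suggests, the characteristics of the N string itself allow a larger move to the right after a mismatch. In particular, if all characters of N are distinct, we can move right by the position of the current comparison minus one.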
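The special case just described can be sketched as follows. This is not the KMP algorithm itself, only an illustration that assumes every character of the N string is distinct; the class name DistinctShiftSearch and the comparison counter are made up for this sketch.

```java
class DistinctShiftSearch {
    // Number of character comparisons made by the last call to search.
    static int comparisons;

    // Assumes all characters of target are distinct (an assumption of
    // this sketch; the result can be wrong for other patterns).
    static int search(String src, String target) {
        comparisons = 0;
        int start = 0; // alignment of target[0] within src
        while (start + target.length() <= src.length()) {
            int j = 0;
            while (j < target.length()) {
                comparisons++;
                if (src.charAt(start + j) != target.charAt(j)) {
                    break;
                }
                j++;
            }
            if (j == target.length()) {
                return start;
            }
            // j characters matched, so alignments start+1 .. start+j-1
            // cannot work; always move at least one position.
            start += Math.max(1, j);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("asdfghM asdfghN", "asdfghN")); // prints 8
        System.out.println(comparisons); // fewer than the 21 of the brute-force trace
    }
}
```

The mismatched character of M is compared again after the move, which is exactly why the move is the number of matched characters and not one more.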

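In that special case we get the largest possible move. What we need to work out now is when we cannot move that far. Put differently: the case where, after comparison 7 fails, moving N right by some number of positions smaller than 6 lines up the first few characters of N with a substring of the 6 characters of M that were just matched, so that another solution might exist there.

This is only a half-finished idea and does not help us much yet. At this point it is best to pick up a pen and work through some examples on paper.

I wrote down this example.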

asdasea......
asdaseN

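After comparison 7 there is certainly no solution that moves fewer than 6 positions. After a number of attempts the pattern finally emerged: ※※※ for a smaller move to be possible, the first 6 characters of N must have a prefix that equals a suffix.
Here is an example to check it.

--------Comparison 7--------
         M != N
asdasdM
asdasdN
--------Comparison 8--------
         M != a
asdasdM
   asdasdN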

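Explanation:

The first 6 characters of N have a prefix and a suffix of length 3 that are equal (asd), so N(1,2,3) must equal M(4,5,6), and comparison 8 can compare M(7) with N(4) directly.
In comparison 7, N(7) != M(7), but N(4) may still equal M(7).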
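The size of the move in these examples can be computed mechanically. The sketch below uses a deliberately naive search for the longest prefix that is also a suffix; the names BorderShift, longestBorder and shift are hypothetical and chosen only for this illustration.

```java
class BorderShift {
    // Length of the longest prefix of s that is also a suffix of s,
    // not counting s itself. Brute force, for illustration only.
    static int longestBorder(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            if (s.startsWith(s.substring(s.length() - len))) {
                return len;
            }
        }
        return 0;
    }

    // How far the pattern can move right after its first `matched`
    // characters matched and the next one did not.
    static int shift(String target, int matched) {
        int border = longestBorder(target.substring(0, matched));
        return Math.max(1, matched - border);
    }

    public static void main(String[] args) {
        System.out.println(shift("asdasdN", 6)); // prints 3
        System.out.println(shift("asdaseN", 6)); // prints 6
    }
}
```

For asdasdN the 6 matched characters asdasd begin and end with asd, so the move is 6 - 3 = 3; for asdaseN there is no such prefix, so the move is the full 6.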

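From this we can state a more general rule: suppose N has been matched up to position a (a greater than 1), that is, its first a-1 characters matched and position a is the one being compared. Let b be the length of the longest prefix of those a-1 characters that is also a suffix of them, not counting the whole substring (b is 0 if there is none). Then N can be moved right by a-1-b positions, and matching continues by comparing N(b+1) with the character of M that took part in the last comparison. In the example above a = 7 and b = 3, so N moves right by 3 positions and N(4) is compared next.

PS: readers who are good at maths can write a rigorous proof of this conclusion. I wrote one once too, and then forgot it. For something this intuitive, though, going without a proof is probably fine.

The key question has now become: for each character of N, what is the length of the longest prefix of the substring before it that is also a suffix of that substring? The array made up of these lengths is the next array.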

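The next array

Let us start with an example.

asdaseN
The next array for this string is
[-1, 0, 0, 0, 1, 2, 0]

e (position 6) has the value 2 because the substring before it, asdas, begins and ends with as, which has length 2. s (position 5) has the value 1 because the substring before it, asda, begins and ends with a, which has length 1. The leading -1 can be regarded as a convention, since there is no substring before the first character. As the code later shows, -1 is used as a special value: if even the first character does not match, the only option is to move one position to the right.

We now know what the next array means, but computing it is not so easy.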

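Computing the next array by brute force

Faced with a complicated problem it is always tempting to brute-force it first, since that is usually simpler. Brute-forcing the next array is still somewhat involved, though. Besides, using a brute-force search for the core step of an algorithm whose whole purpose is to avoid brute-force string searching is a little ironic.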

	public static int[] getNext1(String target){
		int[] next = new int[target.length()];
		for(int i = 0; i < target.length(); i++){
			// try every candidate length j for the substring target[0..i-1]
			for(int j = 0; j < i; j++){
				// compare the first j characters with the last j characters
				for(int q = j; q > 0; q--){
					if(target.charAt(j - q) != target.charAt(i - q)){
						break;
					}
					if(q == 1){
						// all j pairs matched; a larger j found later overwrites this
						next[i] = j;
					}
				}
			}
			if(i == 0){
				next[i] = -1;	// by convention
			}
		}
		return next;
	}

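Computing the next array

The proper algorithm for the next array is far more elegant. It derives each value from the value before it. The idea is recursive; more precisely, it is a form of dynamic programming. The first step is to find the rule for the derivation.

Definition of the next array: the value at each position is the length of the longest prefix of the substring before that position that is also a suffix of it (not counting the whole substring).

Why repeat the definition here? Because it is the key to understanding the algorithm that follows!

First, take a string with plenty of structure to discuss: abaababa.
Its next array should look like this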

 a b a a b a b a
-1 0 0 1 1 2 3 2

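The recurrence for the next array falls into 3 cases. Below, N(i) is the i-th character of the string and next(i) is the i-th value of the next array, both counted from 1.

Case 1
The first value is -1; for any other position, if the substring before it has no prefix equal to a suffix, the value is 0. This is the termination condition.
Case 2
Suppose we have already found the first 6 values, [-1, 0, 0, 1, 1, 2].
To find the 7th value, look at the value before it, next(6) = 2. This means
N(1)N(2) == N(4)N(5), so we only need to check whether N(3) equals N(6) to decide whether next(7) is 3 (in recursive terms, 2 + 1). In this example N(3) and N(6) are both a, so next(7) = 3 follows immediately.
Case 3
What if N(3) and N(6) had not been equal? Suppose we have now found the first 7 values,
[-1, 0, 0, 1, 1, 2, 3].
To find the 8th value we proceed as before, but find that N(4) != N(7). This calls for a leap: look at the next value that describes the first next(7) = 3 characters, which is next(4) = 1 (in the 0-based code below this is next[next[6]] = next[3]).
So next(7) = 3 and next(4) = 1. By the definition of the next array, this means
N(1)=N(3)=N(4)=N(6)
All we need here is N(1)=N(6): it lets us decide whether next(8) is 2 (in recursive terms, next(4) + 1) just by checking whether N(2) equals N(7). In this example N(2) and N(7) are equal, so next(8) is 2. If they were not equal, we would keep falling back in the same way until we found a match or reached the end, that is, position 1.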
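The fallback in case 3 can be made visible by listing the candidate lengths that are tried for one position. The sketch below uses 0-based indices and hard-codes the next values of abaababa given above; the names FallbackChain, chain and extend are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class FallbackChain {
    // Candidate lengths tried for position i, longest first:
    // next[i], next[next[i]], ... until -1 is reached.
    static List<Integer> chain(int[] next, int i) {
        List<Integer> result = new ArrayList<>();
        for (int j = next[i]; j >= 0; j = next[j]) {
            result.add(j);
        }
        return result;
    }

    // Value for position i + 1 (0-based), derived from the values up to i.
    static int extend(String target, int[] next, int i) {
        for (int j : chain(next, i)) {
            if (target.charAt(j) == target.charAt(i)) {
                return j + 1; // cases 2 and 3: a candidate can be extended
            }
        }
        return 0;             // case 1: nothing can be extended
    }

    public static void main(String[] args) {
        String target = "abaababa";
        int[] next = {-1, 0, 0, 1, 1, 2, 3, 2}; // values from the article
        System.out.println(chain(next, 6));          // prints [3, 1, 0]
        System.out.println(extend(target, next, 6)); // prints 2
    }
}
```

For the 8th value (index 7) the candidates are 3, then 1, then 0. Length 3 fails because the characters differ, length 1 succeeds, and the result is 1 + 1 = 2.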

With these 3 rules we can compute next.

	public static int[] getNext(String target){
		if(target.isEmpty()){
			return new int[0];	// an empty string has no next values
		}
		int[] next = new int[target.length()];
		next[0] = -1;
		int i = 1;
		int j = 0;
		
		while(i < target.length() - 1){
			if(j == -1 || target.charAt(j) == target.charAt(i)){
				j++;
				i++;
				next[i] = j;
			}else {
				j = next[j];
			}
		}
		return next;
	}

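This code follows the 3 rules above directly: the first branch covers cases 1 and 2, and the else branch is the fallback of case 3. No further explanation should be needed.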
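One way to gain confidence in the recurrence is to compare it with the definition on a few strings. The sketch below copies the recurrence from getNext above so that it is self-contained; the class name NextCheck and the method names are hypothetical, and byDefinition is a naive check written for this comparison only.

```java
import java.util.Arrays;

class NextCheck {
    // Same recurrence as getNext above.
    static int[] byRecurrence(String target) {
        if (target.isEmpty()) {
            return new int[0];
        }
        int[] next = new int[target.length()];
        next[0] = -1;
        int i = 1;
        int j = 0;
        while (i < target.length() - 1) {
            if (j == -1 || target.charAt(j) == target.charAt(i)) {
                j++;
                i++;
                next[i] = j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    // Straight from the definition: longest prefix of target[0..i-1]
    // that is also its suffix, not counting the whole substring.
    static int[] byDefinition(String target) {
        int[] next = new int[target.length()];
        for (int i = 0; i < target.length(); i++) {
            String before = target.substring(0, i);
            next[i] = (i == 0) ? -1 : 0;
            for (int len = i - 1; len > 0; len--) {
                if (before.startsWith(before.substring(i - len))) {
                    next[i] = len;
                    break;
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"asdaseN", "abaababa", "aaaaac", "abab"}) {
            System.out.println(s + " " + Arrays.toString(byRecurrence(s))
                    + " agrees with definition: "
                    + Arrays.equals(byRecurrence(s), byDefinition(s)));
        }
    }
}
```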

KMP code

We now have the rule behind KMP and, most importantly, a way to compute next. All that is left is to put the pieces together.

	public static void main(String[] args) {
		String src = "asdaseM asdaseN";
		String target = "asdaseN";
		
		System.out.println(kmp(src, target));
	}
	
	public static int kmp(String src, String target){
		
		int[] next = getNext(target);
			
		int i = 0;	// index into src
		int j = 0;	// index into target; -1 means "advance src and restart target"
		
		while(j < target.length() && i < src.length()){
			if(j == -1 || src.charAt(i) == target.charAt(j)){
				i++;
				j++;
			}else {
				j = next[j];
			}
		}
		
		if(j == target.length()){
			return i - j;
		}
			
		return -1;
	}

	public static int[] getNext(String target){
		if(target.isEmpty()){
			return new int[0];	// an empty string has no next values
		}
		int[] next = new int[target.length()];
		next[0] = -1;
		int i = 1;
		int j = 0;
		
		while(i < target.length() - 1){
			if(j == -1 || target.charAt(j) == target.charAt(i)){
				j++;
				i++;
				next[i] = j;
			}else {
				j = next[j];
			}
		}
		return next;
	}
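A quick sanity check is to run the assembled code next to String.indexOf on a handful of inputs. The sketch below repeats kmp and getNext in compact form so that it compiles on its own; the class name KmpCheck and the test strings other than the article's two examples are assumptions of this sketch.

```java
class KmpCheck {
    static int[] getNext(String target) {
        if (target.isEmpty()) {
            return new int[0];
        }
        int[] next = new int[target.length()];
        next[0] = -1;
        int i = 1;
        int j = 0;
        while (i < target.length() - 1) {
            if (j == -1 || target.charAt(j) == target.charAt(i)) {
                j++;
                i++;
                next[i] = j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    static int kmp(String src, String target) {
        int[] next = getNext(target);
        int i = 0; // index into src
        int j = 0; // index into target
        while (j < target.length() && i < src.length()) {
            if (j == -1 || src.charAt(i) == target.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j];
            }
        }
        return j == target.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"asdaseM asdaseN", "asdaseN"},
            {"aaaac aaaaac", "aaaaac"},
            {"abababc", "ababc"},
            {"abc", "abcd"},
            {"hello", "xyz"},
        };
        for (String[] c : cases) {
            System.out.println(c[1] + ": kmp=" + kmp(c[0], c[1])
                    + ", indexOf=" + c[0].indexOf(c[1]));
        }
    }
}
```

The two numbers on each line should agree, including the -1 for patterns that do not occur.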

As a bonus, here is a helper class that prints the comparison process, which may help with understanding.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KmpBlog {

	static class KmpPrint{
		private String src;
		private String target;
		// each entry is {alignment of target within src, index within target}
		private List<Integer[]> list = new ArrayList<>();
		
		KmpPrint(String src, String target){
			this.src = src;
			this.target = target;
		}
		
		public void add(int i1, int i2){
			Integer[] arr = {i1, i2};
			this.list.add(arr);
		}
		
		public void print(){
			for(Integer[] arr: this.list){
				System.out.println("--------Comparison " + (this.list.indexOf(arr) + 1) + "--------");
				System.out.println("          " + this.src.charAt(arr[0] + arr[1]) 
						+ (this.src.charAt(arr[0] + arr[1]) == this.target.charAt(arr[1]) ? "==" : "!=") 
						+ this.target.charAt(arr[1]));
				System.out.println(this.src);
				System.out.printf("%" + (this.target.length() + arr[0]) + "s%n", this.target);
			}
		}
		
	}
	
	public static void main(String[] args) {
		String src = "asdaseM asdaseN";
		String target = "asdaseN";
		
		System.out.println(kmp(src, target));
	}
	
	public static int kmp(String src, String target){
		
		KmpPrint prt = new KmpPrint(src, target);
		
		int[] next = getNext(target);
		System.out.println("next : " + Arrays.toString(next));
			
		int i = 0;	// index into src
		int j = 0;	// index into target; -1 means "advance src and restart target"
		
		while(j < target.length() && i < src.length()){
			if(j != -1)
				prt.add(i - j, j);
			if(j == -1 || src.charAt(i) == target.charAt(j)){
				i++;
				j++;
			}else {
				j = next[j];
			}
		}
		
		if(j == target.length()){
			prt.print();
			return i - j;
		}
			
		prt.print();
		return -1;
	}

	public static int[] getNext(String target){
		if(target.isEmpty()){
			return new int[0];	// an empty string has no next values
		}
		int[] next = new int[target.length()];
		next[0] = -1;
		int i = 1;
		int j = 0;
		
		while(i < target.length() - 1){
			if(j == -1 || target.charAt(j) == target.charAt(i)){
				j++;
				i++;
				next[i] = j;
			}else {
				j = next[j];
			}
		}
		return next;
	}

}

Optimizing KMP

KMP as described still has a weakness. Try matching the strings below; the matching process goes like this

aaaac aaaaac
aaaaac

--------Comparison 1--------
          a==a
aaaac aaaaac
aaaaac
--------Comparison 2--------
          a==a
aaaac aaaaac
aaaaac
--------Comparison 3--------
          a==a
aaaac aaaaac
aaaaac
--------Comparison 4--------
          a==a
aaaac aaaaac
aaaaac
--------Comparison 5--------
          c!=a
aaaac aaaaac
aaaaac
--------Comparison 6--------
          c!=a
aaaac aaaaac
 aaaaac
--------Comparison 7--------
          c!=a
aaaac aaaaac
  aaaaac
--------Comparison 8--------
          c!=a
aaaac aaaaac
   aaaaac
--------Comparison 9--------
          c!=a
aaaac aaaaac
    aaaaac
...
...

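Comparisons 6 to 9 are all redundant. Each of them compares a with c, yet once comparison 5 has failed, the outcome of the next 4 comparisons is already predictable.

Let us switch to a shorter string that is easier to analyse.
abab. By the earlier rule, its next array is [-1, 0, 0, 1]. The 1 at position 4 can also be read this way: if position 4 fails to match, the next comparison uses the character at position 2 (by the rule above, 1 + 1 = 2). But in this string positions 4 and 2 are both b. If position 4 fails to match, position 2 is bound to fail too, which produces one redundant comparison.

Fixing this is straightforward. While computing the next array, if the character at the current position (call it position A) equals the character at the position that would be tried next after a mismatch at A (call it position B), replace the next value of A with the next value of B.

Code for the optimized next array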

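	public static int[] getNext_new(String target){
		if(target.isEmpty()){
			return new int[0];	// an empty string has no next values
		}
		int[] next = new int[target.length()];
		next[0] = -1;
		// start from i = 0, j = -1 so that next[1] also goes through the new check
		int i = 0;
		int j = -1;
		
		while(i < target.length() - 1){
			if(j == -1 || target.charAt(j) == target.charAt(i)){
				j++;
				i++;
				// new check: fall back to j only if it offers a different character to try
				if(target.charAt(j) != target.charAt(i)){
					next[i] = j;
				}else {
					next[i] = next[j];
				}
				
			}else {
				j = next[j];
			}
		}
		return next;
	}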

With this change, the next array of abab goes from [-1, 0, 0, 1] to [-1, 0, -1, 0]. The rest of the code stays the same.
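The effect of the optimization can be measured by counting character comparisons on the aaaac aaaaac example. The sketch below is one possible way to do that: plainNext repeats the earlier recurrence, optimizedNext is a variant that starts the recurrence at i = 0, j = -1 (a choice of this sketch, so that the second position is also optimized), and search, OptimizedNextDemo and the returned pair are hypothetical names.

```java
import java.util.Arrays;

class OptimizedNextDemo {
    static int[] plainNext(String target) {
        if (target.isEmpty()) return new int[0];
        int[] next = new int[target.length()];
        next[0] = -1;
        int i = 1, j = 0;
        while (i < target.length() - 1) {
            if (j == -1 || target.charAt(j) == target.charAt(i)) {
                j++;
                i++;
                next[i] = j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    static int[] optimizedNext(String target) {
        if (target.isEmpty()) return new int[0];
        int[] next = new int[target.length()];
        next[0] = -1;
        int i = 0, j = -1;
        while (i < target.length() - 1) {
            if (j == -1 || target.charAt(j) == target.charAt(i)) {
                j++;
                i++;
                // skip a fallback position that holds the same character
                next[i] = (target.charAt(j) != target.charAt(i)) ? j : next[j];
            } else {
                j = next[j];
            }
        }
        return next;
    }

    // Runs the KMP loop with the given next array.
    // Returns {match index or -1, number of character comparisons}.
    static int[] search(String src, String target, int[] next) {
        int i = 0, j = 0, comparisons = 0;
        while (j < target.length() && i < src.length()) {
            if (j != -1) comparisons++;
            if (j == -1 || src.charAt(i) == target.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j];
            }
        }
        int index = (j == target.length()) ? i - j : -1;
        return new int[] {index, comparisons};
    }

    public static void main(String[] args) {
        String src = "aaaac aaaaac";
        String target = "aaaaac";
        System.out.println(Arrays.toString(optimizedNext("abab"))); // prints [-1, 0, -1, 0]
        System.out.println(Arrays.toString(search(src, target, plainNext(target))));
        System.out.println(Arrays.toString(search(src, target, optimizedNext(target))));
    }
}
```

Both searches report the same position; the second line of output should show fewer comparisons than the first because the repeated c != a checks are skipped.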
