BWT

Time Limit: 12000/6000 MS (Java/Others) Memory Limit: 65535/32768 K (Java/Others)
Total Submission(s): 114 Accepted Submission(s): 38

Problem Description

When the problem of matching a string S in a string T is mentioned, people always bring up KMP, Aho-Corasick and suffix arrays. But Mr Liu tells Canoe that there is an algorithm called the Burrows–Wheeler Transform (BWT) which is quite amazing and highly efficient at solving the problem.
But how does BWT solve the S-in-T matching problem? Mr Liu tells Canoe its first three steps.
First, we append ‘$’ to the end of T and, for convenience, still call the new string T. Then, for every suffix of T that starts at position i, we append the prefix of T that ends at position (i – 1) to its end. Second, we sort these new strings in dictionary order, and we call the matrix formed by the sorted strings the Burrows-Wheeler Matrix. Third, we take the characters of the last column to get a new string, which we call BWT(T). You can get more information from the example below.

Then Mr Liu tells Canoe that we only need to store BWT(T) to solve the matching problem. But how? Can it really work? Mr Liu smiles and says yes. We can find out whether S strings like “aac” are substrings of a T string like “acaacg” knowing only BWT(T)! What an amazing algorithm BWT is! But Canoe is puzzled by the tricky method of matching S strings in the T string. Would you please help Canoe find the method? Given BWT(T) and a string S, can you help Canoe figure out whether S is a substring of T or not?
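The three construction steps in the problem description can be sketched in C++ as below. This is only an illustration that sorts every rotation explicitly, so it needs quadratic memory; the function name bwt_naive is made up for this sketch and is not part of the problem or the solution.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Build BWT(T) by sorting all rotations of T + '$' (illustration only).
std::string bwt_naive(const std::string& text) {
    assert(text.find('$') == std::string::npos);  // '$' is reserved as the terminator
    const std::string t = text + '$';
    std::vector<std::string> rows;
    for (std::size_t i = 0; i < t.size(); ++i) {
        // suffix starting at i, followed by the prefix ending at i - 1
        rows.push_back(t.substr(i) + t.substr(0, i));
    }
    std::sort(rows.begin(), rows.end());  // dictionary order gives the matrix rows
    std::string last;
    for (const std::string& row : rows) last += row.back();  // last column
    return last;
}
```

For T = "acaacg" this yields the sample input string "gc$aaac".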

Input

There are multiple test cases.
First line: the BWT(T) string (1 <= length(BWT(T)) <= 100086).
Second line: an integer n (1 <= n <= 10086), which is the number of S strings.
Then n lines follow.
Each line contains one S string (n * length(S) will be less than 2000000, and all characters of S are lowercase).

Output

For every S, if S is a substring of T, print “YES” on a line. If S is not a substring of T, print “NO” on a line.

Sample Input

gc$aaac
2
aac
gc

Sample Output

YES
NO
Hint

A naive method will not be accepted.
 

Source

2013 Multi-University Training Contest 5

Recommend

zhuyuanchen520

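Solution approach:

The key is to find the inverse of the BWT. The inverse process works like this (the table is pasted directly from the multi-university contest editorial):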

Step     Row 1    Row 2    Row 3    Row 4    Row 5    Row 6    Row 7
ADD 1    g        c        $        a        a        a        c
SORT 1   $        a        a        a        c        c        g
ADD 2    g$       ca       $a       aa       ac       ac       cg
SORT 2   $a       aa       ac       ac       ca       cg       g$
ADD 3    g$a      caa      $ac      aac      aca      acg      cg$
SORT 3   $ac      aac      aca      acg      caa      cg$      g$a
ADD 4    g$ac     caac     $aca     aacg     acaa     acg$     cg$a
SORT 4   $aca     aacg     acaa     acg$     caac     cg$a     g$ac
ADD 5    g$aca    caacg    $acaa    aacg$    acaac    acg$a    cg$ac
SORT 5   $acaa    aacg$    acaac    acg$a    caacg    cg$ac    g$aca
ADD 6    g$acaa   caacg$   $acaac   aacg$a   acaacg   acg$ac   cg$aca
SORT 6   $acaac   aacg$a   acaacg   acg$ac   caacg$   cg$aca   g$acaa
ADD 7    g$acaac  caacg$a  $acaacg  aacg$ac  acaacg$  acg$aca  cg$acaa
SORT 7   $acaacg  aacg$ac  acaacg$  acg$aca  caacg$a  cg$acaa  g$acaac
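The ADD/SORT process in the table can be written down almost literally. The following is a minimal sketch, assuming the BWT string contains exactly one '$'; the name inverse_bwt_naive is made up for illustration. It is far too slow for the problem limits and only serves to make the table concrete.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Invert the BWT by repeating ADD (prepend the BWT column) and SORT n times.
// Returns T including its trailing '$'.
std::string inverse_bwt_naive(const std::string& bwt) {
    const std::size_t n = bwt.size();
    assert(std::count(bwt.begin(), bwt.end(), '$') == 1);  // assumption: one terminator
    std::vector<std::string> rows(n);
    for (std::size_t step = 0; step < n; ++step) {
        for (std::size_t i = 0; i < n; ++i) rows[i] = bwt[i] + rows[i];  // ADD
        std::sort(rows.begin(), rows.end());                             // SORT
    }
    // After n rounds the rows are all rotations; the one ending in '$' is T.
    for (const std::string& row : rows)
        if (row.back() == '$') return row;
    return std::string();
}
```

For "gc$aaac" the row ending in '$' after SORT 7 is "acaacg$".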

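For a detailed introduction, see: https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
Recovering the original string through this inverse process takes only O(n) once the characters are sorted (so O(n) overall with a counting sort, or O(n log n) with a comparison sort). The method is as follows:

We really only need to follow the first row of the matrix. Every round of sorting permutes the rows in the same way, so the whole process is equivalent to sorting s (the BWT string) over and over. In other words, we only need to know which element ends up last in the first row after sorting, and we can track its position. Thinking it through carefully leads to the following method:

First, sort s and record the original position of each element after sorting.

For example:

gc$aaac
1234567

After sorting we get

$aaaccg

3456271

We can record this as Tran[]={3,4,5,6,2,7,1}

Then we simply start from 1 and follow the path Tran[i] to recover the original string. The positions visited are listed first, and the characters of s at those positions second; they spell '$' followed by the original string acaacg.

3->5->2->4->6->7->1

$->a->c->a->a->c->g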
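To actually reach O(n), the sort can be replaced by a counting sort over byte values. The sketch below is one way to do that, using 0-based indices instead of the 1-based ones in the example; the name inverse_bwt_linear is hypothetical, and the code assumes '$' is the smallest character in the string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Recover T (without '$') from BWT(T) with a stable counting sort.
std::string inverse_bwt_linear(const std::string& bwt) {
    const int n = static_cast<int>(bwt.size());
    assert(n > 0);
    // After the prefix sum, start[c] = number of characters smaller than byte c.
    std::vector<int> start(257, 0);
    for (unsigned char c : bwt) ++start[c + 1];
    for (int c = 0; c < 256; ++c) start[c + 1] += start[c];
    // tran[i] = original (0-based) position of the i-th character after sorting.
    std::vector<int> tran(n);
    std::string sorted(n, '\0');
    for (int i = 0; i < n; ++i) {
        const unsigned char c = bwt[i];
        tran[start[c]] = i;  // scanning left to right keeps equal characters stable
        sorted[start[c]] = bwt[i];
        ++start[c];
    }
    assert(sorted[0] == '$');  // assumption: '$' sorts before every other character
    // Follow the Tran path, starting from the position of '$' in bwt.
    std::string text;
    int now = tran[0];
    for (int i = 0; i + 1 < n; ++i) {
        text += sorted[now];
        now = tran[now];
    }
    return text;
}
```

On the example, tran is {2,3,4,5,1,6,0}, which is the Tran[] above shifted to 0-based indices, and the result is "acaacg".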

Code:

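#include<cstdio>
#include<cstring>
#include<algorithm>
#include<utility>
using namespace std;
const int MAXT=100100;	// BWT(T) has at most 100086 characters
char str[2000100];	// input buffer: BWT(T) first, then each pattern S
char t[MAXT];	// recovered text T (without '$')
pair<int,int>s[MAXT];	// (character, original position) pairs of BWT(T)
int tran[MAXT];	// tran[i] = original position of the i-th sorted character
int fail[MAXT];	// KMP failure function of the current pattern
int n;	// length of T
// Build the KMP failure function for pat[0..len-1].
// (The arrays are named tran/fail rather than index/next to avoid clashes with library names.)
void get_next(const char* pat,int len){
	fail[0]=0;
	for(int i=1;i<len;i++){
		int j=fail[i-1];
		while(j&&pat[i]!=pat[j])j=fail[j-1];
		if(pat[i]==pat[j])j++;
		fail[i]=j;
	}
}
// Return true if the pattern stored in str (length len) occurs in text[0..n-1].
bool match(const char* text,int len){
	int k=0;
	for(int i=0;i<n;i++){
		while(k&&str[k]!=text[i])k=fail[k-1];
		if(str[k]==text[i])k++;
		if(k==len)return true;
	}
	return false;
}
int main(){
	while(scanf("%s",str)==1){
		n=strlen(str);
		for(int i=0;i<n;i++){
			s[i].first=str[i];
			s[i].second=i;
		}
		// Pairs compare by character, then by position, so equal characters keep their order.
		stable_sort(s,s+n);
		for(int i=0;i<n;i++)tran[i]=s[i].second;
		// '$' sorts first (it is smaller than any lowercase letter); start from its position.
		int now=tran[0];
		n--;	// T is one character shorter than BWT(T)
		for(int i=0;i<n;i++){
			t[i]=s[now].first;
			now=tran[now];
		}
		t[n]='\0';
		int q;
		scanf("%d",&q);
		while(q--){
			scanf("%s",str);
			int len=strlen(str);
			// A pattern longer than T cannot match; this also keeps fail[] in bounds.
			if(len>n){printf("NO\n");continue;}
			get_next(str,len);
			if(match(t,len))printf("YES\n");
			else printf("NO\n");
		}
	}
	return 0;
}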
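The code above recovers T and then runs KMP for each query. The problem statement hints that matching can also be done on BWT(T) directly; the usual technique for that is backward search (as used in the FM-index), which is not the approach taken above. A minimal sketch follows, assuming T consists only of lowercase letters (the problem only guarantees this for S); the struct BwtIndex and its members are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: answers "is pattern a substring of T?" from BWT(T) alone.
struct BwtIndex {
    int n;
    std::vector<int> first;             // first[c] = number of characters smaller than letter c
    std::vector<std::vector<int>> occ;  // occ[c][i] = occurrences of letter c in bwt[0..i)

    explicit BwtIndex(const std::string& bwt)
        : n(static_cast<int>(bwt.size())),
          first(26, 0),
          occ(26, std::vector<int>(bwt.size() + 1, 0)) {
        std::vector<int> count(26, 0);
        for (int i = 0; i < n; ++i) {
            for (int c = 0; c < 26; ++c) occ[c][i + 1] = occ[c][i];
            if (bwt[i] != '$') {
                assert(bwt[i] >= 'a' && bwt[i] <= 'z');  // assumption: lowercase text
                ++occ[bwt[i] - 'a'][i + 1];
                ++count[bwt[i] - 'a'];
            }
        }
        int smaller = 1;  // the single '$' sorts before every letter
        for (int c = 0; c < 26; ++c) {
            first[c] = smaller;
            smaller += count[c];
        }
    }

    // Backward search: shrink the range [lo, hi) of sorted rows that start with
    // the current suffix of pattern, extending the suffix one character at a time.
    bool contains(const std::string& pattern) const {
        int lo = 0, hi = n;
        for (int k = static_cast<int>(pattern.size()) - 1; k >= 0; --k) {
            if (pattern[k] < 'a' || pattern[k] > 'z') return false;
            const int c = pattern[k] - 'a';
            lo = first[c] + occ[c][lo];
            hi = first[c] + occ[c][hi];
            if (lo >= hi) return false;
        }
        return true;
    }
};
```

Each query then costs time proportional to the length of S, independent of the length of T, at the price of a 26 x (n + 1) occurrence table. On the sample, BwtIndex("gc$aaac").contains("aac") is true and contains("gc") is false.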

hdu4644 BWT
