hdu4644 BWT

BWT

Time Limit: 12000/6000 MS (Java/Others)    Memory Limit: 65535/32768 K (Java/Others)
Total Submission(s): 114    Accepted Submission(s): 38


Problem Description
When the problem of matching a string S inside a string T comes up, people always bring up KMP, Aho-Corasick and suffix arrays. But Mr Liu tells Canoe that there is an algorithm called the Burrows–Wheeler Transform (BWT) which is amazing and highly efficient for this problem.
But how does BWT solve the S-in-T matching problem? Mr Liu tells Canoe its first three steps.
First, we append '$' to the end of T; for convenience, we still call the new string T. Then, for every suffix of T starting at position i, we append to its end the prefix of T that ends at position (i - 1). Second, we sort these new strings in dictionary order; the matrix formed by the sorted strings is called the Burrows–Wheeler Matrix. Third, we take the characters of the last column to form a new string, which we call BWT(T). You can get more information from the example below.



Then Mr Liu tells Canoe that we only need to store BWT(T) to solve the matching problem. But how, and is that even possible? Mr Liu smiles and says yes: knowing only BWT(T), we can decide whether strings S like "aac" are substrings of a string T like "acaacg"! What an amazing algorithm BWT is! But Canoe is puzzled by this tricky matching method. Would you please help Canoe figure it out? Given BWT(T) and a string S, determine whether S is a substring of T.
 

Input
There are multiple test cases.
First line: the BWT(T) string (1 <= length(BWT(T)) <= 100086).
Second line: an integer n (1 <= n <= 10086), the number of S strings.
Then n lines follow, each containing one S string (n * length(S) will be less than 2000000, and all characters of S are lowercase).
 

Output
For every S, print "YES" on its own line if S is a substring of T, and "NO" otherwise.
 

Sample Input
gc$aaac
2
aac
gc
 

Sample Output
YES
NO
Hint
A naive method will not be accepted.
 


Recommend
zhuyuanchen520

Solution approach:

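The key is to invert the BWT. The inverse process works as follows (the table is copied directly from the official solution of the multi-university contest): start with seven empty rows; each ADD step prepends BWT(T) = gc$aaac as a new first column, and each SORT step sorts the rows. After seven rounds the rows are exactly the Burrows–Wheeler Matrix, and the row ending in '$' is T = acaacg$.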

Step    Row 1    Row 2    Row 3    Row 4    Row 5    Row 6    Row 7
ADD 1   g        c        $        a        a        a        c
SORT 1  $        a        a        a        c        c        g
ADD 2   g$       ca       $a       aa       ac       ac       cg
SORT 2  $a       aa       ac       ac       ca       cg       g$
ADD 3   g$a      caa      $ac      aac      aca      acg      cg$
SORT 3  $ac      aac      aca      acg      caa      cg$      g$a
ADD 4   g$ac     caac     $aca     aacg     acaa     acg$     cg$a
SORT 4  $aca     aacg     acaa     acg$     caac     cg$a     g$ac
ADD 5   g$aca    caacg    $acaa    aacg$    acaac    acg$a    cg$ac
SORT 5  $acaa    aacg$    acaac    acg$a    caacg    cg$ac    g$aca
ADD 6   g$acaa   caacg$   $acaac   aacg$a   acaacg   acg$ac   cg$aca
SORT 6  $acaac   aacg$a   acaacg   acg$ac   caacg$   cg$aca   g$acaa
ADD 7   g$acaac  caacg$a  $acaacg  aacg$ac  acaacg$  acg$aca  cg$acaa
SORT 7  $acaacg  aacg$ac  acaacg$  acg$aca  caacg$a  cg$acaa  g$acaac

A detailed introduction is available here: https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
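This inverse process, which recovers the original string, can be carried out in O(n) time (with a counting sort over the small alphabet; the code below uses stable_sort, which is O(n log n)). The method is as follows: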
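In fact, we only need to focus on the first row of this matrix. Moreover, every SORT step moves the rows in exactly the same way: before each ADD the rows are already sorted, so after the same BWT column is prepended, sorting them is just a stable sort by that new first character. The whole process is therefore equivalent to sorting s over and over with one fixed permutation. So we only need to know which character ends up in each position after sorting, and we can track positions through that permutation. Thinking it through carefully gives the following method: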
First, stably sort s (the BWT string) and record the original position of each sorted character.
For example:
gc$aaac
1234567
After sorting:
$aaaccg
3456271
So we can record Tran[] = {3,4,5,6,2,7,1} (1-based positions).
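Then we simply start at 1 and keep following i -> Tran[i]; reading the BWT characters at the visited positions recovers the original string:
3->5->2->4->6->7->1
$->a->c->a->a->c->g
This spells "$acaacg"; moving the '$' back to the end gives T = "acaacg$", so the original string is "acaacg".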

Code:

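#include<cstdio>
#include<cstring>
#include<algorithm>
#include<utility>
using namespace std;
const int MAXT=100100;   // BWT(T) has at most 100086 characters
const int MAXS=2000100;  // a single S string may approach 2,000,000 characters
char str[MAXS];          // input buffer: BWT(T) first, then each S
char t[MAXT];            // recovered T without its trailing '$'
pair<int,int> s[MAXT];   // (character, original position in BWT(T))
int idx[MAXT];           // idx[i]: original position of the i-th sorted character (0-based Tran[])
int nxt[MAXS];           // KMP failure function; sized for the longest S
int n;                   // |BWT(T)|, then |T| after dropping '$'
// The original names next/index can collide with std::next (under using namespace std)
// and with the POSIX index() function, so they are renamed here.
void get_next(const char* p,int len){
	nxt[0]=0;
	for(int i=1;i<len;i++){
		int j=nxt[i-1];
		while(j&&p[i]!=p[j])j=nxt[j-1];
		if(p[i]==p[j])j++;
		nxt[i]=j;
	}
}
// KMP search: does pattern p (length len) occur in text (length textLen)?
bool match(const char* text,int textLen,const char* p,int len){
	int k=0;
	for(int i=0;i<textLen;i++){
		while(k&&p[k]!=text[i])k=nxt[k-1];
		if(p[k]==text[i])k++;
		if(k==len)return true;
	}
	return false;
}
int main(){
	while(scanf("%s",str)==1){
		n=strlen(str);
		for(int i=0;i<n;i++){
			s[i].first=str[i];
			s[i].second=i;
		}
		stable_sort(s,s+n);          // first column of the matrix, with original positions
		for(int i=0;i<n;i++)idx[i]=s[i].second;
		int now=idx[0];              // row 0 starts with '$'; idx[0] is the row starting with T[0]
		n--;                         // drop '$': T has one character fewer than BWT(T)
		for(int i=0;i<n;i++){
			t[i]=s[now].first;       // first character of the current row is T[i]
			now=idx[now];            // move to the row starting with T[i+1]
		}
		t[n]='\0';
		int q;
		scanf("%d",&q);
		while(q--){
			scanf("%s",str);
			int len=strlen(str);
			if(len>n){printf("NO\n");continue;}  // longer than T: cannot match
			get_next(str,len);
			if(match(t,n,str,len))printf("YES\n");
			else printf("NO\n");
		}
	}
	return 0;
}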


