BWT
Time Limit: 12000/6000 MS (Java/Others) Memory Limit: 65535/32768 K (Java/Others)
Total Submission(s): 114 Accepted Submission(s): 38
But how does BWT solve the problem of matching S in T? Mr Liu tells Canoe its first three steps.
First, we append ‘$’ to the end of T; for convenience, we still call the new string T. Then, for every suffix of T that starts at position i, we append to its end the prefix of T that ends at position (i - 1), which gives one rotation of T per position. Second, we sort these new strings in dictionary order. The matrix formed by the sorted strings is called the Burrows Wheeler Matrix. Third, we read the characters of the last column to get a new string, which we call BWT(T). See the example below for more detail.
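The three steps above can be sketched directly in code. This is a minimal illustration rather than part of the judge solution: the function name `bwt` is hypothetical, and it assumes that ‘$’ sorts before every lowercase letter (true in ASCII). Because it builds every rotation explicitly, it is only suitable for small inputs.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Builds BWT(T) naively: form every rotation of T + '$', sort the rotations,
// and read the last column. Roughly O(n^2 log n), so for small examples only.
std::string bwt(std::string t) {
    t += '$';  // assumption: '$' (ASCII 36) sorts before all lowercase letters
    const std::size_t n = t.size();
    std::vector<std::string> rows;
    rows.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Step 1: suffix starting at i, followed by the prefix ending at i - 1.
        rows.push_back(t.substr(i) + t.substr(0, i));
    }
    std::sort(rows.begin(), rows.end());  // Step 2: dictionary order
    std::string last;
    for (const std::string& row : rows) last += row.back();  // Step 3: last column
    return last;
}
```

For T = "acaacg", calling `bwt("acaacg")` gives "gc$aaac", which is the string used in the inverse-transform table in the solution notes.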
Then Mr Liu tells Canoe that we only need to store BWT(T) to solve the matching problem. Can that really work? Mr Liu smiles and says yes. Knowing only BWT(T), we can decide whether an S string such as “aac” is a substring of a T string such as “acaacg”. What an amazing algorithm BWT is! But Canoe is puzzled by the tricky method of matching S strings in T. Can you help Canoe? Given BWT(T) and an S string, determine whether S is a substring of T.
First Line: the BWT(T) string (1 <= length(BWT(T)) <= 100086).
Second Line: an integer n (1 <= n <= 10086), the number of S strings.
Then n lines follow.
Each line contains one S string (n * length(S) is less than 2000000, and all characters of S are lowercase).
Solution approach:
The naive inverse transform repeatedly prepends BWT(T) as a new first column (ADD) and then re-sorts the rows (SORT). For BWT(T) = gc$aaac:
ADD 1    g        c        $        a        a        a        c
SORT 1   $        a        a        a        c        c        g
ADD 2    g$       ca       $a       aa       ac       ac       cg
SORT 2   $a       aa       ac       ac       ca       cg       g$
ADD 3    g$a      caa      $ac      aac      aca      acg      cg$
SORT 3   $ac      aac      aca      acg      caa      cg$      g$a
ADD 4    g$ac     caac     $aca     aacg     acaa     acg$     cg$a
SORT 4   $aca     aacg     acaa     acg$     caac     cg$a     g$ac
ADD 5    g$aca    caacg    $acaa    aacg$    acaac    acg$a    cg$ac
SORT 5   $acaa    aacg$    acaac    acg$a    caacg    cg$ac    g$aca
ADD 6    g$acaa   caacg$   $acaac   aacg$a   acaacg   acg$ac   cg$aca
SORT 6   $acaac   aacg$a   acaacg   acg$ac   caacg$   cg$aca   g$acaa
ADD 7    g$acaac  caacg$a  $acaacg  aacg$ac  acaacg$  acg$aca  cg$acaa
SORT 7   $acaacg  aacg$ac  acaacg$  acg$aca  caacg$a  cg$acaa  g$acaac
After the last sort, the row that begins with ‘$’ reads $acaacg, so T = acaacg.
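The ADD/SORT rounds of the table can be reproduced with a few lines of code. The sketch below is only meant to make the table concrete: the function name `inverse_bwt_table` is hypothetical, and the approach costs about O(n^2 log n), far too slow for the judge limits.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Naive inverse BWT: n rounds of "prepend the BWT column, then sort the rows".
// Returns the full sorted matrix; the row starting with '$' is '$' followed by T.
std::vector<std::string> inverse_bwt_table(const std::string& bwt) {
    const std::size_t n = bwt.size();
    std::vector<std::string> rows(n);  // starts as n empty strings
    for (std::size_t round = 0; round < n; ++round) {
        for (std::size_t i = 0; i < n; ++i) {
            rows[i] = bwt[i] + rows[i];       // ADD: prepend the i-th BWT character
        }
        std::sort(rows.begin(), rows.end());  // SORT: dictionary order
    }
    return rows;
}
```

With the input "gc$aaac", the returned rows match the SORT 7 line of the table, and the first row is "$acaacg".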
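This inverse process of recovering the original string can be implemented in O(n) (given a linear-time sort of the characters). The method is as follows: stably sort the characters of BWT(T) while remembering each character's original position. The sorted characters form the first column of the matrix, and following the remembered positions, starting from the row that begins with ‘$’, yields T one character at a time. Each S is then matched against the recovered T with KMP.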
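To illustrate the O(n) bound, here is a hedged sketch of the same position-following idea with the comparison sort swapped for a counting sort. The name `inverse_bwt` is hypothetical, and the code assumes single-byte characters with exactly one ‘$’ that is smaller than every other character.

```cpp
#include <string>
#include <vector>

// Recovers T (without the trailing '$') from BWT(T) in O(n + 256) time.
std::string inverse_bwt(const std::string& bwt) {
    const int n = static_cast<int>(bwt.size());
    if (n == 0) return "";
    // After the prefix sum, start[c] = number of characters in bwt smaller than c.
    std::vector<int> start(257, 0);
    for (unsigned char c : bwt) ++start[c + 1];
    for (int c = 0; c < 256; ++c) start[c + 1] += start[c];
    // Stable counting sort: order[i] = position in bwt of the i-th smallest character.
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) {
        order[start[static_cast<unsigned char>(bwt[i])]++] = i;
    }
    std::string t;
    int row = order[0];  // row 0 begins with '$'; order[0] is the row that begins with T[0]
    for (int k = 0; k + 1 < n; ++k) {
        t += bwt[order[row]];  // first character of the current row
        row = order[row];      // move to the row that starts one character later in T
    }
    return t;
}
```

For "gc$aaac" the array `order` is {2, 3, 4, 5, 1, 6, 0}, and following it from `order[0]` spells out "acaacg".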
Code:
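#include<cstdio>
#include<iostream>
#include<cstring>
#include<algorithm>
using namespace std;
char str[2000100];       // input buffer: first BWT(T), then each pattern S
char t[100100];          // T recovered from BWT(T), without the trailing '$'
pair<int,int>s[100100];  // (character, position in BWT(T)), sorted
int idx[100100];         // idx[i] = position in BWT(T) of the i-th sorted character
int n;                   // length of BWT(T), then length of T
// KMP failure function. Renamed from next/index to avoid clashes with std::next
// and the C library index(); sized like str because S may be longer than T.
int nxt[2000100];
// Build the failure function of the pattern str[0..len).
void get_next(char* str,int len){
    nxt[0]=0;
    for(int i=1;i<len;i++){
        int j=nxt[i-1];
        while(j&&str[i]!=str[j])j=nxt[j-1];
        if(str[i]==str[j])j++;
        nxt[i]=j;
    }
}
// Return true if the pattern in the global str (length len) occurs in text (length n).
bool match(char *text,int len){
    int k=0;
    for(int i=0;i<n;i++){
        while(k&&str[k]!=text[i])k=nxt[k-1];
        if(str[k]==text[i])k++;
        if(k==len)return true;
    }
    return false;
}
int main(){
    while(~scanf("%s",str)){
        n=strlen(str);
        // Pair each character of BWT(T) with its position, then sort.
        for(int i=0;i<n;i++){
            s[i].first=str[i];
            s[i].second=i;
        }
        stable_sort(s,s+n);
        for(int i=0;i<n;i++)idx[i]=s[i].second;
        // Row 0 begins with '$'; idx[0] is the row that begins with T[0].
        int now=idx[0];
        n--;  // drop the '$' from the length
        for(int i=0;i<n;i++){
            t[i]=s[now].first;
            now=idx[now];
        }
        t[n]='\0';
        int q;
        scanf("%d",&q);
        while(q--){
            scanf("%s",str);
            int len=strlen(str);
            get_next(str,len);
            if(match(t,len))printf("YES\n");
            else printf("NO\n");
        }
    }
    return 0;
}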
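The solution above reconstructs T and then runs KMP. As an alternative, and not the method used in this post, the query can also be answered on BWT(T) alone with the standard backward-search technique. The sketch below is a rough illustration under stated assumptions: the type `BwtIndex` and its members are hypothetical names, the alphabet is ‘$’ plus lowercase letters, and the occurrence table is stored in full (about 27 integers per character of BWT(T)), which is simple but memory-hungry.

```cpp
#include <array>
#include <string>
#include <vector>

// Backward search over BWT(T). Assumes the alphabet is '$' plus 'a'..'z'.
struct BwtIndex {
    std::array<int, 28> first{};           // first[c] = number of rows starting with a symbol < c
    std::vector<std::array<int, 27>> occ;  // occ[i][c] = count of symbol c in bwt[0, i)

    static int code(char ch) { return ch == '$' ? 0 : ch - 'a' + 1; }

    explicit BwtIndex(const std::string& bwt) : occ(bwt.size() + 1) {
        for (std::size_t i = 0; i < bwt.size(); ++i) {
            occ[i + 1] = occ[i];
            ++occ[i + 1][code(bwt[i])];
        }
        for (int c = 0; c < 27; ++c) first[c + 1] = first[c] + occ[bwt.size()][c];
    }

    // Returns true if pattern (lowercase letters only) is a substring of T.
    bool contains(const std::string& pattern) const {
        int lo = 0;
        int hi = static_cast<int>(occ.size()) - 1;  // half-open row range [lo, hi)
        // Extend the match one character at a time, from the last character backwards.
        for (auto it = pattern.rbegin(); it != pattern.rend() && lo < hi; ++it) {
            const int c = code(*it);
            lo = first[c] + occ[lo][c];
            hi = first[c] + occ[hi][c];
        }
        return lo < hi;  // a non-empty range means at least one row starts with pattern
    }
};
```

Each query costs O(length(S)) after an O(n) build, so T never has to be reconstructed. For BWT(T) = "gc$aaac", `contains("aac")` is true and `contains("ga")` is false.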