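The FASTA format, also known as the Pearson format, requires the header line of each sequence to begin with a greater-than sign (">"), with the sequence itself starting on the next line. Keeping each line to no more than 60 characters is generally recommended, since it makes the file easier for programs to process. A file holding multiple nucleotide sequences simply lists entries in this format one after another.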
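As a rough illustration of that layout, the Python sketch below writes each record as a ">" header line followed by the sequence wrapped at 60 characters. The function name `format_fasta`, its `width` parameter, and the sample record are hypothetical and chosen only for this example; they do not come from any particular FASTA tool.

```python
def format_fasta(records, width=60):
    """Render (header, sequence) pairs as FASTA text.

    `records` is an iterable of (header, sequence) string pairs.
    `width` is the maximum number of sequence characters per line
    (60 follows the recommendation in the text; it is not mandatory).
    """
    lines = []
    for header, sequence in records:
        # The header line always starts with ">".
        lines.append(">" + header)
        # Split the sequence into chunks of at most `width` characters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    # Terminate every line, including the last one, with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Made-up record used only to show the wrapping behaviour.
    print(format_fasta([("demo_seq an invented example", "ACGT" * 20)]), end="")
```

Writing several records is just a matter of passing more pairs: each one is emitted directly after the previous one, which matches the multi-sequence layout described above.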
- This format contains a single header line providing the sequence name, and optionally a description, followed by lines of sequence data.
- Sequences in FASTA-formatted files are preceded by a line starting with a ">" symbol.
- The first word on this line is the name of the sequence. The rest of the line is a description of the sequence.
| Term | Entry Name | Molecule Type | Gene Name | Sequence Length |
|------|------------|---------------|-----------|-----------------|
| e.g. | FOSB_MOUSE | Protein | fosB | 338 aa |
- The remaining lines contain the sequence itself, usually formatted to 60 characters per line.
- Depending on the application, blank lines in a FASTA file are either ignored or treated as terminating the sequence.
- Depending on the application, spaces or other non-sequence symbols (dashes, underscores, periods) in a sequence are either ignored or treated as gaps.
- FASTA files containing multiple sequences follow the same layout, with one sequence listed right after another. This format is accepted by many multiple sequence alignment programs.
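The rules in the list above can be sketched as a small Python reader. This is only one possible interpretation: the function name `parse_fasta` is hypothetical, and because blank-line and symbol handling depend on the application, the sketch assumes that blank lines are ignored, that spaces inside sequence lines are dropped, and that other symbols such as dashes are kept unchanged. The sample input is invented for the demonstration.

```python
def parse_fasta(text):
    """Yield (name, description, sequence) tuples from FASTA text.

    Assumptions made by this sketch (real tools differ):
    - blank lines are ignored rather than ending a sequence;
    - spaces inside sequence lines are removed;
    - other symbols (dashes, periods, ...) are kept as-is.
    """
    name = None
    description = ""
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, description, "".join(chunks)
            # First word is the name; the rest is the description.
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif name is not None:
            # Sequence line: drop embedded spaces, keep everything else.
            chunks.append(line.replace(" ", ""))
    # Emit the final record once the input is exhausted.
    if name is not None:
        yield name, description, "".join(chunks)


if __name__ == "__main__":
    # Invented two-record input used only for demonstration.
    sample = ">seq1 first demo\nACGT\nAC GT\n\n>seq2\nTTAA\n"
    for record in parse_fasta(sample):
        print(record)
```

A reader that must treat blank lines as sequence terminators, or dashes as gaps, would change only the two branches marked as assumptions.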