FASTA Format

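FASTA format, also known as Pearson format, requires each sequence's header line to begin with a greater-than sign (">"). The sequence itself starts on the next line. Keeping each line to at most 60 characters is generally recommended so that programs can process the file easily. A file of multiple nucleotide sequences is simply a series of such entries listed one after another.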
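The layout described above can be sketched in Python as follows. This is a minimal illustration, assuming records are supplied as (header, sequence) pairs. `format_fasta` is a hypothetical helper name, and the default 60-character width follows the recommendation in the paragraph above.

```python
def format_fasta(records, width=60):
    """Render (header, sequence) pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence,
    wrapped so that no sequence line exceeds `width` characters.
    """
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        # Cut the sequence into consecutive chunks of at most `width` characters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    # Multiple records are simply concatenated; end with a trailing newline.
    return ("\n".join(lines) + "\n") if lines else ""


records = [
    ("seq1 example nucleotide sequence", "ACGT" * 40),  # 160 bases
    ("seq2", "GATTACA"),
]
print(format_fasta(records), end="")
```

The 160-base first record is written as three sequence lines of 60, 60 and 40 characters, and the second record follows directly after it.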

  • Each record consists of a single header line giving the sequence name and, optionally, a description, followed by lines of sequence data.
  • Each sequence in a FASTA file is preceded by a header line starting with a ">" symbol.
  • The first word on the header line is the name of the sequence. The rest of the line is a description of the sequence.
    Field      Entry name    Molecule type    Gene name    Sequence length
    Example    FOSB_MOUSE    Protein          fosB         338 aa
  • The remaining lines contain the sequence itself, usually wrapped at 60 characters per line.
  • Depending on the application, blank lines in a FASTA file are either ignored or treated as the end of the sequence.
  • Depending on the application, spaces or other non-sequence symbols (dashes, underscores, periods) within a sequence are either ignored or treated as gaps.
  • A FASTA file containing multiple sequences uses the same format, with one record listed directly after another. Many multiple sequence alignment programs accept this format as input.
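The rules in the list above can be tied together in a small Python parser sketch. The function names `read_fasta` and `split_header` and the options `blank_line_ends_record` and `keep_gaps` are hypothetical, chosen only to mirror the "depending on the application" choices. They are not taken from any particular tool. The sketch assumes that when a blank line ends a sequence, any further lines up to the next ">" header are discarded. It also assumes that dashes, periods and underscores all count as gap symbols. For real work, an established library such as Biopython's `Bio.SeqIO` is usually a better choice.

```python
GAP_CHARS = "-._"  # symbols that some tools read as alignment gaps


def split_header(header):
    """Return (name, description): the first word and the rest of the line."""
    parts = header.split(maxsplit=1)
    name = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return name, description


def read_fasta(text, blank_line_ends_record=False, keep_gaps=True):
    """Parse FASTA text into (name, description, sequence) tuples.

    blank_line_ends_record: if True, a blank line closes the current
        sequence and later lines are skipped until the next '>' header;
        if False, blank lines are simply ignored.
    keep_gaps: if True, '-', '.' and '_' are all kept as '-' gaps;
        if False, they are removed. Whitespace is always removed.
    """
    records = []
    header = None
    chunks = []
    accepting = False  # are sequence lines currently added to a record?

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((*split_header(header), "".join(chunks)))
            header, chunks, accepting = line[1:].strip(), [], True
        elif not line:
            if blank_line_ends_record:
                accepting = False
        elif accepting:
            residues = "".join(line.split())  # drop spaces inside the line
            for gap in GAP_CHARS:
                residues = residues.replace(gap, "-" if keep_gaps else "")
            chunks.append(residues)

    # Close the final record at end of input.
    if header is not None:
        records.append((*split_header(header), "".join(chunks)))
    return records


example = """>seq1 first record
ACGTAC
GT--AC

>seq2 second record
MKV.LA
"""
for name, description, sequence in read_fasta(example):
    print(name, description, sequence, sep=" | ")
```

Sequence lines that appear before the first header are ignored. Wrapped lines are joined back into a single string per record, so the line width used when writing does not matter to the reader.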