Sequences are expected to be represented in the standard
IUB/IUPAC nucleic acid code, with these exceptions: lower-case
letters are accepted and are mapped to upper-case; any character
other than A, C, G, or T is converted to 'N' (unknown).
The nucleic acid codes supported are:
A --> Adenine
C --> Cytosine
G --> Guanine
T --> Thymine
N --> A, C, G, or T (any base)
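The mapping described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the tool's actual code; the function name normalize_sequence is hypothetical, and dropping whitespace before conversion is an assumption (the text above does not say how line breaks inside a sequence are handled).

```python
def normalize_sequence(raw: str) -> str:
    """Apply the input rules: upper-case everything, then turn any
    character other than A, C, G, T into 'N'."""
    # Assumption: whitespace (e.g. line breaks) is dropped, not converted to 'N'.
    compact = "".join(raw.split())
    return "".join(ch if ch in "ACGT" else "N" for ch in compact.upper())


if __name__ == "__main__":
    # Lower-case letters are mapped to upper-case; R, Y and '-' become 'N'.
    print(normalize_sequence("acgtRY-n"))
```

Under these rules an ambiguity code such as R or Y is not preserved; it is reported as 'N' like any other unrecognized character.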
Multi-FASTA format consists of description lines, each beginning with ">"
and followed by one or more lines of sequence data. It is important that
each ">" symbol appear at the start of a new line.
For example:
>human
AGTGAGACACGACGAGCCTACTATCAGGACGAGAGCAGGAGAGTGATGATGAGTAGCG
CACAGCGACGATCATCACGAGAGAGTAAGAAGCAGTGATGATGTAGAGCGACGAGAGC
ACAGCGGCGACTACTACTAGG
>mouse
AGTGTGTCTCGTCGTGCCTACTTTCAGGACGAGAGCAGGTGAGTGTTGATGAGTTGCG
CTCTGCGACGTTCATCTCGAGTGAGTTAGAAAGTGAAGGTATAACACAAGGTGTGAAG
GCAGTGATGATGTAGAGCGACGAGAGCACAGCGGCGGGATGATATATCTAGGAGGATG
CCCAATTTTTTTTT
>platypus
CTCTGCGGCGTTCGTCTCGGGTGGGTTGGGGGGTGGGGGTGTGGCGCAAGGTGTGAAG
CACGACGACGATCTACGACGAGCGAGTGATGAGAGTGATGAGCGACGACGAGCACTAG
AAGCGACGACTACTATCGACGAGCAGCCGAGATGATGATGAAAGAGAGAGAA
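
A file in this layout can be read with a short parser such as the one below. It is a minimal sketch under stated assumptions, not the reader used by the tool: the function name parse_multi_fasta is hypothetical, blank lines are skipped, sequence lines under one description are concatenated, and a repeated description simply replaces the earlier record.

```python
from typing import Dict, List, Optional


def parse_multi_fasta(text: str) -> Dict[str, str]:
    """Return a mapping from description (without the leading '>')
    to the concatenated sequence data that follows it."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new description line starts a new record.
            current = line[1:].strip()
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            records[current].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


if __name__ == "__main__":
    sample = ">human\nAGTGAGAC\nACGACGAG\n>mouse\nAGTGTGTC\n"
    for name, seq in parse_multi_fasta(sample).items():
        print(name, len(seq))
```

The parser relies on each ">" starting its own line, which is why a ">" embedded in the middle of a sequence line would be treated as sequence data rather than as the start of a new record.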