The AVX format for local and global alignments

AVX is a format for the output of multiple local and global alignments. It is based on two existing formats: the ClustalW format, which is user- and browser-friendly, and the FASTA alignment format, which has the advantage of allowing diverse sequence identifiers. We have added range information so that both local and global alignments can be recorded, and a strand identifier for each sequence.

Specification:

1. An AVX file is a text file describing alignments between multiple sequences. We denote the sequences by s_1, s_2, ..., s_k.
2. The file is divided into blocks. Each block represents a single multiple alignment between any r of the sequences (2 <= r <= k). A block begins with an identifier for each sequence. The identifiers are FASTA-style, namely they are lines beginning with the ">" symbol. The ">" is followed by a pair of numbers in brackets, [a,b], giving the beginning and ending positions in the sequence for this alignment block, and then by either a + or a - to indicate the strand that was aligned. The rest of the line is free text identifying the sequence. For example, a block could begin:
> [10,100] + human region
> [2,150] - mouse region
> [3,1729] + dog region
3. The identifiers for a block are followed by the multiple alignment itself, showing the sequence characters with "-" marking gaps. For example:
ACGTGA---AGTGA
-CGAGATT-AATGA
ACGA--TTAAC--A
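
As an illustration of the specification above, here is a minimal Python sketch of a reader for such blocks. It is not an official tool: the names parse_avx, AvxSequence and HEADER_RE are hypothetical, and the sketch makes assumptions the specification does not state, namely that alignment rows appear in the same order as the identifiers, that longer alignments may be wrapped over several passes of rows (as in ClustalW output), and that blank lines can be ignored.

```python
import re
from dataclasses import dataclass

# Identifier line: ">" then "[a,b]", then a strand sign, then free text.
HEADER_RE = re.compile(r"^>\s*\[(\d+)\s*,\s*(\d+)\]\s*([+-])\s*(.*)$")


@dataclass
class AvxSequence:
    start: int          # a in [a,b]
    end: int            # b in [a,b]
    strand: str         # "+" or "-"
    description: str    # free text after the strand sign
    aligned: str = ""   # gapped alignment row for this sequence


def parse_avx(text):
    """Return a list of blocks; each block is a list of AvxSequence."""
    blocks = []
    current = []         # sequences of the block being read
    row = 0              # index of the next alignment row in this block
    in_headers = False   # True while reading consecutive identifier lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if line.startswith(">"):
            m = HEADER_RE.match(line)
            if m is None:
                raise ValueError("malformed identifier line: " + line)
            if not in_headers:
                # First identifier after alignment rows opens a new block.
                current = []
                blocks.append(current)
                row = 0
                in_headers = True
            current.append(AvxSequence(int(m.group(1)), int(m.group(2)),
                                       m.group(3), m.group(4).strip()))
        else:
            if not current:
                raise ValueError("alignment row before any identifier")
            in_headers = False
            # Assumption: rows follow identifier order; rows beyond the
            # first pass are treated as wrapped continuations.
            current[row % len(current)].aligned += line
            row += 1
    return blocks


example = """\
> [10,100] + human region
> [2,150] - mouse region
> [3,1729] + dog region
ACGTGA---AGTGA
-CGAGATT-AATGA
ACGA--TTAAC--A
"""

for seq in parse_avx(example)[0]:
    print(seq.start, seq.end, seq.strand, seq.description, seq.aligned)
```

The sketch does not check that all rows of a block have the same length or that the ranges are consistent with the number of non-gap characters; a real reader would probably want both checks.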

Tools for working with the AVX format

(coming online soon).