MAVID Frequently Asked Questions

What is MAVID?

MAVID is a multiple alignment program suited to aligning large numbers of DNA sequences. The sequences can be small mitochondrial genomes or large genomic regions up to megabases long. MAVID is also integrated with various phylogenetic tree construction programs, conservation identification tools, and visualization tools, and can be used to identify conserved regions in phylogenetically related sequences. Note that MAVID is currently a DNA sequence alignment program and cannot align protein sequences. MAVID is a progressive alignment algorithm: maximum likelihood estimates of the ancestral sequences are aligned recursively using the AVID pairwise alignment method. AVID is a hierarchical alignment algorithm that works by iteratively aligning large anchors and finally aligning weakly conserved regions using the Smith-Waterman algorithm.
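The recursive structure of a progressive alignment can be illustrated with a short Python sketch. This is not MAVID's code: it assumes a binary guide tree written as nested tuples (a hypothetical representation chosen for illustration) and only lists the order in which pairwise alignments would be performed, children before parents. The ancestral sequence estimation and the AVID alignment step themselves are not modelled.

```python
def alignment_order(tree):
    """List the pairwise alignment steps for a binary guide tree.

    `tree` is either a sequence name (str) or a 2-tuple of subtrees.
    This nested-tuple layout is an assumption made for illustration.
    """
    steps = []

    def visit(node):
        # A leaf is a single input sequence; nothing to align yet.
        if isinstance(node, str):
            return node
        left, right = node
        # Resolve both subtrees first, so children are aligned before parents.
        left_label = visit(left)
        right_label = visit(right)
        steps.append((left_label, right_label))
        # The label stands in for the alignment (or ancestor) at this node.
        return f"({left_label},{right_label})"

    visit(tree)
    return steps


if __name__ == "__main__":
    for step in alignment_order((("human", "mouse"), "chicken")):
        print(step)
```

For the tree above, the first step pairs human with mouse, and the second pairs that result with chicken.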

Which organisms will MAVID work on?

MAVID has been used to align vertebrate genomic sequences, mitochondrial DNA, viruses, and other sequences. MAVID is not currently well suited to aligning sequences where a global alignment is not necessarily biologically meaningful (for example, because of rearrangements or inversions). A more appropriate tool for such alignments at this time is BLAST.

How do I cite MAVID in a journal publication?

AVID and MAVID are described in the following publication:

Bray N, Pachter L: MAVID: Constrained Ancestral Alignment of Multiple Sequences, Genome Research 14 (2004), p 574--579.

Under what license is MAVID available?

The MAVID source code is available for download and is free for academic or non-profit use.

What is the difference between MAVID and AVID?

AVID is a pairwise alignment program. MAVID is a multiple alignment program. MAVID has been built using AVID, and encompasses it: if two sequences are submitted to MAVID it reduces to AVID.

How does MAVID compare to other tools?

The running time of MAVID is linear in the number of sequences submitted and almost linear in their length. It is therefore able to process much larger alignment problems than previously possible.

Where do I send bug reports/comments/questions?

Please email Nicolas Bray and Lior Pachter.

Can MAVID run on draft sequence?

Yes, MAVID can handle draft sequence. However, this functionality is not available on the web server and only works in the standalone package. Support for this feature on the web server is under development.

How does MAVID handle overlapping genes/rearrangements/duplications etc.?

MAVID is currently unable to deal with these issues.

How many sequences can MAVID align, and how long can they be?

The limit is set by the hardware used, in particular its memory restrictions. We have found that MAVID requires roughly 0.5 gigabyte of RAM per 1 megabase of aligned sequence. We have aligned up to 1000 sequences, although there is no reason that more couldn't be aligned.
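As a rough planning aid, the rule of thumb above can be turned into a small Python calculation. This is only a sketch: it assumes the 0.5 gigabyte per megabase figure scales linearly with the length of the aligned sequence in bases, and the function name is made up for illustration.

```python
GB_PER_MEGABASE = 0.5  # rough figure quoted in this FAQ, not a guarantee


def estimate_ram_gb(aligned_length_bp):
    """Estimate RAM in gigabytes for a given aligned length in bases.

    Assumes memory grows linearly with aligned length (an assumption).
    """
    return GB_PER_MEGABASE * aligned_length_bp / 1_000_000


if __name__ == "__main__":
    # A 2 megabase region would need roughly 1 gigabyte under this rule.
    print(estimate_ram_gb(2_000_000))
```

Actual memory use will depend on the input sequences and the machine, so treat the result as an order-of-magnitude guide.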

Are the sequences repeat-masked before alignment?

Yes and no. MAVID runs DUST as a pre-processing step before the MAVID algorithm begins, but repeats are identified, not masked. Repeats are therefore used to reduce alignment errors, but they are still aligned.

Are the sequences pre-processed in any other way besides identification of repeats?

No.

What programs can be used to visualize MAVID alignments?

The MAVID server automatically generates VISTA plots and boxshade figures (for reasonably sized sequence inputs). The MAVID multi-FASTA alignment format distributed on the results page can be used with a variety of other tools, such as phylovista or AltaVist. These and other tools have not been incorporated into the server for a variety of reasons (bugs in the software, incompatibility with some browsers, software not freely available, etc.). If you know of a useful visualization tool that is freely available for academic purposes and supports all browser platforms, please let us know and we'll add it to the MAVID server.
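If you want to feed the multi-FASTA alignment into your own scripts, a minimal Python reader might look like the sketch below. It assumes the usual multi-FASTA conventions (a header line starting with ">" followed by one or more sequence lines, with "-" marking gaps); the exact details of MAVID's output may differ, and the function names are hypothetical.

```python
def read_multifasta(text):
    """Parse multi-FASTA text into a dict of {header: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # header text without the ">"
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


def is_rectangular(alignment):
    """Check that every aligned row (gaps included) has the same length."""
    return len({len(seq) for seq in alignment.values()}) <= 1


if __name__ == "__main__":
    example = ">human\nACG-T\nAC\n>mouse\nAC-GT\nAC\n"
    rows = read_multifasta(example)
    print(rows, is_rectangular(rows))
```

Checking that all rows have equal length is a cheap way to catch a truncated download before passing the file to a visualization tool.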

Where do I find out about the input options to MAVID?

Please see the online user guide. The standalone package comes with additional documentation.

What does MAVID do if one of the sequences is reversed?

MAVID runs on the sequences as they are given, so a reversed sequence will result in no alignment being found. If MAVID finds no alignment in a region where you expect one, this could be the reason. Try running again with one sequence reversed.
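A sequence can be flipped before resubmitting with a few lines of Python. The sketch below assumes that "reversed" here means the sequence was taken from the opposite strand, so it computes the reverse complement; it handles only the letters A, C, G, T and N and leaves any other character unchanged.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Complement each base, then reverse the order of the bases.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AACG"))  # prints CGTT
```

Applying the function twice returns the original sequence, which is a quick sanity check.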

Can I make feature requests?

Yes. Suggestions and comments are appreciated. Please email Nicolas Bray and Lior Pachter.

What additional software do I need to run MAVID?

The standalone MAVID package requires a repeat masking tool such as RepeatMasker or DUST. The tree construction can be based on any package. We have found ML methods to be slow for some of the large alignment problems, and for those a neighbor-joining program is more suitable. Implementations of the tree building algorithms can be found in the PHYLIP, fastDNAml or CLUSTALW packages.

I would like to automate data analysis through the MAVID web server. Can I do this?

We receive an increasing number of requests every day, so automated use risks overloading our server. If you are a frequent user, a more efficient solution is to install MAVID locally. We are more than happy to assist you in doing this.