Multiple Sequence Alignment - Alignment Visualization and Quality Control

Alignment Visualization and Quality Control

The necessary use of heuristics for multiple alignment means that for an arbitrary set of proteins, there is always a good chance that an alignment will contain errors. For example, an evaluation of several leading alignment programs using the BAliBase benchmark found that at least 24% of all pairs of aligned amino acids were incorrectly aligned. These errors can arise because of unique insertions into one or more regions of sequences, or through some more complex evolutionary process leading to proteins that do not align easily by sequence alone. As the number of sequence and their divergence increases many more errors will be made simply because of the heuristic nature of MSA algorithms. Multiple sequence alignment viewers enable alignments to be visually reviewed, often by inspecting the quality of alignment for annotated functional sites on two or more sequences. Many also enable the alignment to be edited to correct these (usually minor) errors, in order to obtain an optimal 'curated' alignment suitable for use in phylogenetic analysis or comparative modeling.

However, as the number of sequences increases and especially in genome-wide studies that involve many MSAs it is impossible to manually curate all alignments. Furthermore, manual curation is subjective. And finally, even the best expert cannot confidently align the more ambiguous cases of highly diverged sequences. In such cases it is common practice to use automatic procedures to exclude unreliably aligned regions from the MSA. For the purpose of phylogeny reconstruction (see below) the Gblocks program is widely used to remove alignment blocks suspect of low quality, according to various cutoffs on the number of gapped sequences in alignment columns. However, these criteria may excessively filter out regions with insertion/deletion events that may still be aligned reliably, and these regions might be desirable for other purposes such as detection of positive selection. A few alignment algorithms output site-specific scores that allow the selection of high-confidence regions. Such a service was first offered by the SOAP program, which tests the robustness of each column to perturbation in the parameters of the popular alignment program CLUSTALW. The TCOFFEE program uses a library of alignments in the construction of the final MSA, and its output MSA is colored according to confidence scores that reflect the agreement between different alignments in the library regarding each aligned residue. Another alignment program that can output an MSA with confidence scores is FSA, which uses a statistical model that allows calculation of the uncertainty in the alignment. The HoT (Heads-Or-Tails) score can be used as a measure of site-specific alignment uncertainty due to the existence of multiple co-optimal solutions. The GUIDANCE program calculates a similar site-specific confidence measure based on the robustness of the alignment to uncertainty in the guide tree that is used in progressive alignment programs. An alternative, more statistically justified approach to assess alignment uncertainty is the use of probabilistic evolutionary models for joint estimation of phylogeny and alignment. A Bayesian approach allows calculation of posterior probabilities of estimated phylogeny and alignment, which is a measure of the confidence in these estimates. In this case, a posterior probability can be calculated for each site in the alignment. Such an approach was implemented in the program BAli-Phy.

Read more about this topic:  Multiple Sequence Alignment

Famous quotes containing the words quality and/or control:

    Anybody who knows the difference between the kind of conversation you have walking in the woods and the kind of conversation you have between the segments of a show on Nickelodeon can tell you that quality time exists. Quality time is when you and your child are together and keenly aware of each other. You are enjoying the same thing at the same time, even if it is just being in a room or going for a drive in the car. You are somehow in tune, even while daring to be silent together.
    Louise Lague (20th century)

    The glance is natural magic. The mysterious communication established across a house between two entire strangers, moves all the springs of wonder. The communication by the glance is in the greatest part not subject to the control of the will. It is the bodily symbol of identity with nature. We look into the eyes to know if this other form is another self, and the eyes will not lie, but make a faithful confession what inhabitant is there.
    Ralph Waldo Emerson (1803–1882)