
An Overview of Multiple Sequence Alignment





Multiple sequence alignment is perhaps the most commonly applied bioinformatics technique. It often leads to fundamental biological insight into sequence‐structure‐function relationships of nucleotide or protein sequence families. In this unit, an overview of multiple sequence alignment techniques is presented, covering a history of nearly 30 years from the early pioneering methods to the current state‐of‐the‐art techniques. Methodological and biological issues and end‐user considerations, as well as alignment evaluation issues, are discussed.

Table of Contents

  • MSA Methodology
  • MSA Methods
  • Assessment of MSA
  • Conclusion
  • Literature Cited
  • Figures
  • Tables
  •   Figure 3.7.1 Representation of the progressive alignment strategy comprising compilation and scoring of all pairwise alignments, yielding a similarity matrix, which is used to construct a guide tree. The resulting MSA is constructed in the order given by the guide tree. The arrow in brackets represents alignment iteration.
  •   Figure 3.7.2 An MSA gives rise to a similarity matrix containing all pairwise distances, which can be clustered and represented by a phylogenetic guide tree.
  •   Figure 3.7.3 Illustration of all phylogenetic trees for a set of four sequences. (A) There are three different possible unrooted trees. (B) There are two different tree topologies and a total of 15 different rooted trees.
  •   Figure 3.7.4 The phylogenetic tree of the flavodoxin family. The numbers at the ancestral nodes are bootstrap values.
  •   Figure 3.7.5 Computational times of parallelized PRALINE on different numbers of nodes for three sets of 200, 100, and 50 sequences, each 200 residues long.
  •   Figure 3.7.6 The Partial Order Graph (POA) alignment representation of the C‐termini of a pair of flavodoxin proteins. (A) The alignment in standard format; (B) the alignment in POA representation (adapted from Lee et al., ).
  •   Figure 3.7.7 An MSA of the flavodoxin family members (13 proteins) created by PRALINE (Heringa, ) using local preprocessing with a threshold of 300. The bottom sequence is the cheY sequence (PDB code 3chy), which is an outlier (Fig. ), with very low sequence similarity but the same basic flavodoxin fold. Note that the PRALINE alignment shows reasonably matched secondary structure elements, including for the cheY sequence.
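
The tree counts quoted in the caption of Figure 3.7.3 (three unrooted and 15 rooted trees for four sequences) can be checked with the standard double-factorial formulas for fully resolved (bifurcating) labeled trees. The sketch below is illustrative only: the function names are hypothetical and not taken from the unit, and it assumes every internal node is strictly bifurcating.

```python
def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * (k - 4) * ... down to 1 (or 2)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_sequences: int) -> int:
    """Number of distinct unrooted bifurcating trees for n labeled leaves: (2n - 5)!!"""
    if n_sequences < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 leaves")
    return double_factorial(2 * n_sequences - 5)


def count_rooted_trees(n_sequences: int) -> int:
    """Number of distinct rooted bifurcating trees for n labeled leaves: (2n - 3)!!"""
    if n_sequences < 2:
        raise ValueError("a rooted bifurcating tree needs at least 2 leaves")
    return double_factorial(2 * n_sequences - 3)


if __name__ == "__main__":
    # Four sequences, as in Figure 3.7.3.
    print(count_unrooted_trees(4), count_rooted_trees(4))
```

The counts grow very quickly with the number of sequences, which is one reason tree-building methods rely on heuristics rather than exhaustive enumeration.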
Literature Cited

Key References
   Dayhoff et al., 1978. See above.
   This atlas represents a seminal approach to sequence alignment. The evolutionary model that is still used today in most methods is introduced here, together with the now classical PAM series of amino acid exchange matrices. The evolutionary model is often referred to as the Dayhoff model, while the most widely used early matrix in the PAM series, the PAM250 matrix, is commonly known as the Dayhoff matrix.
   Felsenstein, 1981. See above.
   In this paper, the important evolutionary method of maximum likelihood is introduced. The method, which attempts to find the tree that maximizes the probability of the observed data under a given evolutionary model, is now generally accepted as the most accurate strategy.
   Hogeweg and Hesper, 1984. See above.
   This paper introduces the progressive multiple alignment strategy, which is still the most widely used multiple alignment technique. In this early paper, alignment iteration is already addressed. Another interesting feature of the paper is the use of so‐called internode sequences, which are additionally inferred sequences ancestral to subgroups of sequences in the phylogenetic tree calculated for the query sequence set.
   Needleman and Wunsch, 1970. See above.
   This is one of the most quoted early papers on sequence alignment. In this paper, the global dynamic programming algorithm is introduced to the biological community and applied to pairwise alignment of amino acid sequences. The basic dynamic programming technique had been conceived earlier by the applied mathematician Richard Bellman, who published a large series of papers and books on the topic during the 1950s and 1960s.
   Smith and Waterman, 1981. See above.
   Following the approach of Needleman and Wunsch (1970), Smith and Waterman derived the now classical algorithm for local pairwise sequence alignment. Most widely used homology search engines, such as BLAST, are fast approximations of the Smith and Waterman algorithm and perform local alignment.
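
The difference between the global and local dynamic programming recurrences described in the Needleman and Wunsch and the Smith and Waterman entries above can be sketched as follows. This is a simplified, score-only illustration and not the original formulation of either paper: the match, mismatch, and linear gap scores are assumed values chosen for readability, whereas real applications would use an exchange matrix such as PAM250 and often affine gap penalties.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score of the best end-to-end (global) alignment of a and b."""
    # First row: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: prefix of a against nothing
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score of the best local alignment between any substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # local alignments may start anywhere at no cost
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart instead of going negative.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(global_score("ACGT", "AGT"), local_score("TTTACGTTT", "GGACGGG"))
```

The only structural differences are the zero-initialized borders, the floor at zero in each cell, and taking the maximum over the whole matrix rather than the final cell; these are what allow the local variant to pick out a well-matched region from otherwise dissimilar sequences.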
Internet Resources
