Multiple Sequence Alignment Using ClustalW and ClustalX

Source: Internet, 2013-12-31

Abstract

The Clustal programs are widely used for automatic multiple alignment of nucleotide or amino acid sequences. The most familiar version is ClustalW, which uses a simple text menu system that is portable to virtually all computer systems. ClustalX adds a graphical user interface and several graphical utilities that aid the interpretation of alignments; it is the preferred version for interactive use. Users may run Clustal remotely from several sites on the Web, or the programs may be downloaded and run locally on PC, Macintosh, or Unix computers. The protocols in this unit describe how to use ClustalX and ClustalW to construct an alignment and to create profile alignments by merging existing alignments.

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Basic Protocol 1: Using ClustalW and ClustalX to Do Multiple Alignments
Alternate Protocol 1: Using ClustalW and ClustalX for Profile Alignments
Support Protocol 1: Obtaining the ClustalW and ClustalX Programs
Guidelines for Understanding Results
Commentary
Figures

Materials

Basic Protocol 1: Using ClustalW and ClustalX to Do Multiple Alignments

Necessary Resources

Hardware
- Unix (including Linux) workstation (e.g., Sun, Alpha, Silicon Graphics, PC), PC with MS Windows, or Power Macintosh

Software
- ClustalW or ClustalX program (see Support Protocol 1)

Files
- Sequences can be input to both ClustalW and ClustalX in one of seven file formats. All sequences must be in the same file. The formats that are automatically recognized are: NBRF/PIR, EMBL/Swiss‐Prot, Pearson (FASTA; appendix 1B), Clustal, GCG/MSF, GCG9/RSF, and GDE flat file. The sequences must be all nucleotide or all amino acid, and the program will attempt to guess which from the composition of the letters. Upper‐ or lowercase can be used, and most symbols and numbers will be ignored (removed); unrecognized residues will be counted as X or N.
If using a word processor to prepare the input file, save the file as plain text with line breaks—i.e., as a simple ASCII file. ClustalX cannot deal with native word processor formats.
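
The input handling described above can be sketched in a few lines of Python. This is only an illustration, not ClustalW's own code: the function names are hypothetical, the parser covers Pearson (FASTA) input only, and the 85% cutoff for calling a sequence nucleotide is an assumption recalled from the ClustalW help text that should be checked against the version in use.

```python
import re

# Letters counted as nucleotide codes (assumption: A, C, G, T, U, and N).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta(text):
    """Parse Pearson/FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            # Numbers, spaces, and symbols are dropped; case is normalized.
            chunks.append(re.sub(r"[^A-Za-z]", "", line).upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def guess_type(sequence, threshold=0.85):
    """Guess 'DNA' or 'PROTEIN' from letter composition (assumed cutoff)."""
    if not sequence:
        raise ValueError("empty sequence")
    hits = sum(ch in NUCLEOTIDE_LETTERS for ch in sequence)
    return "DNA" if hits / len(sequence) >= threshold else "PROTEIN"


example = ">seq1 test\nACGT ACGT 10\nNNAC\n>seq2\nMKVLAAGIVG\n"
for seq_name, seq in read_fasta(example):
    print(seq_name, guess_type(seq), seq)
```

Because all sequences in one file must be of the same type, a check like this is a quick way to catch a protein sequence that has slipped into a nucleotide file before the alignment is run.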

Alternate Protocol 1: Using ClustalW and ClustalX for Profile Alignments

Necessary Resources

Hardware
- Unix (including Linux) workstation (e.g., Sun, Alpha, Silicon Graphics, PC), PC with MS Windows, or Power Macintosh

Software
- ClustalW or ClustalX program (see Support Protocol 1)

Files
- Sequences and existing alignments can be input to both ClustalW and ClustalX in one of seven file formats. All sequences must be in the same file. The formats that are automatically recognized are: NBRF/PIR, EMBL/Swiss‐Prot, Pearson (FASTA; appendix 1B), Clustal, GCG/MSF, GCG9/RSF, and GDE flat file. In the examples here, unaligned sequences are in FASTA format and existing alignments are in Clustal and GCG/MSF formats.
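
For non-interactive use, a profile alignment of two existing alignments can also be requested from the ClustalW command line. The Python sketch below only builds the command and does not run it. The helper name and the file names (globins.aln, myoglobins.msf, merged.aln) are hypothetical, and the options shown (-PROFILE1, -PROFILE2, -PROFILE, -OUTFILE, -OUTPUT) are those listed in the ClustalW command-line help; confirm them against the installed version before relying on them.

```python
def profile_alignment_command(profile1, profile2, outfile,
                              output_format=None, program="clustalw"):
    """Build (but do not run) a ClustalW command that merges two alignments."""
    args = [
        program,
        f"-PROFILE1={profile1}",   # first existing alignment
        f"-PROFILE2={profile2}",   # second existing alignment
        "-PROFILE",                # merge the two by profile alignment
        f"-OUTFILE={outfile}",     # where the merged alignment is written
    ]
    if output_format:
        # For example GCG, GDE, PHYLIP, or PIR; Clustal format is the default.
        args.append(f"-OUTPUT={output_format}")
    return args


cmd = profile_alignment_command("globins.aln", "myoglobins.msf", "merged.aln")
print(" ".join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True)
# on a machine where the clustalw executable is on the PATH.
```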

Support Protocol 1: Obtaining the ClustalW and ClustalX Programs

Necessary Resources

Hardware
- Unix (including Linux) workstation (Sun, Alpha, Silicon Graphics, PC), PC with either MS‐DOS or MS Windows, Power Macintosh, or any other computer supporting a C compiler

Figures

Figure 2.3.1 The ClustalX window on a Unix workstation before any sequences are loaded.

Figure 2.3.2 The input file selection window for ClustalX.

Figure 2.3.3 ClustalX with five loaded but unaligned sequences.

Figure 2.3.4 Changing the format of the multiple alignment output in ClustalX. Clustal format is the default.

Figure 2.3.5 Selecting the names for the output files for the dendrogram (1wit.dnd is offered as the default) and the multiple alignment (1wit.aln is the default) for an input file called 1wit.

Figure 2.3.6 ClustalX after a multiple alignment has been carried out on the five sequences. The alignment has been written to a text file which can be used for further analysis. The user can also choose to analyse this alignment further within ClustalX (e.g., to calculate a phylogenetic tree).

Figure 2.3.7 The windows containing the buttons and (default) settings for the pairwise alignment parameters (left) and the multiple alignment parameters (right).

Figure 2.3.8 Producing a new multiple alignment (1wit.aln) using an old guide tree file (1wit.dnd).

Figure 2.3.9 Window displayed upon selecting the Show Low Scoring Segments option from the Quality menu.

Figure 2.3.10 The Save As menu from ClustalX which is used to save an alignment after it is produced. Alignments are written to output files by default anyway, but this option allows users to save the output afterwards, perhaps in a different format. The full alignment is saved by default; here the user has chosen to save residues 10 to 55.

Figure 2.3.11 The PostScript output menu from ClustalX. This is used to save the colored alignment with or without some of the ornamentation in the window.

Figure 2.3.12 ClustalX in profile alignment mode before any sequences or profiles are loaded. The two empty windows will hold the two profiles (existing alignments) or groups of sequences.

Figure 2.3.13 ClustalX in profile alignment mode after the first profile (a five‐sequence alignment) has been loaded (only three sequences are visible in the scrollable window).

Figure 2.3.14 ClustalX in profile alignment mode with both profiles loaded. Alignment was based on secondary structure superposition and manually adjusted.

Figure 2.3.15 Window displayed upon loading a profile with a structure mask in Profile Alignment Mode.

Figure 2.3.16 The default file names for the output files from the profile alignment.

Figure 2.3.17 The two profiles after they have been aligned together. They are still in separate windows but have been locked together by pressing the Lock Scroll button. They are moved together by the single scroll bar at the bottom of the screen.

Figure 2.3.18 The final profile alignment can be viewed in a single window by switching from Profile Alignment Mode back to Multiple Alignment Mode.

Figure 2.3.19 A sample text output file (x.aln) showing the alignment (obtained with default parameters) of seven globin sequences. The stars, dots and colons below the alignment indicate degree of conservation in the columns.

Figure 2.3.20 Dendrogram of the alignment shown in Figure .
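
The conservation line mentioned in the caption of Figure 2.3.19 (stars, colons, and dots under the alignment) can be approximated with the Python sketch below. It is not the Clustal implementation. The strong and weak residue groups are quoted from memory of the ClustalW/ClustalX documentation, and leaving columns that contain gaps unmarked is also an assumption, so verify both before using the output for anything beyond illustration.

```python
# Residue groups (assumption: recalled from the Clustal documentation).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(aligned):
    """Return one symbol per column: '*', ':', '.', or a space."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        residues = {r.upper() for r in column}
        if "-" in residues:
            marks.append(" ")          # columns with gaps are left unmarked
        elif len(residues) == 1:
            marks.append("*")          # identical residue in every sequence
        elif any(residues <= set(g) for g in STRONG_GROUPS):
            marks.append(":")          # all residues in one strong group
        elif any(residues <= set(g) for g in WEAK_GROUPS):
            marks.append(".")          # all residues in one weak group
        else:
            marks.append(" ")
    return "".join(marks)


rows = ["MKVLSC", "MRILAS", "MKV-TC"]
for row in rows:
    print(row)
print(conservation_line(rows))
```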

Literature Cited
	Doolittle, R.F. 1986. Of URFs and ORFs: A primer on how to analyze derived amino acid sequences. University Science Books, Mill Valley, Ca.
	Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783‐791.
	Feng, D.‐F. and Doolittle, R.F. 1987. Progressive sequence alignment as a pre‐requisite to correct phylogenetic trees. J. Mol. Evol. 25:351‐360.
	Gotoh, O. 1982. An improved algorithm for matching biological sequences. J. Mol. Biol. 162:705‐708.
	Gribskov, M., McLachlan, A.D., and Eisenberg, D. 1987. Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. U.S.A. 84:4355‐4358.
	Higgins, D.G. and Sharp, P.M. 1988. CLUSTAL: A package for performing multiple sequence alignments on a microcomputer. Gene 73:237‐244.
	Higgins, D.G. and Sharp, P.M. 1989. Fast and sensitive multiple sequence alignments on a microcomputer. CABIOS 5:151‐153.
	Higgins, D.G., Bleasby, A.J., and Fuchs, R. 1992. CLUSTAL V: Improved software for multiple sequence alignment. Comp. Appl. Biosci. 8:189‐191.
	Hogeweg, P. and Hesper, B. 1984. The alignment of sets of sequences and the construction of phyletic trees: an integrated method. J. Mol. Evol. 20:175‐186.
	Myers, E.W. and Miller, W. 1988. Optimal alignments in linear space. CABIOS 4:11‐17.
	Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48:443‐453.
	Pearson, W.R. 2000. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132:185‐219.
	Rost, B. 1999. Twilight zone of protein sequence alignments. Protein Eng. 12:85‐94.
	Saitou, N. and Nei, M. 1987. The neighbor‐joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406‐425.
	States, D.J., Gish, W., and Altschul, S.F. 1991. Improved sensitivity of nucleic acid database searches using application‐specific scoring matrices. Methods 3:66‐70.
	Taylor, W.R. 1988. A flexible method to align large numbers of biological sequences. J. Mol. Evol. 28:161‐169.
	Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position‐specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673‐4680.
	Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. 1997. The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876‐4882.
	Thompson, J.D., Plewniak, F., and Poch, O. 1999. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27:2682‐2690.
	Wilbur, W.J. and Lipman, D.J. 1983. Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. U.S.A. 80:726‐730.
Key References
	Jeanmougin, F., Thompson, J.D., Gouy, M., Higgins, D.G., and Gibson, T.J. 1998. Multiple sequence alignment with ClustalX. Trends Biochem. Sci. 23:403‐405.
	Higgins, D.G., Thompson, J.D., and Gibson, T.J. 1996. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266:383‐402.
	Both of these articles give extensive background and describe in detail what happens when Clustal is run and what each of the parameters means. They are intended for a lay, nontechnical audience.
Internet Resources
	http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html
	Get information on or download ClustalX.
	http://www.ebi.ac.uk/clustalw/
	Run ClustalW at the EBI using the Web.
	http://cmgm.stanford.edu/phylip/
	PHYLIP (Phylogeny Inference Package) version 3.5c, by J. Felsenstein, Department of Genetics, University of Washington, Seattle.