Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences

Abstract

 

RepeatMasker is a popular software tool widely used in computational genomics to identify, classify, and mask repetitive elements, including low-complexity sequences and interspersed repeats. RepeatMasker searches for repetitive sequences by aligning the input genome sequence against a library of known repeats, such as Repbase. Here, we describe two Basic Protocols that provide detailed guidelines on how to use RepeatMasker, either via the Web interface or on a command-line Unix/Linux system, to analyze repetitive elements in genomic sequences. Sequence comparisons in RepeatMasker are usually performed by the alignment program cross_match, which requires significant processing time for larger sequences. An Alternate Protocol describes how to reduce the processing time using an alternative alignment program, such as WU-BLAST. Further, the advantages, limitations, and known bugs of the software are discussed. Finally, guidelines for understanding the results are provided. Curr. Protoc. Bioinform. 25:4.10.1-4.10.14. © 2009 by John Wiley & Sons, Inc.

Keywords: RepeatMasker; genome annotation; repetitive elements; repeat library; cross_match; WU-BLAST; RECON

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Introduction
  • Basic Protocol 1: Using RepeatMasker via the Web Interface
  • Basic Protocol 2: Using the Command‐Line Unix/Linux Version of RepeatMasker to Study Repetitive Elements in Genomic Sequences
  • Alternate Protocol 1: Running RepeatMasker with WU‐BLAST
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 

Figures

  •   Figure 4.10.1 Sequences longer than 100 kb cannot be processed via the Web interface; RepeatMasker advises the user to consider alternate methods.
  •   Figure 4.10.2 Web RepeatMasker result from an example run showing the repetitive element annotation section, which lists cross_match summary lines; this result is available in Text File Format (A) and XHTML format (B). See and Table for explanation.
  •   Figure 4.10.3 Web RepeatMasker result from an example run showing the Masked Sequence section, which lists the query sequence with repetitive elements masked (replaced with Ns). See for explanation.
  •   Figure 4.10.4 Web RepeatMasker result from an example run showing the Summary section, which summarizes and categorizes the repetitive elements found in the query DNA sequence. See for explanation.
  •   Figure 4.10.5 Alignments between the query sequence and consensus repetitive elements are shown if the option Show Alignments is selected.
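
The annotation and masking steps that Figures 4.10.2 and 4.10.3 illustrate can be sketched in a few lines of Python. This is an illustration only, not part of the protocol: it assumes the whitespace-delimited column order commonly seen in RepeatMasker annotation output (score, percent divergence, deletions, insertions, query name, begin, end, bases left, strand, repeat name, class/family, then repeat coordinates and an ID). The names `RepeatHit`, `parse_out_line`, and `hard_mask`, the sequence, and the annotation line are all invented for the example. Real output files also carry header lines, which would need to be skipped first.

```python
from typing import List, NamedTuple


class RepeatHit(NamedTuple):
    """Subset of fields from one annotation line (hypothetical helper type)."""
    score: int
    query: str
    begin: int   # 1-based, inclusive start in the query
    end: int     # 1-based, inclusive end in the query
    strand: str  # "+" for forward, "C" for complement
    repeat: str
    family: str


def parse_out_line(line: str) -> RepeatHit:
    """Parse one whitespace-delimited annotation line.

    Assumes the column order described in the lead-in; header lines
    must be filtered out by the caller.
    """
    f = line.split()
    return RepeatHit(int(f[0]), f[4], int(f[5]), int(f[6]), f[8], f[9], f[10])


def hard_mask(seq: str, hits: List[RepeatHit]) -> str:
    """Return seq with every annotated interval replaced by N characters."""
    masked = list(seq)
    for h in hits:
        # Convert 1-based inclusive coordinates to a 0-based half-open range.
        for i in range(h.begin - 1, min(h.end, len(masked))):
            masked[i] = "N"
    return "".join(masked)


# Invented toy data: a 20-bp sequence with a (CA)n repeat at positions 5-12.
example_line = "239 12.5 0.0 0.0 demo_seq 5 12 (8) + (CA)n Simple_repeat 1 8 (0) 1"
sequence = "ACGTCACACACAGGTTAACC"

hit = parse_out_line(example_line)
masked = hard_mask(sequence, [hit])
print(hit.repeat, hit.family, hit.begin, hit.end)
print(masked)
```

Replacing bases with N discards the original sequence in the masked region; keeping the parsed hits alongside the unmasked sequence preserves the option of lowercase (soft) masking later.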

Literature Cited

   Bao, Z. and Eddy, S.R. 2002. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12:1269‐1276.
   Bedell, J.A., Korf, I., and Gish, W. 2000. MaskerAid: A performance enhancement to RepeatMasker. Bioinformatics 16:1040‐1041.
   Jurka, J. 2000. Repbase update, a database and an electronic journal of repetitive elements. Trends Genet. 16:418‐420.
   Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., and Walichiewicz, J. 2005. Repbase update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110:462‐467.
   Price, A.L., Jones, N.C., and Pevzner, P.A. 2005. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl. 1):i351‐i358.
   Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195‐197.
   Stein, L.D., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M., Chen, N., Chinwalla, A., Clarke, L., Clee, C., Coghlan, A., Coulson, A., D'Eustachio, P., Fitch, D.H.A., Fulton, L., Fulton, R., Griffiths‐Jones, S., Harris, T.W., Hillier, L.W., Kamath, R., Kuwabara, P.E., Marra, M., Mardis, E., Miner, T., Minx, P., Mullikin, J.C., Plumb, R.W., Rogers, J., Schein, J., Sohrmann, M., Spieth, J., Stajich, J.E., Wei, C., Willey, D., Wilson, R., Durbin, R., and Waterston, R. 2003. The genome sequence of Caenorhabditis briggsae: A platform for comparative genomics. PLoS Biol. 1:E45.
Internet Resources
   http://www.repeatmasker.org/
   RepeatMasker Web server
   http://www.girinst.org/
   Repbase Update
   http://selab.janelia.org/recon.html
   RECON Web site
   http://bix.ucsd.edu/repeatscout/
   RepeatScout Web site
   http://www.phrap.org/consed/consed.html#howToGet
   cross_match Web site
   http://blast.wustl.edu/
   WU‐BLAST Web site
   http://genome.ucsc.edu/cgi-bin/hgGateway
   UCSC Genome Browser
   ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/sequences/dna/
   WormBase FTP site
   http://www.repeatmasker.org/RepeatModeler.html
   RECON download site; the newest version of RECON is available from the RepeatMasker Web site
   http://www.bioperl.org
   BioPerl Web site