Identification of Novel and Known miRNAs in Deep‐Sequencing Data with miRDeep2

Internet, 2013-12-31

Abstract

miRNAs comprise an abundant class of small non‐coding RNAs that play important roles in a wide range of biological processes by post‐transcriptional regulation of a large fraction of animal genes. High‐throughput sequencing machines and the availability of completely sequenced genomes make it possible to reliably identify miRNAs with computational methods. This unit documents how to use the miRDeep2 software package to identify novel and known microRNAs in small RNA deep‐sequencing data. Moreover, the usage of miRDeep2 to profile miRNA expression across samples is illustrated. Curr. Protoc. Bioinform. 36:12.10.1‐12.10.15. © 2011 by John Wiley & Sons, Inc.

Keywords: miRDeep2; miRNA; microRNA; gene prediction; sequencing; expression; profiling

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Basic Protocol 1: Using the Mapper Module to Preprocess and Map Raw Illumina Deep Sequencing Data
Basic Protocol 2: Using the Quantifier Module to Profile miRNA Expression Across Distinct Samples
Basic Protocol 3: Using the miRDeep2 Module to Identify Novel and Known miRNAs in Deep‐Sequencing Data
Support Protocol 1: Installing the miRDeep2 Package
Guidelines for Understanding Results
Commentary
Literature Cited
Figures

Figures

Figure 12.10.1 Flowchart of the miRDeep2 package. Data preprocessed by the mapper module can directly be input to the quantifier module for miRNA expression profiling and the miRDeep2 module for identification of novel and known miRNAs. Output from the quantifier module can also be used as additional input for the miRDeep2 module.

Figure 12.10.2 (A) An example line of a typical qseq.txt file produced by an Illumina sequencing machine. (B) Excerpt of a FASTA file produced after preprocessing a raw sequencing file with sequences coming from two distinct samples. The identifier contains the sample of origin, a running number, and the number of occurrences of this sequence in the sample. (C) Typical output lines in an arf file, which contains the information about read mappings.
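
The identifier layout described in panel B can be unpacked with a few lines of Python. This is a minimal sketch, not part of the miRDeep2 package: it assumes identifiers of the form `>ctr_0_x120` (sample prefix, running number, then `x` followed by the occurrence count), a layout the caption does not spell out. The function names `parse_read_id` and `reads_per_sample`, the sample prefixes, and the example reads are all hypothetical.

```python
import re
from collections import Counter

# Assumed identifier layout: <sample>_<running number>_x<count>, e.g. ">ctr_0_x120".
_ID_PATTERN = re.compile(r"^>?(?P<sample>[^_\s]+)_(?P<index>\d+)_x(?P<count>\d+)$")


def parse_read_id(header):
    """Split a collapsed-read identifier into (sample, running number, count)."""
    match = _ID_PATTERN.match(header.strip())
    if match is None:
        raise ValueError(f"unexpected identifier format: {header!r}")
    return match["sample"], int(match["index"]), int(match["count"])


def reads_per_sample(fasta_lines):
    """Sum the occurrence counts per sample over all header lines."""
    totals = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            sample, _, count = parse_read_id(line)
            totals[sample] += count
    return totals


# Made-up records for illustration only (two samples, "ctr" and "trt").
example = [
    ">ctr_0_x120", "TGAGGTAGTAGGTTGTATAGTT",
    ">trt_0_x45", "TGAGGTAGTAGGTTGTATAGTT",
    ">ctr_1_x3", "TAGCTTATCAGACTGATGTTGA",
]
totals = reads_per_sample(example)
```

Summing the counts per sample prefix gives the total number of reads each sample contributed, which is a quick sanity check before mapping.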

Figure 12.10.3 An html result file produced by the quantifier module. Raw read counts and normalized read counts (in parentheses) are shown for each sample and miRNA analyzed.

Figure 12.10.4 PDF output produced by the quantifier module. For each sample, it shows the number of reads that map to the mature and star sequences, the precursor‐miRNA secondary structure, a density plot of reads that mapped to the precursor, and the read mappings themselves.

Figure 12.10.5 Typical miRDeep2 result html file. (A) Overview of input files used by miRDeep2. (B) Survey of miRDeep2 performance for different score cut‐offs. Column definitions: miRDeep2 score, minimal miRDeep2 score for a precursor to be included in the table; predicted by miRDeep2, total number of predicted precursors; estimated false positives, number of estimated false positive predictions from all predictions; estimated true positives, number of estimated true positive predictions from all predictions; in species, number of mature miRNAs in file mature_ref_miRNAs.fa (see above); in data, number of mature miRNAs that have at least one sequencing read mapped to them; detected by miRDeep2, number of mature miRNAs that are found in the predicted precursors; estimated signal‐to‐noise, quotient of total miRNAs identified/total estimated false positives; excision gearing, minimum number of reads that were necessary to excise a precursor from the genome.
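
The signal‐to‐noise column defined above is a plain quotient, so it is easy to recompute when choosing a score cut‐off. The sketch below is illustrative only: the helper names `signal_to_noise` and `pick_cutoff` are hypothetical, the survey rows are invented numbers rather than miRDeep2 output, and the rule of taking the lowest cut‐off that reaches a target ratio is one possible choice, not a recommendation from the protocol.

```python
def signal_to_noise(total_identified, estimated_false_positives):
    """Quotient of total miRNAs identified / total estimated false positives."""
    if estimated_false_positives <= 0:
        # No estimated false positives: the ratio is unbounded.
        return float("inf")
    return total_identified / estimated_false_positives


def pick_cutoff(rows, min_snr):
    """Return the lowest score cut-off whose signal-to-noise is >= min_snr.

    rows: iterable of (score_cutoff, total_identified, estimated_false_positives).
    Returns None when no cut-off reaches the target ratio.
    """
    passing = [
        cutoff
        for cutoff, identified, false_pos in rows
        if signal_to_noise(identified, false_pos) >= min_snr
    ]
    return min(passing) if passing else None


# Invented survey rows: (score cut-off, miRNAs identified, estimated false positives).
survey = [(10, 50, 2.0), (5, 80, 8.0), (1, 120, 40.0), (0, 150, 75.0)]
chosen = pick_cutoff(survey, min_snr=10)
```

A lower cut‐off keeps more candidates at the cost of more estimated false positives, so the target ratio expresses how much noise an analysis is willing to accept.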

Figure 12.10.6 Tables listing the identified novel (upper) and known (lower) miRNAs in deep sequencing data. Column definitions: provisional Id, Id assigned to the predicted hairpin; miRDeep2 score, score assigned to the predicted hairpin; estimated probability that the miRNA candidate is a true positive, probability for being a true positive candidate; rfam alert, indicates if parts of the candidate match small RNA entries in the Rfam database; total read count, number of reads that mapped to the precursor; mature read count, number of reads that mapped to the mature sequence; loop read count, number of reads that mapped to the loop sequence; star read count, number of reads that mapped to the star sequence; significant randfold p‐value, indicates if the randfold p‐value is lower than 0.05; miRBase miRNA, the name of a mature miRNA of the reference species in miRBase that matches the precursor sequence predicted by miRDeep2; example miRBase miRNA with the same seed, the name of a mature miRNA of a related species in miRBase that has the same seed sequence as the candidate's mature sequence; UCSC browser, link to a blat search of the precursor sequence at UCSC; NCBI blastn, link to a blastn search at NCBI; consensus mature sequence, mature consensus sequence of the precursor inferred by miRDeep2; consensus star sequence, star consensus sequence of the precursor inferred by miRDeep2; consensus precursor sequence, precursor consensus sequence inferred by miRDeep2; precursor coordinate, coordinates of the excised precursor in the genome file supplied to miRDeep2.

Figure 12.10.7 PDF file example of an identified novel miRNA. (A) Survey of the score contributions that make up the final miRDeep2 score, and the folded precursor‐miRNA. Nucleotides in red designate the mature miRNA, nucleotides in yellow designate the loop region, and nucleotides in violet indicate the star miRNA sequence. (B) Density plot of reads that mapped to the precursor‐miRNA. Colors have the same meaning as in A. (C) miRNA‐precursor sequence, including some flanking region, with the color codes described above and a tag called “obs.” This tag means that the mature, loop, and star sequences were all estimated by miRDeep2 based on the sequencing reads. Below it, the same sequence follows again with a different color for the star sequence and a tag called “exp.” Here the star sequence is assigned to the position expected relative to the mature sequence, which sometimes differs from the star sequence identified by miRDeep2. Next comes a line with the folding structure of the precursor in dot‐bracket notation, produced by the RNAfold tool. After that comes the so‐called read signature of the miRNA precursor, which shows the position in the precursor at which each read aligned. The right side lists the abundance of each read in the sequencing data, the number of mismatches in its mapping to the precursor sequence, and the sample from which the read originates.
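
For readers unfamiliar with the dot‐bracket notation mentioned in panel C: each `(` pairs with its matching `)`, and each `.` marks an unpaired nucleotide. The sketch below shows one common way to recover the base pairs with a stack. It is a generic illustration, not code from miRDeep2 or RNAfold, and the function name `dot_bracket_pairs` and the toy structure are hypothetical.

```python
def dot_bracket_pairs(structure):
    """Return sorted 0-based (i, j) index pairs for a dot-bracket string."""
    stack = []  # positions of "(" still waiting for a partner
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Toy hairpin: a three-base-pair stem closing a three-nucleotide loop.
toy_pairs = dot_bracket_pairs("(((...)))")
```

In a hairpin, the outermost pair joins the two ends of the stem, and the unpaired stretch between the innermost pair corresponds to the loop region shown in yellow in the figure.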

Literature Cited

	Bonnet, E., Wuyts, J., Rouze, P., and Van de Peer, Y. 2004. Evidence that microRNA precursors, unlike other non‐coding RNAs, have lower folding free energies than random sequences. Bioinformatics 20:2911‐2917.
	Friedländer, M.R., Chen, W., Adamidi, C., Maaskola, J., Einspanier, R., Knespel, S., and Rajewsky, N. 2008. Discovering microRNAs from deep sequencing data using miRDeep. Nat. Biotechnol. 26:407‐415.
	Friedländer, M.R., Mackowiak, S.D., Li, N., Chen, W., and Rajewsky, N. 2011. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. [Epub ahead of print]
	Griffiths‐Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S.R., and Bateman, A. 2005. Rfam: Annotating non‐coding RNAs in complete genomes. Nucleic Acids Res. 33:D121‐D124.
	Griffiths‐Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34:D140‐D144.
	Hackenberg, M., Sturm, M., Langenberger, D., Falcon‐Perez, J.M., and Aransay, A.M. 2009. miRanalyzer: A microRNA detection and analysis tool for next‐generation sequencing experiments. Nucleic Acids Res. 37:W68‐W76.
	Hendrix, D., Levine, M., and Shi, W. 2010. miRTRAP, a computational method for the systematic identification of miRNAs from high throughput sequencing data. Genome Biol. 11:R39.
	Hofacker, I.L. 2003. Vienna RNA secondary structure server. Nucleic Acids Res. 31:3429‐3431.
	Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res. 12:996‐1006.
	Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. 2009. Ultrafast and memory‐efficient alignment of short DNA sequences to the human genome. Genome Biol. 10:R25.
	Lee, R.C., Feinbaum, R.L., and Ambros, V. 1993. The C. elegans heterochronic gene lin‐4 encodes small RNAs with antisense complementarity to lin‐14. Cell 75:843‐854.
	Mathelier, A. and Carbone, A. 2010. MIReNA: Finding microRNAs with high accuracy and no learning at genome scale and from deep sequencing data. Bioinformatics 26:2226‐2234.
	Winter, J., Jung, S., Keller, S., Gregory, R.I., and Diederichs, S. 2009. Many roads to maturity: microRNA biogenesis pathways and their regulation. Nat. Cell Biol. 11:228‐234.
Key References
	Friedländer et al., 2008. See above.
	Describes the first version of the software package, with detailed information on how the algorithm works.
Internet Resources
	http://www.mdc-berlin.de/en/research/research_teams/systems_biology_of_gene_regulatory_elements/projects/miRDeep/index.html
	Site at which the miRDeep2 software can be downloaded.