Data Mining as a Discovery Tool for Imprinted Genes
This chapter introduces the collection of genome-wide sequence and epigenomic data and the use of these data to train generalized linear models (glm) that predict imprinted status. Because it is meant as an introduction to the method, only the most straightforward examples are covered. For instance, the examples given below refer to 11 classes of genomic regions: the entire gene body, introns, exons, 5′ UTR, 3′ UTR, and the regions 1, 10, and 100 kb upstream and downstream of each gene. One could also build models based on combinations of these regions. Likewise, models could be built on combinations of epigenetic features, or on combinations of both genomic regions and epigenetic features.
This chapter relies heavily on computational methods, including basic programming, but it is not meant to be an introduction to programming. Example code is provided throughout the chapter in the Perl programming language.