Creating Databases for Biological Information: An Introduction
Abstract
The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, relational databases, and NoSQL databases. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system. Curr. Protoc. Bioinform. 42:9.1.1–9.1.10. © 2013 by John Wiley & Sons, Inc.
Keywords: bioinformatics; bioinformatics fundamentals; biological databases
Table of Contents
- Introduction
- DBMS Characteristics
- Choosing a DBMS
- RDBMSs
- Figures
Figures
- Figure 9.1.1 A relational schema for protein sequences separates information into distinct tables to minimize redundancy.
- Figure 9.1.2 A flat‐file representation of the same data causes two proteins that share the same function or taxon to duplicate the information in “common_name,” “genus,” “species,” “go‐accession,” and “description.”
- Figure 9.1.3 Two entries in the protein database represented in the NoSQL engine MongoDB.
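
The contrast drawn in the captions for Figures 9.1.1 and 9.1.2 can be illustrated with a minimal sketch using Python's built-in sqlite3 module. This is not the schema from the unit: the table names, the key columns, the `go_accession` spelling, and every data value below are hypothetical placeholders chosen for illustration. Only the field names "common_name," "genus," "species," and "description" come from the captions. The sketch stores taxon and function information once each, then joins the tables to show how a flat-file view repeats those fields for every protein.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a normalized (relational) layout.

    All table names and values are illustrative placeholders.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (
            taxon_id    INTEGER PRIMARY KEY,
            common_name TEXT,
            genus       TEXT,
            species     TEXT
        );
        CREATE TABLE go_term (
            go_accession TEXT PRIMARY KEY,
            description  TEXT
        );
        CREATE TABLE protein (
            protein_id   INTEGER PRIMARY KEY,
            name         TEXT,
            sequence     TEXT,
            taxon_id     INTEGER REFERENCES taxon(taxon_id),
            go_accession TEXT REFERENCES go_term(go_accession)
        );
        """
    )
    # Taxon and function details are stored exactly once each.
    conn.execute(
        "INSERT INTO taxon VALUES (1, 'example organism', 'Examplegenus', 'examplespecies')"
    )
    conn.execute(
        "INSERT INTO go_term VALUES ('GO:0000000', 'placeholder function')"
    )
    # Two proteins refer to the same taxon and the same function by key.
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?, ?, ?)",
        [
            (1, "protein_A", "MKT", 1, "GO:0000000"),
            (2, "protein_B", "MAL", 1, "GO:0000000"),
        ],
    )
    return conn


def flat_rows(conn):
    """Join the tables into the flat-file view, one full row per protein."""
    query = """
        SELECT p.name, t.common_name, t.genus, t.species,
               g.go_accession, g.description
        FROM protein AS p
        JOIN taxon AS t ON t.taxon_id = p.taxon_id
        JOIN go_term AS g ON g.go_accession = p.go_accession
        ORDER BY p.protein_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in flat_rows(build_demo_db()):
        print(row)
```

In the joined output, the taxon and function columns are identical for both proteins, which is the duplication a flat file would store on disk. In the relational layout, each of those values lives in a single row and is updated in one place.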