Bioinformatic Tools for Gene and Protein Sequence Analysis
The rapid development of efficient, automated DNA-sequencing methods has strongly advanced the genome-sequencing era, culminating in the determination of the entire human genome in 2001 (1, 2). An enormous amount of DNA sequence data is available, and databases continue to grow exponentially (see Fig. 1). Analysis of this overwhelming amount of data, including hundreds of genomes from both prokaryotes and eukaryotes, has given rise to the field of bioinformatics. Bioinformatic tools have developed rapidly to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the best-studied bacterium, Escherichia coli, more than 30% of the identified open reading frames (ORFs) represent hypothetical genes with no known function. Future challenges of genome-sequence analysis will include the understanding of diseases, gene regulation, and metabolic pathway reconstruction. In addition, a set of methods for protein analysis, summarized under the term proteomics, holds tremendous potential for biomedicine and biotechnology (141). The large number of bioinformatic tools made available to scientists during the last few years has raised the problem of which to use and how best to obtain scientifically valid answers (3). In this chapter, we provide a guide to the most efficient way to analyze a given sequence or to collect information regarding a gene, protein, structure, or interaction of interest by applying current publicly available software and databases that are mainly accessed through the World Wide Web. All links to services or download sites are given in the text or listed in Table 1; the succession of tools is briefly summarized in Fig. 2.