Introduction to Cheminformatics
Abstract
Cheminformatics is a relatively new field of information technology that focuses on the collection, storage, analysis, and manipulation of chemical data. The chemical data of interest typically include information on small‐molecule formulas, structures, properties, spectra, and activities (biological or industrial). Cheminformatics originally emerged as a tool to support drug discovery and development; however, it now plays an increasingly important role in many areas of biology, chemistry, and biochemistry. This unit introduces readers to the field of cheminformatics and shows not only that cheminformatics shares many similarities with bioinformatics, but also that it can enhance much of what is currently done in bioinformatics.
Keywords: Cheminformatics; bioinformatics; chemical genomics; drug; chemical
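Of the data types listed above, the molecular formula is the simplest to manipulate computationally. The Python below is a minimal illustrative sketch, not a method described in this unit. It parses a flat molecular formula such as aspirin's C9H8O4 and sums atomic weights to give an average molecular weight. Several choices are assumptions made for brevity: the function name `molecular_weight`, the small weight table (rounded IUPAC values for a handful of common elements), and the restriction to formulas without parentheses, charges, or isotopes.

```python
import re

# Rounded IUPAC standard atomic weights for a few elements common in
# small organic molecules; deliberately incomplete (an assumption).
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}

# One element symbol (capital letter plus optional lowercase letter)
# followed by an optional atom count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula: str) -> float:
    """Average molecular weight of a flat formula such as 'C9H8O4'.

    Hypothetical helper: parentheses, hydrates, isotopes, and charges
    are not supported and raise ValueError.
    """
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in TOKEN.findall(formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for element {symbol!r}")
        # A missing count means a single atom of that element.
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    # prints: aspirin C9H8O4: 180.16 g/mol
    print(f"aspirin C9H8O4: {molecular_weight('C9H8O4'):.2f} g/mol")
```

Full cheminformatics toolkits accept far richer notation than this. The sketch only shows that a formula string already carries enough information to derive a simple physicochemical property.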
Table of Contents
- The Intersection Between Cheminformatics and Bioinformatics
- Databases in Cheminformatics
- Database Searching in Cheminformatics
- Property Prediction in Cheminformatics
- Conclusion
- Acknowledgements
- Literature Cited
- Tables
Literature Cited
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucl. Acids Res. 25:3389‐3402.
Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N., and Yeh, L.S. 2005. The Universal Protein Resource (UniProt). Nucl. Acids Res. 33:D154‐D159.
Brooksbank, C., Cameron, G., and Thornton, J. 2005. The European Bioinformatics Institute's data resources: Towards systems biology. Nucl. Acids Res. 33:D46‐D53.
Brown, F.K. 1998. Chemoinformatics: What is it and how does it impact drug discovery. Ann. Rep. Med. Chem. 33:375‐384.
Caspi, R., Foerster, H., Fulcher, C.A., Hopkinson, R., Ingraham, J., Kaipa, P., Krummenacker, M., Paley, S., Pick, J., Rhee, S.Y., Tissier, C., Zhang, P., and Karp, P.D. 2006. MetaCyc: A multiorganism database of metabolic pathways and enzymes. Nucl. Acids Res. 34:D511‐D516.
Chen, X., Ji, Z.L., and Chen, Y.Z. 2002. TTD: Therapeutic Target Database. Nucl. Acids Res. 30:412‐415.
Dietmann, S., Park, J., Notredame, C., Heger, A., Lappe, M., and Holm, L. 2001. A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3. Nucl. Acids Res. 29:55‐57.
Feng, Z., Chen, L., Maddula, H., Akcan, O., Oughtred, R., Berman, H.M., and Westbrook, J. 2004. Ligand Depot: A data warehouse for ligands bound to macromolecules. Bioinformatics 20:2153‐2155.
Geldenhuys, W.J., Gaasch, K.E., Watson, M., Allen, D.D., and Van der Schyf, C.J. 2006. Optimizing the use of open‐source software applications in drug discovery. Drug Discov. Today 11:127‐132.
Gibrat, J.F., Madej, T., and Bryant, S.H. 1996. Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 6:377‐385.
Golovin, A., Dimitropoulos, D., Oldfield, T., Rachedi, A., and Henrick, K. 2005. MSDsite: A database search and retrieval system for the analysis and viewing of bound ligands and active sites. Proteins 58:190‐199.
Halgren, T.A., Murphy, R.B., Friesner, R.A., Beard, H.S., Frye, L.L., Pollard, W.T., and Banks, J.L. 2004. Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 47:1750‐1759.
Hansch, C. and Zhang, L. 1993. Quantitative structure‐activity relationships of cytochrome P‐450. Drug Metab. Rev. 25:1‐48.
Hewett, M., Oliver, D.E., Rubin, D.L., Easton, K.L., Stuart, J.M., Altman, R.B., and Klein, T.E. 2002. PharmGKB: The Pharmacogenetics Knowledge Base. Nucl. Acids Res. 30:163‐165.
Hou, T.J. and Xu, X.J. 2003. ADME evaluation in drug discovery. 3. Modeling blood‐brain barrier partitioning using simple molecular descriptors. J. Chem. Inf. Comput. Sci. 43:2137‐2152.
Ihlenfeldt, W.D., Voigt, J.H., Bienfait, B., Oellien, F., and Nicklaus, M.C. 2002. Enhanced CACTVS browser of the Open NCI Database. J. Chem. Inf. Comput. Sci. 42:46‐57.
Irwin, J.J. and Shoichet, B.K. 2005. ZINC – A free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 45:177‐182.
Kanehisa, M., Goto, S., Hattori, M., Aoki‐Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M., and Hirakawa, M. 2006. From genomics to chemical genomics: New developments in KEGG. Nucl. Acids Res. 34:D354‐D357.
Kramer, B., Rarey, M., and Lengauer, T. 1997. CASP2 experiences with docking flexible ligands using FlexX. Proteins Suppl. 1:221‐225.
Milne, G.W.A., Nicklaus, M.C., Driscoll, J.S., Wang, S., and Zaharevitz, D. 1994. The NCI Drug Information System 3D Database. J. Chem. Inf. Comput. Sci. 34:1219‐1224.
Mishra, G.R., Suresh, M., Kumaran, K., Kannabiran, N., Suresh, S., Bala, P., Shivakumar, K., Anuradha, N., Reddy, R., Raghavan, T.M., Menon, S., Hanumanthu, G., Gupta, M., Upendran, S., Gupta, S., Mahesh, M., Jacob, B., Mathew, P., Chatterjee, P., Arun, K.S., Sharma, S., Chandrika, K.N., Deshpande, N., Palvankar, K., Raghavnath, R., Krishnakanth, R., Karathia, H., Rekha, B., Nayak, R., Vishnupriya, G., Kumar, H.G., Nagini, M., Kumar, G.S., Jose, R., Deepthi, P., Mohan, S.S., Gandhi, T.K., Harsha, H.C., Deshpande, K.S., Sarker, M., Prasad, T.S., and Pandey, A. 2006. Human protein reference database‐2006 update. Nucl. Acids Res. 34:D411‐D414.
Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48:443‐453.
O'Donovan, C., Martin, M.J., Gattiker, A., Gasteiger, E., Bairoch, A., and Apweiler, R. 2002. High‐quality protein knowledge resource: SWISS‐PROT and TrEMBL. Brief. Bioinformatics 3:275‐284.
Rebhan, M., Chalifa‐Caspi, V., Prilusky, J., and Lancet, D. 1998. GeneCards: A novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics 14:656‐664.
Sadowski, J. and Gasteiger, J. 1993. From atoms to bonds to three‐dimensional atomic coordinates: Automatic model builders. Chem. Rev. 93:2567‐2581.
Schlotterbeck, G., Ross, A., Dieterle, F., and Senn, H. 2006. Metabolic profiling technologies for biomarker discovery in biomedicine and drug development. Pharmacogenomics 7:1055‐1075.
Schnackenberg, L.K. and Beger, R.D. 2006. Monitoring the health to disease continuum with global metabolic profiling and systems biology. Pharmacogenomics 7:1077‐1086.
Shindyalov, I.N. and Bourne, P.E. 2001. A database and tools for 3‐D protein structure comparison and alignment using the Combinatorial Extension (CE) algorithm. Nucl. Acids Res. 29:228‐229.
Shoichet, B.K. and Kuntz, I.D. 1993. Matching chemistry and shape in molecular docking. Protein Eng. 6:723‐732.
Tetko, I.V. 2003. The WWW as a tool to obtain molecular parameters. Mini Rev. Med. Chem. 3:809‐820.
Ullman, J.R. 1976. An algorithm for sub‐graph isomorphism. J. ACM 23:31‐42.
Van de Waterbeemd, H. and De Groot, M. 2002. Can the Internet help to meet the challenges in ADME and e‐ADME? SAR QSAR Environ. Res. 13:391‐401.
Voigt, J.H., Bienfait, B., Wang, S., and Nicklaus, M.C. 2001. Comparison of the NCI open database with seven large chemical structural databases. J. Chem. Inf. Comput. Sci. 41:702‐712.
Weininger, D. 1988. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28:31‐36.
Westbrook, J., Feng, Z., Jain, S., Bhat, T.N., Thanki, N., Ravichandran, V., Gilliland, G.L., Bluhm, W., Weissig, H., Greer, D.S., Bourne, P.E., and Berman, H.M. 2002. The Protein Data Bank: Unifying the archive. Nucl. Acids Res. 30:245‐248.
Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Edgar, R., Federhen, S., Geer, L.Y., Helmberg, W., Kapustin, Y., Kenton, D.L., Khovayko, O., Lipman, D.J., Madden, T.L., Maglott, D.R., Ostell, J., Pruitt, K.D., Schuler, G.D., Schriml, L.M., Sequeira, E., Sherry, S.T., Sirotkin, K., Souvorov, A., Starchenko, G., Suzek, T.O., Tatusov, R., Tatusova, T.A., Wagner, L., and Yaschenko, E. 2006. Database resources of the National Center for Biotechnology Information. Nucl. Acids Res. 34:D173‐D180.
Wishart, D.S., Knox, C., Guo, A., Shrivastava, S., Hassanali, M., Stothard, P., and Woolsey, J. 2006. DrugBank: A comprehensive resource for in silico drug discovery and exploration. Nucl. Acids Res. 34:D668‐D672.
Wishart, D.S., Tzur, D., Knox, C., Eisner, R., Guo, A.C., Young, N., Cheng, D., Jewell, K., Arndt, D., Sawhney, S., Fung, C., Nikolai, L., Lewis, M., Coutouly, M.A., Forsythe, I., Tang, P., Shrivastava, S., Jeroncic, K., Stothard, P., Amegbey, G., Block, D., Hau, D.D., Wagner, J., Miniaci, J., Clements, M., Gebremedhin, M., Guo, N., Zhang, Y., Duggan, G.E., Macinnis, G.D., Weljie, A.M., Dowlatabadi, R., Bamforth, F., Clive, D., Greiner, R., Li, L., Marrie, T., Sykes, B.D., Vogel, H.J., and Querengesser, L. 2007. HMDB: The Human Metabolome Database. Nucl. Acids Res. 35:D521‐D526.
Yang, X., Parker, D., Whitehead, L., Ryder, N.S., Weidmann, B., Stabile‐Harris, M., Kizer, D., McKinnon, M., Smellie, A., and Powers, D. 2006. A collaborative hit‐to‐lead investigation leveraging medicinal chemistry expertise with high throughput library design, synthesis and purification capabilities. Comb. Chem. High Throughput Screen. 9:123‐130.
Key References
Doucet, J‐P. and Weber, J. 1996. Computer‐Aided Molecular Design: Theory and Applications. Academic Press, London.
An excellent introduction to the concepts and algorithms used in drug design and molecular modeling. This textbook covers methods and tools for both proteins and small‐molecule chemicals. Don't let the publication date deceive you.
Jonsdottir, S.O., Jorgensen, F.S., and Brunak, S. 2005. Prediction methods and databases within chemoinformatics: Emphasis on drugs and drug candidates. Bioinformatics 21:2145‐2160.
A superb review, with a nice summary of both open‐source and commercial databases. This review also provides useful assessments and descriptions of chemical property prediction and drug metabolism software.
Geldenhuys, W.J., Gaasch, K.E., Watson, M., Allen, D.D., and Van der Schyf, C.J. 2006. Optimizing the use of open‐source software applications in drug discovery. Drug Discov. Today 11:127‐132.
A very current and very readable review of open‐source software and databases, with a special emphasis on their applications to drug discovery.
Wishart, D.S. 2005. Bioinformatics in drug development and assessment. Drug Metab. Rev. 37:279‐310.
This review covers several of the topics introduced in this unit in somewhat more detail, with a greater focus on predicting drug metabolism and drug toxicity. It is a good complement to the Jonsdottir et al. (2005) paper.
Internet Resources
http://www.pharmabase.org
Pharmabase is a cellular physiology and pharmacology database.
http://www.ccdc.cam.ac.uk
The Cambridge Structural Database contains experimentally determined 3‐D coordinates of small‐molecule crystal structures.
http://cactus.nci.nih.gov/services/translate/
The Cactus online converter can take 2‐D structure drawings (MOL and SDF files) or SMILES strings and generate high‐quality 3‐D coordinates in PDB file format.
http://cactus.nci.nih.gov/ncidb2/download.html
Web site containing downloadable structure files of NCI Open Database compounds.
http://iris12.colby.edu/~www/sconv.cgi
Web site for the Molecular Structure File Converter, which facilitates conversion between MOL, SDF, PDB, SMILES, and InChI formats.
http://cactus.nci.nih.gov/services/translate/
Cactus Structure File Converter, which facilitates conversion between MOL, SDF, PDB, SMILES, and InChI formats.
http://inchi.info/converter_en.html
InChI converter, which facilitates conversion between MOL, SDF, PDB, SMILES, and InChI formats.
http://www.actelion.com/uninet/www/www_main_p.nsf/Content/Technologies+Property+Explorer
Web site for the Actelion Property Explorer, a Web‐enabled Java applet that allows users to draw chemical structures and then rapidly calculate various drug‐related properties.
http://preadmet.bmdrc.org/preadmet/index.php
Web site for Pre‐ADMET, which offers a wide range of ADME and toxicological property calculations for any submitted chemical compound.
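The converters listed above move structures between file formats such as MOL, SDF, and SMILES. The sketch below illustrates what one of these formats looks like to a program; it is an assumption for illustration, not the behavior of any listed service. It reads the atom block of a V2000 MOL file and reports the molecular formula in Hill order. Carbon comes first, then hydrogen, then the remaining elements alphabetically; when no carbon is present, all elements are alphabetical. The sketch makes several assumptions:

- The file is well formed, with the standard fixed‐width layout (element symbol in columns 32 to 34 of each atom line).
- Only atoms listed explicitly are counted, so implicit hydrogens are missed.
- V3000 files and the properties block are ignored.

The names `parse_v2000_symbols`, `hill_formula`, and the test helper `make_v2000` are hypothetical.

```python
from collections import Counter


def parse_v2000_symbols(molblock: str) -> list[str]:
    """Return element symbols from the atom block of a V2000 MOL file.

    Assumed layout: three header lines, then a counts line whose first two
    3-character fields are the atom and bond counts, then one fixed-width
    line per atom.
    """
    lines = molblock.splitlines()
    counts_line = lines[3]
    if "V2000" not in counts_line:
        raise ValueError("only V2000 MOL files are handled in this sketch")
    n_atoms = int(counts_line[0:3])
    atom_lines = lines[4:4 + n_atoms]
    # Columns 32-34 (1-based) hold the element symbol: 0-based slice 31:34.
    return [line[31:34].strip() for line in atom_lines]


def hill_formula(symbols) -> str:
    """Format element counts in Hill order (C, H, then alphabetical)."""
    counts = Counter(symbols)
    if "C" in counts:
        head = ["C"] + (["H"] if "H" in counts else [])
        order = head + sorted(e for e in counts if e not in ("C", "H"))
    else:
        # Without carbon, every element (including H) is alphabetical.
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def make_v2000(name, symbols, bonds) -> str:
    """Hypothetical test helper: build a minimal V2000 MOL block with all
    coordinates set to zero (enough for formula parsing, not for 3-D work)."""
    lines = [name, "  sketch", ""]
    lines.append(f"{len(symbols):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for sym in symbols:
        lines.append(f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {sym:<3} 0  0  0  0  0  0  0  0  0  0  0  0")
    for a, b, order in bonds:
        lines.append(f"{a:3d}{b:3d}{order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines)


if __name__ == "__main__":
    # Methanol with explicit hydrogens: atom 1 = C, atom 2 = O, atoms 3-6 = H.
    mol = make_v2000(
        "methanol",
        ["C", "O", "H", "H", "H", "H"],
        [(1, 2, 1), (1, 3, 1), (1, 4, 1), (1, 5, 1), (2, 6, 1)],
    )
    print(hill_formula(parse_v2000_symbols(mol)))  # prints CH4O
```

Given a heavy‐atom‐only MOL file for methanol, this sketch would report CO rather than CH4O. This is one reason converters that add hydrogens or infer implicit valences matter in practice.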