Loading and Preparing Data for Analysis in Spotfire
Abstract
This unit focuses on data preparation within Spotfire. Microarray data exist in a variety of formats, which often depend on the particular array technology and detection instruments used. The first protocols in this unit describe loading Affymetrix and GenePix data into Spotfire. Once the data are loaded, they must be filtered and preprocessed prior to analysis. The data transformation and normalization techniques presented here are then critical to performing microarray data mining correctly. These steps extract or enhance meaningful data characteristics and prepare the data for analysis methods, such as statistical tests to compute significance and clustering methods, which mostly require data to be normally distributed. The unit outlines several methods for normalizing the data within an experiment and between multiple experiments.
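As a rough illustration outside Spotfire, the log transformation and within-experiment normalization mentioned above might be sketched in Python as follows. The function names, the sample intensities, and the choice of median centering are assumptions made for this example; the unit itself carries out these steps through Spotfire's guides and dialogs, and its normalization options may differ.

```python
import math
from statistics import median


def log2_transform(values):
    """Return log2 of each positive value; non-positive or missing values become None."""
    return [math.log2(v) if v is not None and v > 0 else None for v in values]


def median_center(log_values):
    """Subtract the column median so that the centered column has a median of 0."""
    present = [v for v in log_values if v is not None]
    m = median(present)
    return [v - m if v is not None else None for v in log_values]


# Hypothetical raw intensities for two arrays (one column per experiment).
raw = {
    "array_1": [100.0, 400.0, 1600.0, 0.0],
    "array_2": [200.0, 800.0, 3200.0, 50.0],
}

# Log-transform each column, then center it on its own median.
normalized = {name: median_center(log2_transform(col)) for name, col in raw.items()}
```

Centering each array on its own median is only one simple way to make intensities comparable; it is used here because it needs no assumptions beyond the data columns themselves.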
Table of Contents
- Basic Protocol 1: Uploading GenePix Data Into Spotfire
- Alternate Protocol 1: Uploading Affymetrix Text Data Into Spotfire
- Support Protocol 1: Filtering and Preprocessing Microarray Data
- Support Protocol 2: Log Transformation of Microarray Data
- Basic Protocol 2: Normalization of Microarray Data within an Experiment
- Basic Protocol 3: Normalization of Microarray Data Between Experiments
- Basic Protocol 4: Row Summarization
- Commentary
- Literature Cited
- Figures
Figures
- Figure 7.8.1 Tools pane with the Import GenePix Files tab highlighted.
- Figure 7.8.2 (A) The Import GenePix Files dialog allows users to specify files to be uploaded into a Spotfire session. (B) The Data Import Options allow users to choose all or any columns from the data set.
- Figure 7.8.3 Tools pane with the Import Affymetrix v5 Files tab highlighted.
- Figure 7.8.4 (A) The Import Affymetrix Files dialog allows users to specify files to be uploaded into a Spotfire session. (B) The Data Import Options allow users to choose all or any columns from the data set.
- Figure 7.8.5 Guides pane with the Analyze Affymetrix absence/presence calls guide highlighted.
- Figure 7.8.6 The data are binned on the basis of the number of times a particular probe set was called Absent, Present, or Marginal, and a histogram displays the results.
- Figure 7.8.7 The data generated by the Affymetrix absence/presence guide are added to the Spotfire session as a new column, and a corresponding new query device is generated.
- Figure 7.8.8 The query device for a particular column of data can be changed from one type to another.
- Figure 7.8.9 Clearing the check box corresponding to “Binned_Present count 0‐1” alters the number of visible records (shown on the Activity Line).
- Figure 7.8.10 The Filter Genes guide helps users to perform data preprocessing in a stepwise fashion.
- Figure 7.8.11 The Filter Genes by Modulation guide bins data by the number of times a record (gene) crosses the specified threshold in the given experiments.
- Figure 7.8.12 A new data column and a new query device are added to the Spotfire session, based on the Filter Genes>Modulation>p‐value selection.
- Figure 7.8.13 Clicking on the Filter Genes guide allows users to perform preprocessing on GenePix data.
- Figure 7.8.14 Preprocessing can be performed on GenePix data using the Flags or the SNR columns.
- Figure 7.8.15 A new data column and a new query device are added to the Spotfire session, based on the Filter Genes>Modulation>Flags selection.
- Figure 7.8.16 Clearing the check box corresponding to the Modulation by Flags column (category 6) alters the number of visible records (shown on the Activity Line).
- Figure 7.8.17 The “Transform columns to log scale” guide allows the user to convert any numeric data column to its logarithmic counterpart, using either base 2 or base 10.
- Figure 7.8.18 The Normalization dialog 1(2) allows users to choose from several normalization options.
- Figure 7.8.19 The Normalization dialog 2(2) allows users to choose the value column on which to perform normalization, as well as other variables.
- Figure 7.8.20 The Row Summarization tool is displayed.
- Figure 7.8.21 The Row Summarization dialog allows users to choose the value columns on which to perform the summarization, as well as other variables such as which measure (e.g., Average, Standard Deviation) to use.
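The filtering on Present calls (Figures 7.8.6 to 7.8.9) and the row summarization (Figures 7.8.20 and 7.8.21) shown in the captions above can be approximated outside Spotfire with a short Python sketch. The probe names, detection calls, signal values, and the threshold of two Present calls are hypothetical, chosen only to mirror clearing the “Binned_Present count 0‐1” check box; this is not Spotfire's own implementation.

```python
from statistics import mean, stdev


def count_calls(calls, call="P"):
    """Count how many arrays gave the requested detection call (P, A, or M)."""
    return sum(1 for c in calls if c == call)


def summarize_row(values):
    """Summarize one record (row) across its value columns."""
    return {"average": mean(values), "std_dev": stdev(values)}


# Hypothetical detection calls and signals for two probe sets across four arrays.
detection_calls = {
    "probe_a": ["P", "P", "A", "P"],
    "probe_b": ["A", "A", "M", "A"],
}
signals = {
    "probe_a": [2.0, 4.0, 6.0, 8.0],
    "probe_b": [1.0, 1.0, 1.0, 1.0],
}

# Count Present calls per probe set, then drop probe sets with 0-1 Present calls.
present_counts = {p: count_calls(c) for p, c in detection_calls.items()}
kept = [p for p, n in present_counts.items() if n >= 2]

# Row summarization (average and sample standard deviation) for the kept records.
summary = {p: summarize_row(signals[p]) for p in kept}
```

Here `stdev` computes the sample standard deviation; whether Spotfire's Standard Deviation measure uses the sample or population form is not stated in the captions.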