Loading and Preparing Data for Analysis in Spotfire

Abstract

 

This unit focuses strictly on data preparation within Spotfire. Microarray data exist in a variety of formats, which often depend on the particular array technology and detection instruments used. The first protocols in this unit describe loading Affymetrix and GenePix data into Spotfire. Once the data are loaded, they must be filtered and preprocessed prior to analysis. The data transformation and normalization techniques presented here are then critical to performing microarray data mining correctly. These steps extract or enhance meaningful data characteristics and prepare the data for analysis methods, such as statistical tests of significance and clustering methods, most of which require the data to be normally distributed. The unit outlines several methods for normalizing the data within an experiment and between multiple experiments.
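The transformation and within-experiment normalization steps named in the abstract can be illustrated outside Spotfire with a minimal Python sketch. It is not Spotfire's implementation: the function names `log2_transform` and `median_center`, the sample intensities, and the choice of median centering as the normalization method are all assumptions made for illustration.

```python
import math
from statistics import median


def log2_transform(values):
    """Return log2 of each positive value; non-positive values become None."""
    return [math.log2(v) if v is not None and v > 0 else None for v in values]


def median_center(log_values):
    """Subtract the column median so the normalized column has median 0.

    Median centering is only one possible within-experiment normalization;
    it is assumed here for illustration.
    """
    present = [v for v in log_values if v is not None]
    m = median(present)
    return [v - m if v is not None else None for v in log_values]


# Hypothetical raw intensities for one array; 0.0 cannot be log transformed.
raw = [100.0, 200.0, 400.0, 0.0, 800.0]
logged = log2_transform(raw)
normalized = median_center(logged)
```

After centering, the usable values in `normalized` are symmetric around zero, and the spot with zero intensity is carried through as a missing value rather than being silently dropped.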

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Basic Protocol 1: Uploading GenePix Data Into Spotfire
  • Alternate Protocol 1: Uploading Affymetrix Text Data Into Spotfire
  • Support Protocol 1: Filtering and Preprocessing Microarray Data
  • Support Protocol 2: Log Transformation of Microarray Data
  • Basic Protocol 2: Normalization of Microarray Data within an Experiment
  • Basic Protocol 3: Normalization of Microarray Data Between Experiments
  • Basic Protocol 4: Row Summarization
  • Commentary
  • Literature Cited
  • Figures
     
 


Figures

  • Figure 7.8.1 Tools pane with the Import GenePix Files tab highlighted.
  • Figure 7.8.2 (A) The Import GenePix Files dialog allows users to specify files to be uploaded into a Spotfire session. (B) The Data Import Options allow users to choose all or any columns from the data set.
  • Figure 7.8.3 Tools pane with the Import Affymetrix v5 Files tab highlighted.
  • Figure 7.8.4 (A) The Import Affymetrix Files dialog allows users to specify files to be uploaded into a Spotfire session. (B) The Data Import Options allow users to choose all or any columns from the data set.
  • Figure 7.8.5 Guides pane with the Analyze Affymetrix absence/presence calls guide highlighted.
  • Figure 7.8.6 The data are binned on the basis of the number of times a particular probe set was called Absent, Present, or Marginal, and a histogram displays the results.
  • Figure 7.8.7 The data generated by the Affymetrix absence/presence guide are added to the Spotfire session as a new column, and a new corresponding query device is generated.
  • Figure 7.8.8 The query device for a particular column of data can be changed from one type to another.
  • Figure 7.8.9 Clearing the check box corresponding to “Binned_Present count 0‐1” alters the number of visible records (shown on the Activity Line).
  • Figure 7.8.10 The Filter Genes guide helps users to perform data preprocessing in a stepwise fashion.
  • Figure 7.8.11 The Filter Genes by Modulation guide bins data by the number of times a record (gene) crosses the specified threshold in the given experiments.
  • Figure 7.8.12 A new data column and a new query device are added to the Spotfire session, based on the Filter Genes>Modulation>p‐value selection.
  • Figure 7.8.13 Clicking on the Filter Genes guide allows users to perform preprocessing on GenePix data.
  • Figure 7.8.14 Preprocessing can be performed on GenePix data using the Flags or the SNR columns.
  • Figure 7.8.15 A new data column and a new query device are added to the Spotfire session, based on the Filter Genes>Modulation>Flags selection.
  • Figure 7.8.16 Clearing the check box corresponding to the Modulation by Flags column (category 6) alters the number of visible records (shown on the Activity Line).
  • Figure 7.8.17 The “Transform columns to log scale” guide allows the user to convert any numeric data column to its logarithmic counterpart, using either base 2 or base 10.
  • Figure 7.8.18 The Normalization dialog 1(2) allows users to choose from several normalization options.
  • Figure 7.8.19 The Normalization dialog 2(2) allows users to choose the value column on which to perform normalization, along with other variables.
  • Figure 7.8.20 The Row Summarization tool is displayed.
  • Figure 7.8.21 The Row Summarization dialog allows users to choose the value columns on which to perform the summarization, as well as other variables such as which measure (e.g., Average, Standard Deviation) to use.
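Two of the operations described in these captions, filtering probe sets by their number of Present calls and summarizing each row with a chosen measure, can be sketched in plain Python. This is an illustration only and does not reproduce Spotfire's guides or dialogs: the probe names, the call data, the helper names, and the cutoff of at least two Present calls (chosen to mirror clearing the "0‐1" bin) are all hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical detection calls per probe set across four arrays:
# "P" = Present, "A" = Absent, "M" = Marginal.
calls = {
    "probe_a": ["P", "P", "A", "M"],
    "probe_b": ["A", "A", "A", "A"],
    "probe_c": ["P", "P", "P", "P"],
}


def present_count(call_list):
    """Count how many times a probe set was called Present."""
    return Counter(call_list)["P"]


# Keep probe sets with at least two Present calls (assumed cutoff).
present_counts = {probe: present_count(c) for probe, c in calls.items()}
kept = [probe for probe, n in present_counts.items() if n >= 2]


def summarize_row(values):
    """Summarize one row of replicate values by average and sample standard deviation."""
    return {"average": mean(values), "std_dev": stdev(values)}


# Hypothetical log-scale values for one gene measured on three arrays.
summary = summarize_row([1.0, 2.0, 3.0])
```

Here `kept` retains `probe_a` and `probe_c`, while `probe_b`, which was never called Present, is filtered out.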
