Analyzing and Visualizing Expression Data with Spotfire
Abstract
This unit assumes the reader is familiar with the Spotfire environment, has successfully installed Spotfire, and has uploaded and prepared data for analysis. It presents numerous methods for analyzing microarray data. Specifically, the first two protocols describe methods for identifying differentially expressed genes via the t‐test/ANOVA and the distinction calculation, respectively. Another protocol discusses how to conduct a profile search. Additional protocols illustrate various clustering methods, such as hierarchical clustering, K‐means clustering, and principal components analysis. A protocol explaining coincidence testing allows the reader to compare the results from multiple clustering methods. Additional protocols demonstrate querying the Internet for information based on the microarray data, mathematically transforming data within Spotfire to generate new data columns, and exporting Spotfire visualizations.
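As a rough illustration of the idea behind the t‐test protocol, the sketch below ranks genes by the magnitude of a two‐group Welch t statistic. It is not Spotfire's implementation: the gene names, the expression values, and the helper names `welch_t` and `rank_genes` are all hypothetical, and it stops at the statistic rather than computing p‐values.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    # Standard error of the difference between the two group means.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(data):
    """Return (gene, t) pairs sorted by |t|, largest first."""
    scores = {gene: welch_t(a, b) for gene, (a, b) in data.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical log-scale expression values:
# gene -> (control replicates, treated replicates)
expression = {
    "gene_A": ([1.0, 1.2, 0.8], [3.0, 3.2, 2.8]),
    "gene_B": ([2.0, 2.5, 1.5], [2.1, 2.4, 1.6]),
    "gene_C": ([5.0, 5.2, 4.8], [4.0, 4.2, 3.8]),
}

ranked = rank_genes(expression)
for gene, t in ranked:
    print(f"{gene}: t = {t:.2f}")
```

Genes with a large absolute t statistic differ most between the two groups relative to their within‐group variability. With more than two groups, an ANOVA F statistic would play the same role.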
Table of Contents
- Basic Protocol 1: Identification of Differentially Expressed Genes Using t‐test/ANOVA
- Alternate Protocol 1: Identification of Differentially Expressed Genes Using Distinction Calculation
- Basic Protocol 2: Identification of Genes Similar to a Given Profile: The Profile Search
- Support Protocol 1: Editing a Master Profile
- Basic Protocol 3: Coincidence Testing
- Basic Protocol 4: Hierarchical Clustering
- Support Protocol 2: Adding a Column from Hierarchical Clustering
- Alternate Protocol 2: Hierarchical Clustering on Keys
- Basic Protocol 5: K‐Means Clustering
- Basic Protocol 6: Principal Components Analysis
- Support Protocol 3: Transposing Data in Spotfire Decision Site
- Basic Protocol 7: Using Web Links to Query the Internet for Useful Information
- Basic Protocol 8: Generating New Columns of Data in Spotfire
- Basic Protocol 9: Exporting Spotfire Visualizations
- Guidelines for Understanding Results
- Commentary
- Literature Cited
- Figures
- Tables
Figures
- Figure 7.9.1 The Treatment Comparison tool is shown.
- Figure 7.9.2 The Treatment Comparison dialog box allows users to group various Value Columns into different groups on which t‐test/ANOVA is to be performed.
- Figure 7.9.3 A profile chart is generated to display the results of t‐test/ANOVA analysis. The “t‐test/ANOVA Query Device” (a range slider) can be manipulated to identify highly significant genes. The profile chart is colored in the Continuous Coloring mode based on the t‐test/ANOVA p‐values.
- Figure 7.9.4 The Treatment Comparison dialog box allows users to group various Value Columns into different groups on which Multiple Distinction is to be performed.
- Figure 7.9.5 Results of Multiple Distinction are originally displayed in a profile chart. Users can, however, build a heat map based on these results. (A) A set of genes on the basis of which eight experiments can be distinctly identified using the Multiple Distinction algorithm. (B) A zoomed‐in version of the same heat map.
- Figure 7.9.6 The Profile Search dialog box allows users to choose Value Columns to be used for this calculation as well as variables such as Similarity Measure and Calculation Options.
- Figure 7.9.7 The Profile Search: Edit dialog box allows users to edit an existing profile to create an imaginary profile upon which to base the search.
- Figure 7.9.8 The Coincidence Testing dialog box.
- Figure 7.9.9 The Hierarchical Clustering algorithm can be accessed from the Tools as well as the Guides menu.
- Figure 7.9.10 The Hierarchical Clustering dialog box allows users to specify Value Columns to be included in the clustering calculation and various other calculation options such as the Clustering Method and Similarity Measure.
- Figure 7.9.11 Hierarchical clustering results are displayed as a (default red‐green) heat map with an associated dendrogram.
- Figure 7.9.12 Hierarchical Clustering visualization allows users to zoom in and out of the heat map as well as the dendrogram. An individual cluster or a group of clusters can be marked and a data column added to the Spotfire session.
- Figure 7.9.13 The K‐means Clustering Tool dialog box allows users to specify the number of desired clusters, the method of choice for initiating centroids, the similarity measure, and other variables.
- Figure 7.9.14 K‐means clustering results are displayed as a group of profile charts. Each group is uniquely colored as specified by the check‐box query device.
- Figure 7.9.15 The PCA dialog box allows users to specify which Value Columns should be included in the calculation. In addition, it allows users to define variables such as the number of desired components.
- Figure 7.9.16 PCA results are displayed as 2‐D or 3‐D plots according to the user's specifications.
- Figure 7.9.17 The Web Links dialog box allows users to specify the Web site to search and the Identifier column from which to formulate the query.
- Figure 7.9.18 Results of a Web Link query are displayed in a new Web browser window. In this particular example, a significant outlier list of genes (GenBank Accession numbers) was queried using a Gene Annotation Database (created at the Hartwell Center for Bioinformatics and Biotechnology) and the results returned included Gene Descriptions and Gene Ontologies ( UNIT ) for the queried records.
- Figure 7.9.19 Right‐clicking in the Query Devices window allows generation of new columns.
- Figure 7.9.20 The New Columns dialog box.
- Figure 7.9.21 The Microsoft Word Presentation dialog box.
- Figure 7.9.22 The Export as Web Page dialog box.
- Figure 7.9.23 Data exported from a Spotfire session to the Web is displayed as a Web page report containing all the images as well as marked records.
- Figure 7.9.24 Exporting the currently active visualization using the Copy Special, Visualization mode.
- Figure 7.9.25 The Export Visualization dialog box.
Literature Cited
Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome‐wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95:14863‐14868.
Jolliffe, I.T. 1986. Principal Component Analysis. Springer Series in Statistics. Springer‐Verlag, New York.
Kerr, M.K. and Churchill, G.A. 2001. Experimental design for gene expression microarrays. Biostatistics 2:183‐201.
MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematics, Statistics and Probability, Vol. I (L.M. Le Cam and J. Neyman, eds.) pp. 281‐297. University of California Press, Berkeley, Calif.
Sankoff, D. and Kruskal, J.B. 1983. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison‐Wesley Publishing, Reading, Mass.
Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J., and Church, G.M. 1999. Systematic determination of genetic network architecture. Nat. Genet. 22:281‐285.