High-Performance Gene Expression Module Analysis Tool and Its Application to Chemical Toxicity Data

Source: Internet, 2014-02-13

Gene clustering is one of the main themes of data mining in bioinformatics. Although it is a powerful means of analyzing gene function, the results become increasingly difficult to interpret once the number of experiments (samples) reaches the hundreds or more. A newer type of clustering called “biclustering,” in which genes and experiments are co-clustered within large-scale gene expression data, has been studied extensively over the last decade. We have developed “SAMURAI,” an original program that detects all biclusters, or “gene modules,” whose genes have expression patterns similar to a query profile, using the ultrafast data mining algorithm Linear-time Closed itemset Miner (LCM). Using a chemical toxicity dataset from J&J rat liver experiments, we compiled an exhaustive dictionary of gene modules by searching the dataset for gene modules with each chemical exposure experiment in turn as the query. Through this module analysis, we found that our program can detect up- and down-regulated gene sets that significantly represent particular GO functions or KEGG pathways, thereby unraveling reactions and mechanisms common to different toxicochemical treatments of hepatocytes.
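To make the idea of query-based module detection more concrete, the following is a minimal sketch of how biclusters can be framed as closed itemsets. It is not SAMURAI's implementation and it does not use LCM: the enumeration here is a naive one that only suits toy data. The function names (`closed_itemsets`, `query_modules`), the fixed `threshold` used to call a gene regulated, and the toy expression values are all assumptions made for illustration. Each gene regulated in the query experiment becomes a "transaction" listing the experiments in which it moves in the same direction, and every closed set of experiments supported by enough genes is reported as a module.

```python
def closed_itemsets(transactions, min_support):
    """Naively enumerate closed itemsets (illustrative stand-in, NOT LCM).

    transactions: dict mapping a gene name to a frozenset of experiments.
    Returns a dict mapping each closed experiment set to its supporting genes.
    """
    closed = set()
    for t in transactions.values():
        # Every closed itemset is an intersection of one or more transactions,
        # so extend the known intersections with the new transaction.
        new = {t}
        for c in closed:
            new.add(c & t)
        closed |= new

    result = {}
    for items in closed:
        if not items:
            continue
        genes = frozenset(g for g, t in transactions.items() if items <= t)
        if len(genes) >= min_support:
            result[items] = genes
    return result


def query_modules(expr, query, min_genes=2, threshold=1.0):
    """Find gene modules that follow the query experiment (hypothetical helper).

    expr: dict gene -> dict experiment -> log expression ratio (assumed format).
    A gene counts as regulated when |ratio| >= threshold (assumed cutoff).
    """
    transactions = {}
    for gene, profile in expr.items():
        q = profile[query]
        if abs(q) < threshold:
            continue  # gene is not regulated in the query experiment
        sign = 1 if q > 0 else -1
        # Keep experiments where the gene is regulated in the query's direction.
        transactions[gene] = frozenset(
            e for e, v in profile.items() if v * sign >= threshold
        )
    return closed_itemsets(transactions, min_genes)


# Toy data (invented values): three experiments A, B, C and four genes.
expr = {
    "g1": {"A": 2.0, "B": 1.5, "C": 0.1},
    "g2": {"A": 1.8, "B": 1.2, "C": -0.2},
    "g3": {"A": 2.2, "B": 0.0, "C": 1.4},
    "g4": {"A": 0.1, "B": 2.0, "C": 2.0},
}

modules = query_modules(expr, "A", min_genes=2, threshold=1.0)
for experiments, genes in modules.items():
    print(sorted(experiments), sorted(genes))
```

On this toy input the sketch reports two modules: genes g1, g2, and g3 sharing experiment A, and genes g1 and g2 sharing experiments A and B. A linear-time miner such as LCM matters precisely because this kind of naive enumeration does not scale to hundreds of experiments.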