One application of gene expression arrays is to derive molecular profiles, i.e., sets of genes, which discriminate well between two classes of samples, for example between tumour types. Users are confronted with a multitude of classification methods of varying complexity that can be applied to this task. To help decide which method to use in a given situation, we compare important characteristics of a range of classification methods, including simple univariate filtering, penalised likelihood methods and the random forest.Classification accuracy is an important characteristic, but the biological interpretability of molecular profiles is also important. This implies both parsimony and stability, in the sense that profiles should not vary much when there are slight changes in the training data. We perform a random resampling study to compare these characteristics between the methods and across a range of profile sizes. We measure stability by adopting the Jaccard index to assess the similarity of resampled molecular profiles.We carry out a case study on five well-established cancer microarray data sets, for two of which we have the benefit of being able to validate the results in an independent data set. The study shows that those methods which produce parsimonious profiles generally result in better prediction accuracy than methods which don't include variable selection. For very small profile sizes, the sparse penalised likelihood methods tend to result in more stable profiles than univariate filtering while maintaining similar predictive performance.

Editor-in-Chief: Stumpf, Michael P.H.
Editorial Board Member: Beaumont, Mark / Binder, Harald / Gupta, Mayetri / Hubbard, Alan E. / Husmeier, Dirk / Ji, Hongkai / Keles, Sunduz / Kerr, Kathleen / Lazzeroni, Laura / Lin, Shili / Ma, Ping / Marjoram, Paul / Mertens, Bart / Nerman, Olle / G. Petretto, Enrico / Plagnol, Vincent / Purdom, Elizabeth / Robin, Stéphane / Rzhetsky, Andrey / Sanguinetti, Guido / van der Laan, Mark J. / von Haeseler, Arndt / Weeks, Daniel E. / Wiuf, Carsten / Zhao, Hongyu
1 Issue per year
IMPACT FACTOR 2011: 1.517
5-year IMPACT FACTOR: 1.704
Rank 27 out of 116 in category Statistics & Probability in the 2011 Thomson Reuters Journal Citation Report/Science Edition
Issues
Volume 12 (2013)
Volume 11 (2012)
Volume 10 (2011)
Volume 9 (2010)
Volume 8 (2009)
Volume 7 (2008)
Volume 6 (2007)
Volume 5 (2006)
Volume 4 (2005)
Volume 3 (2004)
Volume 2 (2003)
Volume 1 (2002)
Most Downloaded Articles
- A General Framework for Weighted Gene Co-Expression Network Analysis by Zhang, Bin and Horvath, Steve
- Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments by Smyth, Gordon K
- Detecting Differential Expression in RNA-sequence Data Using Quasi-likelihood with Shrunken Dispersion Estimates by Lund, Steven P./ Nettleton, Dan/ McCarthy, Davis J. and Smyth, Gordon K.
- Adjusting for Spurious Gene-by-Environment Interaction Using Case-Parent Triads by Shin, Ji-Hyung/ Infante-Rivard, Claire/ Graham, Jinko and McNeney, Brad
- A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics by Schäfer, Juliane and Strimmer, Korbinian
Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods
1German Cancer Research Centre
1Imperial College London
1Imperial College London
Citation Information: Statistical Applications in Genetics and Molecular Biology. Volume 7, Issue 1, Pages –, ISSN (Online) 1544-6115, DOI: 10.2202/1544-6115.1307, February 2008
- Published Online:
- 2008-02-23
Keywords: microarrays; molecular signature; classification; multivariate analysis; penalised likelihood


















Comments (0)