Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


Online ISSN: 1544-6115
Volume 7, Issue 1

Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods

Manuela Zucknick / Sylvia Richardson / Euan A Stronach
Published Online: 2008-02-23 | DOI: https://doi.org/10.2202/1544-6115.1307

One application of gene expression arrays is to derive molecular profiles, i.e., sets of genes, which discriminate well between two classes of samples, for example between tumour types. Users are confronted with a multitude of classification methods of varying complexity that can be applied to this task. To help decide which method to use in a given situation, we compare important characteristics of a range of classification methods, including simple univariate filtering, penalised likelihood methods and the random forest.

Classification accuracy is an important characteristic, but the biological interpretability of molecular profiles is also important. This implies both parsimony and stability, in the sense that profiles should not vary much when there are slight changes in the training data. We perform a random resampling study to compare these characteristics between the methods and across a range of profile sizes. We measure stability by adopting the Jaccard index to assess the similarity of resampled molecular profiles.

We carry out a case study on five well-established cancer microarray data sets, for two of which we have the benefit of being able to validate the results in an independent data set. The study shows that those methods which produce parsimonious profiles generally result in better prediction accuracy than methods which don't include variable selection. For very small profile sizes, the sparse penalised likelihood methods tend to result in more stable profiles than univariate filtering while maintaining similar predictive performance.
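
As an illustration of the stability measure mentioned in the abstract, the following is a minimal sketch of the Jaccard index applied to gene sets selected on different resamples. The abstract does not say how the pairwise similarities are aggregated, so averaging over all pairs of profiles is an assumption made here, and the function names and gene labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty profiles are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(profiles):
    """Average Jaccard index over all pairs of resampled profiles.

    Assumption: the mean over pairs is used as the stability summary;
    the source only states that the Jaccard index is adopted.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical profiles selected on three resamples of the training data.
profiles = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3"},
]
stability = mean_pairwise_jaccard(profiles)
print(f"mean pairwise Jaccard: {stability:.3f}")
```

A value close to 1 would indicate that nearly the same genes are selected on every resample, while a value close to 0 would indicate that the profiles share few genes.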

Keywords: microarrays; molecular signature; classification; multivariate analysis; penalised likelihood

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 7, Issue 1, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1307.

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.
