Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

6 Issues per year


IMPACT FACTOR 2016: 0.646
5-year IMPACT FACTOR: 1.191

CiteScore 2016: 0.94

SCImago Journal Rank (SJR) 2015: 0.954
Source Normalized Impact per Paper (SNIP) 2015: 0.554

Mathematical Citation Quotient (MCQ) 2015: 0.06

Online ISSN: 1544-6115
Volume 10, Issue 1 (Jul 2011)

High-Dimensional Regression and Variable Selection Using CAR Scores

Verena Zuber (University of Leipzig) / Korbinian Strimmer (University of Leipzig)
Published Online: 2011-07-18 | DOI: https://doi.org/10.2202/1544-6115.1730

Variable selection is a difficult problem that is particularly challenging in the analysis of high-dimensional genomic data. Here, we introduce the CAR score, a novel and highly effective criterion for variable ranking in linear regression based on Mahalanobis decorrelation of the explanatory variables. The CAR score provides a canonical ordering that encourages grouping of correlated predictors and down-weights antagonistic variables. It decomposes the proportion of variance explained and is intermediate between the marginal correlation and the standardized regression coefficient. Because the CAR score is a population quantity, any preferred inference scheme can be applied to estimate it. Using simulations, we demonstrate that variable selection by CAR scores is very effective and yields prediction errors and true and false positive rates that compare favorably with modern regression techniques such as elastic net and boosting. We illustrate our approach by analyzing data on diabetes progression and on the effect of aging on gene expression in the human brain. The R package “care” implementing CAR score regression is available from CRAN.
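
The properties listed in the abstract can be illustrated with a minimal sketch. It assumes the CAR score is defined as ω = P^(-1/2) P_Xy, where P is the correlation matrix of the explanatory variables and P_Xy is the vector of marginal correlations with the response; this definition is not spelled out in the abstract and is stated here as an assumption. The function name `car_scores` and the simulated data are hypothetical, and the plain empirical correlation estimate used below needs more observations than predictors, so it is not the estimator one would use for high-dimensional data, nor a reproduction of the “care” package.

```python
import numpy as np


def car_scores(X, y):
    """Empirical CAR scores under the assumed definition omega = R^(-1/2) r_xy.

    X: (n, p) array of predictors, y: (n,) response. Requires n > p so that
    the sample correlation matrix R is invertible (no shrinkage is applied).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Standardize predictors and response (population-style std, ddof=0).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()

    R = Xs.T @ Xs / n      # correlations among predictors
    r_xy = Xs.T @ ys / n   # marginal correlations with the response

    # Inverse matrix square root of R via its eigendecomposition.
    evals, evecs = np.linalg.eigh(R)
    R_inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.T

    return R_inv_sqrt @ r_xy


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 4))
    X[:, 1] += 0.8 * X[:, 0]  # make predictors 0 and 1 correlated
    y = X @ np.array([1.0, 0.5, 0.0, -0.7]) + rng.normal(size=500)

    omega = car_scores(X, y)
    print("CAR scores:         ", omega)
    print("ranking (by omega^2):", np.argsort(-omega**2))
    print("sum of omega^2:      ", np.sum(omega**2))
```

Under this definition the squared scores sum to r_xy' R^(-1) r_xy, which is the in-sample R² of the least-squares fit; this is the sense in which the scores decompose the proportion of variance explained. Replacing the exponent -1/2 by 0 gives the marginal correlations and by -1 gives the standardized regression coefficients, which is why the CAR score sits between the two.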

Keywords: variable importance; variable selection; decorrelation; lasso; elastic net; boosting; CAR score

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.2202/1544-6115.1730.
