Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

6 Issues per year

IMPACT FACTOR 2016: 0.646
5-year IMPACT FACTOR: 1.191

CiteScore 2016: 0.94

SCImago Journal Rank (SJR) 2015: 0.954
Source Normalized Impact per Paper (SNIP) 2015: 0.554

Mathematical Citation Quotient (MCQ) 2015: 0.06

Volume 8, Issue 1 (Jan 2009)


Sparse Canonical Correlation Analysis with Application to Genomic Data Integration

Elena Parkhomenko
  • Hospital for Sick Children Research Institute
David Tritchler
  • University of Toronto, State University of New York at Buffalo, Ontario Cancer Institute
Joseph Beyene
  • Hospital for Sick Children Research Institute, University of Toronto
Published Online: 2009-01-06 | DOI: https://doi.org/10.2202/1544-6115.1406

Large-scale genomic studies with multiple phenotypic or genotypic measures may require the identification of complex multivariate relationships. In multivariate analysis, a common way to inspect the relationship between two sets of variables based on their correlation is canonical correlation analysis, which finds one linear combination of all variables of each type such that the correlation between the two linear combinations is maximal. However, in high-dimensional data analysis, when the number of variables under consideration exceeds tens of thousands, linear combinations of the entire sets of features may lack biological plausibility and interpretability. In addition, insufficient sample size may lead to computational problems, inaccurate parameter estimates, and non-generalizable results. These problems may be solved by selecting sparse subsets of variables, i.e., by obtaining sparse loadings in the linear combinations of variables of each type. In this paper we present Sparse Canonical Correlation Analysis (SCCA), which examines the relationships between two types of variables and provides sparse solutions that include only small subsets of variables of each type. It does so by maximizing the correlation between the subsets of variables of different types while performing variable selection. We also present an extension of SCCA, adaptive SCCA. We evaluate the properties of both methods using simulated data and illustrate their practical use by applying them to the study of natural variation in human gene expression.

Keywords: canonical correlation; sparseness; data integration
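
The idea of sparse loadings described in the abstract can be illustrated with a minimal sketch. It assumes a generic penalized scheme: alternate between the two loading vectors, multiplying by the sample cross-correlation matrix and soft-thresholding small entries to zero. This is not necessarily the exact algorithm of the paper, and the names `sparse_cca`, `soft_threshold`, `unit`, `lam_u`, and `lam_v`, as well as the simulated data, are illustrative assumptions.

```python
import numpy as np


def soft_threshold(vec, lam):
    """Shrink every entry toward zero by lam; entries within lam of zero become 0."""
    return np.sign(vec) * np.maximum(np.abs(vec) - lam, 0.0)


def unit(vec):
    """Scale vec to unit Euclidean length (an all-zero vector is returned as is)."""
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def sparse_cca(X, Y, lam_u=0.2, lam_v=0.2, n_iter=100, tol=1e-8):
    """Illustrative sparse CCA: returns sparse loading vectors u (for X) and v (for Y).

    Assumes rows are samples, columns are variables, and no column is constant.
    lam_u and lam_v are hypothetical sparsity parameters applied to unit-length
    vectors; larger values give sparser loadings.
    """
    # Standardize columns so that K is the sample cross-correlation matrix.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    K = X.T @ Y / X.shape[0]

    # Start from the leading singular vectors of K (a dense solution).
    U, _, Vt = np.linalg.svd(K, full_matrices=False)
    u, v = U[:, 0], Vt[0]

    for _ in range(n_iter):
        # Update each loading vector given the other, then sparsify and rescale.
        u_new = unit(soft_threshold(unit(K @ v), lam_u))
        v_new = unit(soft_threshold(unit(K.T @ u_new), lam_v))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v


# Simulated example: one latent factor z drives 3 of 30 X-variables
# and 2 of 20 Y-variables; all remaining variables are pure noise.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal(n)
X = rng.standard_normal((n, 30))
Y = rng.standard_normal((n, 20))
X[:, :3] = z[:, None] + 0.5 * rng.standard_normal((n, 3))
Y[:, :2] = z[:, None] + 0.5 * rng.standard_normal((n, 2))

u, v = sparse_cca(X, Y)
print("selected X variables:", np.flatnonzero(u))
print("selected Y variables:", np.flatnonzero(v))
print("correlation of the two linear combinations:",
      np.corrcoef(X @ u, Y @ v)[0, 1])
```

In this sketch the thresholds are fixed by hand; in practice they would have to be tuned, for example by cross-validation.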

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1406.

