Statistical Applications in Genetics and Molecular Biology

A variable selection approach in the multivariate linear model: an application to LC-MS metabolomics data

Marie Perrot-Dockès / Céline Lévy-Leduc / Julien Chiquet / Laure Sansonnet / Margaux Brégère / Marie-Pierre Étienne / Stéphane Robin / Grégory Genta-Jouve
Published Online: 2018-09-08 | DOI: https://doi.org/10.1515/sagmb-2017-0077

Abstract

Omic data are characterized by strong dependence structures that arise either from data acquisition or from underlying biological processes. Statistical procedures whose variable selection step does not adjust for this dependence pattern may lose power and select spurious variables. This paper proposes a variable selection procedure within the multivariate linear model framework that accounts for the dependence between the multiple responses. We focus on a specific type of dependence, in which the responses of a given individual are modelled as a time series. We propose a novel Lasso-based approach that takes this dependence structure into account by using the covariance structures of different types of stationary processes for the random error matrix. Our numerical experiments show that including the estimation of the covariance matrix of the random error matrix in the Lasso criterion dramatically improves variable selection performance. Our approach is successfully applied to an untargeted LC-MS (Liquid Chromatography-Mass Spectrometry) data set of African copal samples. Our methodology is implemented in the R package MultiVarSel, which is available from the Comprehensive R Archive Network (CRAN).

Keywords: metabolomics; multivariate linear model; time series; variable selection
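
To make the idea of adjusting for time-series dependence concrete, the following is a minimal sketch of a whitening step for one simple stationary structure, an AR(1) process. It is not the MultiVarSel API and not necessarily the authors' exact procedure: the function names `estimate_ar1_phi` and `whiten_ar1`, the pooled lag-1 estimator, and the clamping bound are all assumptions made for illustration. The sketch covers only the decorrelation of each individual's residual or response row; a Lasso criterion would then be fitted to the whitened data, which is not shown here.

```python
import math


def estimate_ar1_phi(residual_rows):
    """Pooled least-squares estimate of the AR(1) coefficient.

    Each row holds the residuals of one individual, ordered along the
    response index (the "time" axis). Hypothetical helper for illustration.
    """
    num = 0.0
    den = 0.0
    for row in residual_rows:
        for t in range(1, len(row)):
            num += row[t] * row[t - 1]
            den += row[t - 1] ** 2
    if den == 0.0:
        return 0.0
    # Clamp so that the stationarity condition |phi| < 1 holds (assumed bound).
    return max(-0.99, min(0.99, num / den))


def whiten_ar1(row, phi):
    """Transform one AR(1) row into uncorrelated, equal-variance terms.

    For e_t = phi * e_{t-1} + w_t, the first term is rescaled by
    sqrt(1 - phi^2) and every later term becomes e_t - phi * e_{t-1}.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    if not row:
        return []
    whitened = [math.sqrt(1.0 - phi * phi) * row[0]]
    for t in range(1, len(row)):
        whitened.append(row[t] - phi * row[t - 1])
    return whitened


# Toy residuals (made-up numbers): each value is half of the previous one.
residuals = [[1.0, 0.5, 0.25], [2.0, 1.0, 0.5]]
phi_hat = estimate_ar1_phi(residuals)
whitened_rows = [whiten_ar1(row, phi_hat) for row in residuals]
```

In this toy case every term after the first is fully explained by its predecessor, so the whitened rows are zero beyond their first entry. With real data the whitened terms would be approximately uncorrelated noise, which is the setting in which a standard Lasso criterion is better justified.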

Citation Information: Statistical Applications in Genetics and Molecular Biology, 20170077, ISSN (Online) 1544-6115, DOI: https://doi.org/10.1515/sagmb-2017-0077.

©2018 Walter de Gruyter GmbH, Berlin/Boston.