Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 17, Issue 5

A variable selection approach in the multivariate linear model: an application to LC-MS metabolomics data

Marie Perrot-Dockès / Céline Lévy-Leduc / Julien Chiquet / Laure Sansonnet / Margaux Brégère / Marie-Pierre Étienne / Stéphane Robin / Grégory Genta-Jouve
Published Online: 2018-09-08 | DOI: https://doi.org/10.1515/sagmb-2017-0077


Omic data are characterized by strong dependence structures that result either from data acquisition or from underlying biological processes. Statistical procedures that do not adjust the variable selection step to this dependence pattern may lose power and select spurious variables. The goal of this paper is to propose a variable selection procedure within the multivariate linear model framework that accounts for the dependence between the multiple responses. We focus on a specific type of dependence, in which the responses of a given individual can be modelled as a time series. We propose a novel Lasso-based approach within the multivariate linear model framework that takes the dependence structure into account by using the covariance structures of different types of stationary processes for the random error matrix. Our numerical experiments show that including the estimation of the covariance matrix of the random error matrix in the Lasso criterion dramatically improves the variable selection performance. Our approach is successfully applied to an untargeted LC-MS (Liquid Chromatography-Mass Spectrometry) data set of African copal samples. Our methodology is implemented in the R package MultiVarSel, which is available from the Comprehensive R Archive Network (CRAN).

Keywords: metabolomics; multivariate linear model; time series; variable selection
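
To make the role of the error covariance more concrete, here is a minimal sketch of one way to remove time-series dependence from the rows of a residual matrix before a Lasso step. It assumes the simplest stationary model, an AR(1) process with coefficient `phi`. That choice, the simulated data, and all function names are illustrative assumptions, not the paper's procedure or the MultiVarSel API. The Lasso step itself is omitted.

```python
import random


def simulate_ar1_rows(n_rows, n_cols, phi, seed=0):
    """Simulate n_rows independent stationary AR(1) series of length n_cols."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # First value drawn from the stationary law N(0, 1 / (1 - phi^2)).
        prev = rng.gauss(0.0, 1.0) / (1.0 - phi ** 2) ** 0.5
        row = [prev]
        for _ in range(n_cols - 1):
            prev = phi * prev + rng.gauss(0.0, 1.0)
            row.append(prev)
        rows.append(row)
    return rows


def lag1_autocorrelation(rows):
    """Pooled lag-1 autocorrelation, treating every row as mean zero."""
    num = sum(r[t] * r[t - 1] for r in rows for t in range(1, len(r)))
    den = sum(v * v for r in rows for v in r)
    return num / den


def whiten_ar1(rows, phi):
    """Apply the AR(1) whitening transform to each row.

    The first entry is rescaled by sqrt(1 - phi^2); every later entry
    has phi times its predecessor subtracted.
    """
    scale = (1.0 - phi ** 2) ** 0.5
    return [
        [scale * r[0]] + [r[t] - phi * r[t - 1] for t in range(1, len(r))]
        for r in rows
    ]


# Hypothetical residual matrix: 50 individuals, 200 ordered responses.
residuals = simulate_ar1_rows(n_rows=50, n_cols=200, phi=0.7)

phi_hat = lag1_autocorrelation(residuals)   # estimate of the AR(1) coefficient
whitened = whiten_ar1(residuals, phi_hat)   # rows with the dependence removed
rho_after = lag1_autocorrelation(whitened)  # should be close to zero

print("estimated phi:", round(phi_hat, 3))
print("lag-1 autocorrelation after whitening:", round(rho_after, 3))
```

In a full pipeline, the same transform would be applied to the response matrix, and the Lasso would then be fitted on the transformed model. Other stationary processes would need a different whitening transform.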

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 17, Issue 5, 20170077, ISSN (Online) 1544-6115, DOI: https://doi.org/10.1515/sagmb-2017-0077.

©2018 Walter de Gruyter GmbH, Berlin/Boston.