Model selection for factorial Gaussian graphical models with an application to dynamic regulatory networks

Veronica Vinciotti 1, Luigi Augugliaro 2, Antonino Abbruzzo 2 and Ernst C. Wit 3
  • 1 Department of Mathematics, Brunel University London, Uxbridge UB8 3PH, UK
  • 2 Statistics, University of Palermo, Viale delle Scienze, 90128 Palermo, Italy
  • 3 Johann Bernoulli Institute, University of Groningen, 9747 AG Groningen, The Netherlands

Abstract

Factorial Gaussian graphical models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from high-throughput genomic data. In the search for true regulatory relationships within the vast space of possible networks, these models allow restrictions to be imposed on the dynamic nature of these relationships, such as Markov dependencies of low order (some entries of the precision matrix are a priori zero) or equal dependency strengths across time lags (some entries of the precision matrix are assumed to be equal). The precision matrix is then estimated by l1-penalized maximum likelihood, which imposes a further constraint on the absolute values of its entries and results in sparse networks. Selecting the optimal sparsity level is a major challenge for this type of approach. In this paper, we evaluate the performance of a number of model selection criteria for fGGMs by means of two simulated regulatory networks from realistic biological processes. The analysis shows that fGGMs perform well in comparison with other methods for inferring dynamic networks, and that the KLCV criterion in particular performs well for model selection. Finally, we present an application to high-resolution time-course microarray data from the bacterium Neisseria meningitidis, a causative agent of life-threatening infections such as meningitis. The methodology described in this paper is implemented in the R package sglasso, freely available from CRAN at http://CRAN.R-project.org/package=sglasso.
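The two kinds of restriction mentioned in the abstract, a priori zeros and equality constraints on the precision matrix, can be made concrete with a small sketch. It assumes p genes observed at T time points, variables ordered time-major, a first-order Markov structure, and one shared parameter per gene pair and time lag. The function name `fggm_labels` and the labelling scheme are hypothetical and chosen for illustration only; they are not the sglasso interface, and the paper's models may use other factorizations.

```python
from itertools import product


def fggm_labels(p, T, max_lag=1):
    """Label each entry of a (p*T) x (p*T) precision matrix.

    Entries sharing a label are constrained to be equal; None marks an
    entry fixed to zero a priori (time lag larger than max_lag).
    Illustrative assumption: variable (t, i) sits at index t * p + i.
    """
    n = p * T
    labels = [[None] * n for _ in range(n)]
    nodes = list(product(range(T), range(p)))
    for (t, i), (s, j) in product(nodes, repeat=2):
        lag = abs(t - s)
        if lag > max_lag:
            continue  # structural zero: no dependency beyond max_lag
        if lag == 0:
            # within-time entries: symmetric in the gene pair
            key = (0, min(i, j), max(i, j))
        else:
            # cross-time entries: orient from the earlier to the later time
            src, dst = (i, j) if t < s else (j, i)
            key = (lag, src, dst)
        labels[t * p + i][s * p + j] = key
    return labels


def count_free_parameters(labels):
    """Number of distinct non-zero parameters implied by the labels."""
    return len({k for row in labels for k in row if k is not None})


labels = fggm_labels(p=3, T=4)
n_free = count_free_parameters(labels)
n_unrestricted = 12 * 13 // 2  # symmetric 12 x 12 matrix without constraints
print(n_free, n_unrestricted)
```

With 3 genes and 4 time points, the constrained matrix has 15 distinct parameters against 78 for an unrestricted symmetric 12 x 12 precision matrix. The l1 penalty then acts on this reduced parameter set to induce further sparsity.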


SAGMB publishes significant research on the application of statistical ideas to problems arising from computational biology. The range of topics includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies.
