
# Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

6 Issues per year

IMPACT FACTOR 2017: 0.812
5-year IMPACT FACTOR: 1.104

CiteScore 2017: 0.86

SCImago Journal Rank (SJR) 2017: 0.456
Source Normalized Impact per Paper (SNIP) 2017: 0.527

Mathematical Citation Quotient (MCQ) 2017: 0.04

Online ISSN: 1544-6115
Volume 13, Issue 3

# Statistical inference of regulatory networks for circadian regulation

• School of Mathematics and Statistics, University of Glasgow, 15 University Gardens, Glasgow G12 8QW, UK
Dirk Husmeier
• Corresponding author
• School of Biology, Sir Harold Mitchell Building, University of St Andrews, St Andrews, Fife KY16 9TH, UK
Marco Grzegorczyk
• Johann Bernoulli Institute (JBI), Groningen University, Nijenborgh 9, 9747 AG Groningen, The Netherlands
Published Online: 2014-05-26 | DOI: https://doi.org/10.1515/sagmb-2013-0051

## Abstract

We assess the accuracy of various state-of-the-art statistics and machine learning methods for reconstructing gene and protein regulatory networks in the context of circadian regulation. Our study draws on the increasing availability of gene expression and protein concentration time series for key circadian clock components in Arabidopsis thaliana. In addition, gene expression and protein concentration time series are simulated from a recently published regulatory network of the circadian clock in A. thaliana, in which protein and gene interactions are described by a Markov jump process based on Michaelis-Menten kinetics. We closely follow recent experimental protocols, including the entrainment of seedlings to different light-dark cycles and the knock-out of various key regulatory genes. Our study provides relative network reconstruction accuracy scores for a critical comparative performance evaluation, and sheds light on a series of highly relevant questions: it quantifies the influence of systematically missing values related to unknown protein concentrations and mRNA transcription rates, it investigates the dependence of the performance on the network topology and the degree of recurrency, it provides deeper insight into when and why non-linear methods fail to outperform linear ones, it offers improved guidelines on parameter settings in different inference procedures, and it suggests new hypotheses about the structure of the central circadian gene regulatory network in A. thaliana.

## References

• Ahmed, A. and E. P. Xing (2009): “Recovering time-varying networks of dependencies in social and biological studies,” Proc. Natl. Acad. Sci., 106, 11878–11883.

• Äijö, T. and H. Lähdesmäki (2009): “Learning gene regulatory networks from gene expression measurements using non-parametric molecular kinetics,” Bioinformatics, 25, 2937–2944.

• Andrieu, C. and A. Doucet (1999): “Joint Bayesian model selection and estimation of noisy sinusoids via reversible jump MCMC,” IEEE T Signal Proces., 47, 2667–2676.

• Barenco, M., D. Tomescu, D. Brewer, R. Callard, J. Stark, and M. Hubank (2006): “Ranked prediction of p53 targets using hidden variable dynamic modeling,” Genome Biology, 7, R25.

• Beal, M., F. Falciani, Z. Ghahramani, C. Rangel, and D. Wild (2005): “A Bayesian approach to reconstructing genetic regulatory networks with hidden factors,” Bioinformatics, 21, 349–356.

• Beal, M. (2003): Variational Algorithms for Approximate Bayesian Inference, Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, UK.

• Bengtsson, M., M. Hemberg, P. Rorsman, and A. Ståhlberg (2008): “Quantification of mRNA in single cells and modeling of RT-qPCR induced noise,” BMC Molecular Biology, 9, 63.

• Bishop, C. M. (2006): Pattern Recognition and Machine Learning, Singapore: Springer.

• Brandt, S. (1999): Data Analysis: Statistical and Computational Methods for Scientists and Engineers, New York, USA: Springer.

• Brooks, S. and A. Gelman (1999): “General methods for monitoring convergence of iterative simulations,” J. Comput. Graph. Stat., 7, 434–455.

• Butte, A. J. and I. S. Kohane (2000): “Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements,” in Pacific Symposium on Biocomputing, volume 5, 418–429.

• Ciocchetta, F. and J. Hillston (2009): “Bio-PEPA: A framework for the modeling and analysis of biological systems,” Theor. Comput. Sci., 410, 3065–3084.

• Davies, J. and M. Goadrich (2006): “The relationship between Precision-Recall and ROC curves,” Proceedings of the 23rd International Conference on Machine Learning, 233–240.

• Edwards, K., O. Akman, K. Knox, P. Lumsden, A. Thomson, P. Brown, A. Pokhilko, L. Kozma-Bognar, F. Nagy, D. Rand, and A. J. Millar (2010): “Quantitative analysis of regulatory flexibility under changing environmental conditions,” Mol. Syst. Biol., 6, 424.

• Feugier, F. and A. Satake (2012): “Dynamical feedback between circadian clock and sucrose availability explains adaptive response of starch metabolism to various photoperiods,” Front. Plant Sci., 3.

• Friedman, J., T. Hastie, and R. Tibshirani (2008): “Sparse inverse covariance estimation with the graphical Lasso,” Biostatistics, 9, 432–441.

• Friedman, J., T. Hastie, and R. Tibshirani (2010): “Regularization paths for generalized linear models via coordinate descent,” J. Stat. Softw., 33, 1–22.

• Friedman, N., M. Linial, I. Nachman, and D. Pe’er (2000): “Using Bayesian networks to analyze expression data,” J. Comput. Biol., 7, 601–620.

• Geiger, D. and D. Heckerman (1994): “Learning Gaussian networks,” in International Conference on Uncertainty in Artificial Intelligence, Seattle, WA: Morgan Kaufmann Publishers, 235–243.

• Gelman, A. and D. Rubin (1992): “Inference from iterative simulation using multiple sequences,” Stat. Sci., 7, 457–472.

• Gillespie, D. (1977): “Exact stochastic simulation of coupled chemical reactions,” J. Phys. Chem., 81, 2340–2361.

• Grzegorczyk, M. and D. Husmeier (2012): “A non-homogeneous dynamic Bayesian network with sequentially coupled interaction parameters for applications in systems and synthetic biology,” Stat. Appl. Genet. Mol. Biol. (SAGMB), 11, article 7.

• Grzegorczyk, M. and D. Husmeier (2013): “Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models,” Mach. Learn., 91, 1–50.

• Guerriero, M., A. Pokhilko, A. Fernández, K. Halliday, A. Millar, and J. Hillston (2012): “Stochastic properties of the plant circadian clock,” J. R. Soc. Interface, 9, 744–756.

• Hanley, J. A. and B. J. McNeil (1982): “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, 143, 29–36.

• Hastie, T., R. Tibshirani, and J. Friedman (2001): The Elements of Statistical Learning, volume 1, New York: Springer.

• Herrero, E., E. Kolmos, N. Bujdoso, Y. Yuan, M. Wang, M. C. Berns, H. Uhlworm, G. Coupland, R. Saini, M. Jaskolski, A. Webb, J. Gonçalves, and S. J. Davis (2012): “EARLY FLOWERING4 recruitment of EARLY FLOWERING3 in the nucleus sustains the Arabidopsis circadian clock,” Plant Cell, 24, 428–443.

• Husmeier, D. (1999): Neural Networks for Conditional Probability Estimation: Forecasting Beyond Point Predictions, Perspectives in Neural Computing, London: Springer.

• Husmeier, D. (2003): “Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks,” Bioinformatics, 19, 2271–2282.

• Kalaitzis, A. A., A. Honkela, P. Gao, and N. D. Lawrence (2013): gptk: Gaussian processes tool-kit, URL http://CRAN.R-project.org/package=gptk, R package version 1.06.

• Ko, Y., C. Zhai, and S. Rodriguez-Zas (2007): “Inference of gene pathways using Gaussian mixture models,” in International Conference on Bioinformatics and Biomedicine, Fremont, CA, 362–367.

• Ko, Y., C. Zhai, and S. Rodriguez-Zas (2009): “Inference of gene pathways using mixture Bayesian networks,” BMC Syst. Biol., 3, 54.

• Kolmos, E., M. Nowak, M. Werner, K. Fischer, G. Schwarz, S. Mathews, H. Schoof, F. Nagy, J. M. Bujnicki, and S. J. Davis (2009): “Integrating ELF4 into the circadian system through combined structural and functional studies,” HFSP J, 3, 350–366.

• Lawrence, N. D., M. Girolami, M. Rattray, and G. Sanguinetti (2010): Learning and inference in computational systems biology, Cambridge, MA: MIT Press.

• Lèbre, S., J. Becq, F. Devaux, G. Lelandais, and M. Stumpf (2010): “Statistical inference of the time-varying structure of gene-regulation networks,” BMC Syst. Biol., 4.

• Locke, J. C. W., M. M. Southern, L. Kozma-Bognár, V. Hibberd, P. E. Brown, M. S. Turner, and A. J. Millar (2005): “Extension of a genetic network model by iterative experimentation and mathematical analysis,” Mol. Syst. Biol., 1.

• Locke, J. C. W., L. Kozma-Bognár, P. D. Gould, B. Fehér, E. Kevei, F. Nagy, M. S. Turner, A. Hall, and A. J. Millar (2006): “Experimental validation of a predicted feedback loop in the multi-oscillator clock of Arabidopsis thaliana,” Mol. Syst. Biol., 2.

• MacKay, D. J. (1992): “Bayesian interpolation,” Neural Comput., 4, 415–447.

• Margolin, A. A., I. Nemenman, K. Basso, C. Wiggins, G. Stolovitzky, R. Dalla-Favera, and A. Califano (2006): “ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context,” BMC Bioinformatics, 7.

• Marin, J.-M. and C. P. Robert (2007): Bayesian core: A practical approach to computational Bayesian statistics, New York, USA: Springer.

• Meyer, P. E., F. Lafitte, and G. Bontempi (2008): “minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information,” BMC Bioinformatics, 9.

• Morrissey, E. R., M. A. Juárez, K. J. Denby, and N. J. Burroughs (2011): “Inferring the time-invariant topology of a nonlinear sparse gene regulatory network using fully Bayesian spline autoregression,” Biostatistics, 12, 682–694.

• Murphy, K. P. (2012): Machine learning: a probabilistic perspective, Cambridge, MA: MIT Press.

• Nabney, I. (2002): NETLAB: algorithms for pattern recognition, Springer.

• Neuneier, R., F. Hergert, W. Finnoff, and D. Ormoneit (1994): “Estimation of conditional densities: a comparison of neural network approaches,” in International Conference on Artificial Neural Networks, National Cheng Kung University, Taiwan: Springer, 689–692.

• Opgen-Rhein, R. and K. Strimmer (2007): “From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data,” BMC Syst. Biol., 1.

• Pokhilko, A., A. Fernández, K. Edwards, M. Southern, K. Halliday, and A. Millar (2012): “The clock gene circuit in Arabidopsis includes a repressilator with additional feedback loops,” Mol. Syst. Biol., 8, 574.

• Pokhilko, A., S. Hodge, K. Stratford, K. Knox, K. Edwards, A. Thomson, T. Mizuno, and A. Millar (2010): “Data assimilation constrains new connections and components in a complex, eukaryotic circadian clock model,” Mol. Syst. Biol., 6.

• Pokhilko, A., P. Mas, A. J. Millar, et al. (2013): “Modeling the widespread effects of TOC1 signaling on the plant circadian clock and its outputs,” BMC Syst. Biol., 7, 1–12.

• Rasmussen, C. E., R. M. Neal, G. E. Hinton, D. van Camp, M. Revow, Z. Ghahramani, R. Kustra, and R. Tibshirani (1996): “The DELVE manual,” URL http://www.cs.toronto.edu/delve.

• Rasmussen, C. E. (1996): Evaluation of Gaussian processes and other methods for non-linear regression, Ph.D. thesis, Citeseer.

• Rasmussen, C. and C. Williams (2006): Gaussian processes for machine learning, volume 1, Cambridge, MA: MIT Press.

• Rogers, S. and M. Girolami (2005): “A Bayesian regression approach to the inference of regulatory networks from gene expression data,” Bioinformatics, 21, 3131–3137.

• Schäfer, J. and K. Strimmer (2005): “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics,” Stat. Appl. Genet. Mol. Biol., 4.

• Smith, M. and R. Kohn (1996): “Nonparametric regression using Bayesian variable selection,” J Econometrics, 75, 317–343.

• Solak, E., R. Murray-Smith, W. E. Leithead, D. J. Leith, and C. E. Rasmussen (2002): “Derivative observations in Gaussian process models of dynamic systems,” Advances in Neural Information Processing Systems, MIT Press: Vancouver, Canada, 1033–1040.

• Tibshirani, R. (1995): “Regression shrinkage and selection via the Lasso,” J. R. Stat. Soc. Series B, 58, 267–288.

• TiMet (2014): “The TiMet Project - Linking the clock to metabolism,” URL http://timing-metabolism.eu.

• Tipping, M. and A. Faul (2003): “Fast marginal likelihood maximisation for sparse Bayesian models,” in Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, 1, 3–6.

• Tipping, M. (2001): “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, 1, 211–244.

• Vyshemirsky, V. and M. Girolami (2008): “Bayesian ranking of biochemical system models,” Bioinformatics, 24, 833–839.

• Weirauch, M. T., A. Cote, R. Norel, M. Annala, Y. Zhao, T. R. Riley, J. Saez-Rodriguez, T. Cokelaer, A. Vedenko, S. Talukder, DREAM5 Consortium, H. J. Bussemaker, Q. D. Morris, M. L. Bulyk, G. Stolovitzky, and T. R. Hughes (2013): “Evaluation of methods for modeling transcription factor sequence specificity,” Nat. Biotechnol., 31, 126–134.

• Werhli, A. V., M. Grzegorczyk, and D. Husmeier (2006): “Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks,” Bioinformatics, 22, 2523–2531.

• Wilkinson, D. J. (2009): “Stochastic modeling for quantitative description of heterogeneous biological systems,” Nat. Rev. Genet., 10, 122–133.

• Wilkinson, D. (2011): Stochastic modeling for systems biology, volume 44, Taylor & Francis, Boca Raton, FL: CRC Press.

• Zoppoli, P., S. Morganella, and M. Ceccarelli (2010): “TimeDelay-ARACNE: Reverse engineering of gene networks from time-course data by an information theoretic approach,” BMC Bioinformatics, 11.

• Zou, H. and T. Hastie (2005): “Regularization and variable selection via the Elastic Net,” J. R. Stat. Soc. Series B, 67, 301–320.

Corresponding author: Dirk Husmeier, School of Mathematics and Statistics, University of Glasgow, 15 University Gardens, Glasgow G12 8QW, UK


Published in Print: 2014-06-01

Note that the sets of potential regulators are defined for each gene $g$ specifically. That is, the potential regulators for two target variables $y_g$ and $y_{g'}$ can be different, e.g., if certain (biologically motivated) restrictions are imposed.

For consistency with the fundamental equation of transcription, equation (1), we enforce that each regulator set $\pi_g$ for $y_g$ contains the concentration $x_g$ of $g$; symbolically, $x_g \in \pi_g$.

Note that the vector $x_{\cdot,t}$ includes every available regulator, independently of the target gene $g$.

Note that the repeated bi-partitioning of the genes into targets and putative regulators renders Glasso equivalent to Lasso, as discussed on page 4 of Friedman et al. (2008). Lasso will be discussed in Section 2.3.

We set $\nu = 0.005$, $A_\delta = 2$, and $B_\delta = 0.2$, as in Grzegorczyk and Husmeier (2012).

We note that the coupled variant of the non-homogeneous Bayesian regression model cannot be represented properly as a graphical model, as the regression parameter vectors are sequentially coupled among adjacent segments via equations (21–22).

For each yg we apply exactly the same permutation to order the realizations of the explanatory variables (covariates) and thereby ensure that the segment-specific design matrices are built properly.

In our study we follow Rogers and Girolami (2005) and use a slightly modified version of the fast marginal likelihood algorithm from Tipping and Faul (2003) for optimization.

We use the authors’ terminology, although the model is not a proper Bayesian network.

More precisely, $\mu_{g,h}^{*}$ is obtained by deleting the element corresponding to the target variable $y_{g,t}$ in $\mu_{g,h}$, and $\Sigma_{g,h}^{*}$ is obtained by deleting the row and the column corresponding to $y_{g,t}$ in $\Sigma_{g,h}$.
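The deletion step described here can be illustrated with a short sketch. The function name `drop_target` and the nested-list representation of the mean vector and covariance matrix are assumptions made for illustration only; they are not taken from the article.

```python
def drop_target(mu, sigma, target_index):
    """Remove the target variable from a mean vector and covariance matrix.

    mu           : list of floats (mean vector)
    sigma        : list of lists of floats (square covariance matrix)
    target_index : position of the target variable in mu and sigma
    """
    # Delete the element of the mean vector that belongs to the target.
    mu_star = [m for i, m in enumerate(mu) if i != target_index]
    # Delete the row and the column of the covariance matrix that belong to the target.
    sigma_star = [
        [v for j, v in enumerate(row) if j != target_index]
        for i, row in enumerate(sigma)
        if i != target_index
    ]
    return mu_star, sigma_star


# Hypothetical example with the target variable stored at index 0.
mu = [1.0, 2.0, 3.0]
sigma = [
    [1.0, 0.1, 0.2],
    [0.1, 2.0, 0.3],
    [0.2, 0.3, 3.0],
]
mu_star, sigma_star = drop_target(mu, sigma, target_index=0)
```

The reduced quantities describe the distribution of the remaining variables (the potential regulators) only.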

Note that the abbreviation “BGe” was introduced by Geiger and Heckerman (1994) and stands for Bayesian metric for Gaussian networks having score equivalence; see Geiger and Heckerman (1994) for more details.

We turned off the translation of those proteins that contribute to the interactions we would like to suppress.

In the model equations defined by Guerriero et al. (2012) the concentration of P only appears in a product with the binary light indicator L, where the light variable L is equal to zero in the absence of light.

For the Bayesian methods this can be enforced by setting the prior $P(\pi_g)$ to zero for all $\pi_g$ with $x_g \notin \pi_g$.
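As a minimal sketch of such a constraint, the snippet below builds a prior over regulator sets that is zero whenever the gene's own concentration is missing. The uniform distribution over the admissible sets, the cap on the set size, and all names (`admissible_regulator_sets`, `make_structure_prior`, the labels `x1`, `x2`, `x3`) are assumptions for illustration; the article's actual prior may differ.

```python
from itertools import combinations


def admissible_regulator_sets(candidates, own, max_size):
    """Enumerate all regulator sets with at most max_size members that contain `own`."""
    others = [c for c in candidates if c != own]
    sets = []
    for k in range(max_size):  # k regulators in addition to `own`
        for combo in combinations(others, k):
            sets.append(frozenset((own,) + combo))
    return sets


def make_structure_prior(candidates, own, max_size):
    """Return a prior that is uniform on admissible sets and zero elsewhere (assumed form)."""
    admissible = set(admissible_regulator_sets(candidates, own, max_size))

    def prior(regulator_set):
        # Sets that lack the gene's own concentration receive prior probability zero.
        return 1.0 / len(admissible) if frozenset(regulator_set) in admissible else 0.0

    return prior, admissible


# Hypothetical example: three candidate regulators, x1 is the target gene's own concentration.
prior, admissible = make_structure_prior(["x1", "x2", "x3"], own="x1", max_size=2)
```

Because inadmissible sets have zero prior mass, they also have zero posterior mass, so a sampler or an exhaustive search never needs to visit them.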

Matlab software for Disciplined Convex Programming: http://cvxr.com/cvx/.

Note that the maximal number of hidden nodes $n$ is restricted by the number of regulators, $G_g$. In our simulation study we analyzed various data sets, and we employed the lowest $G_g$ as an upper bound on the number of hidden nodes $n$.

In our study we initialized the EM algorithm with allocations obtained by the k-means cluster algorithm. The initial $\mathbb{K}_g$ centers of the k-means algorithm were sampled from a multivariate Gaussian distribution $N(\mu, I)$, where $I$ is the identity matrix and $\mu$ is a random expectation vector with entries sampled independently from continuous uniform distributions on the interval $[-1, +1]$. To avoid initializing the EM algorithm with allocations that contain unoccupied (empty) mixture components, we re-sampled the initial centers and re-ran the k-means algorithm whenever the k-means output contained empty components.
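The initialization procedure can be sketched as follows. This is a plain standard-library illustration, not the authors' implementation: the function names, the iteration cap, the `max_tries` safeguard, and the choice to redraw $\mu$ on every retry are all assumptions.

```python
import random


def kmeans(data, centers, n_iter=50):
    """Basic Lloyd iterations; returns the final labels and centers."""
    labels = []
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(point, centers[k])))
            for point in data
        ]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # empty component keeps its old center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def init_allocations(data, n_components, rng, max_tries=1000):
    """Draw centers from N(mu, I), run k-means, and retry while any component is empty."""
    dim = len(data[0])
    for _ in range(max_tries):
        # Random expectation vector with entries uniform on [-1, +1].
        # Assumption: mu is redrawn on every retry.
        mu = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        # Identity covariance means independent unit-variance coordinates.
        centers = [[rng.gauss(m, 1.0) for m in mu] for _ in range(n_components)]
        labels, _ = kmeans(data, centers)
        if len(set(labels)) == n_components:  # every component is occupied
            return labels
    raise RuntimeError("no allocation without empty components was found")


# Hypothetical toy data: two groups of five points each.
rng = random.Random(0)
data = [[-1.0 + 0.1 * i, -1.0] for i in range(5)] + [[1.0 + 0.1 * i, 1.0] for i in range(5)]
labels = init_allocations(data, n_components=2, rng=rng)
```

Sampling the centers close to the origin presupposes that the data are on a comparable scale, for example after standardization.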

Loosely speaking, this setting ($\mu_0 = 0$ and $T_0 = I$) reflects our “prior belief” that all domain variables, i.e., the potential regulators and the target variable, are i.i.d. standard normally distributed.

The sensitivity is the proportion of true interactions that have been detected; the specificity is the proportion of non-interactions that have been avoided.
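These two definitions translate directly into code. The sketch below assumes that a network is represented as a set of directed edges and that the set of all possible edges is supplied explicitly; the function name and the three-node example are hypothetical.

```python
def sensitivity_specificity(true_edges, predicted_edges, possible_edges):
    """Compute sensitivity and specificity of a predicted edge set."""
    true_edges = set(true_edges)
    predicted_edges = set(predicted_edges)
    non_edges = set(possible_edges) - true_edges

    true_positives = len(true_edges & predicted_edges)   # true interactions detected
    false_negatives = len(true_edges - predicted_edges)  # true interactions missed
    true_negatives = len(non_edges - predicted_edges)    # non-interactions avoided
    false_positives = len(non_edges & predicted_edges)   # non-interactions predicted

    sensitivity = true_positives / (true_positives + false_negatives)
    specificity = true_negatives / (true_negatives + false_positives)
    return sensitivity, specificity


# Hypothetical three-node example without self-loops (six possible directed edges).
nodes = ["A", "B", "C"]
possible = [(a, b) for a in nodes for b in nodes if a != b]
truth = {("A", "B"), ("B", "C")}
predicted = {("A", "B"), ("C", "A")}
sens, spec = sensitivity_specificity(truth, predicted, possible)
```

In this example one of the two true edges is recovered (sensitivity 0.5), and three of the four non-edges are correctly left out (specificity 0.75).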

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 13, Issue 3, Pages 227–273, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302.


©2014 by Walter de Gruyter Berlin/Boston.

## Citing Articles

[1] Mircea Dumitru, Ali Mohammad-Djafari, and Simona Baghai Sain. EURASIP Journal on Bioinformatics and Systems Biology, 2016, Volume 2016, Number 1.

[2] Andrej Aderhold, Dirk Husmeier, and Marco Grzegorczyk. Statistics and Computing, 2017, Volume 27, Number 4, Page 1003.

[3] Marco Grzegorczyk, Andrej Aderhold, and Dirk Husmeier. Computational Statistics, 2017, Volume 32, Number 2, Page 717.

[4] Vinny Davies, Richard Reeve, William T. Harvey, Francois F. Maree, and Dirk Husmeier. Computational Statistics, 2017, Volume 32, Number 3, Page 803.

[5] Laurent Mombaerts, Alexandre Mauroy, and Jorge Gonçalves. IFAC-PapersOnLine, 2016, Volume 49, Number 26, Page 109.