Accessible Unlicensed Requires Authentication Published by De Gruyter February 14, 2015

Inferring bi-directional interactions between circadian clock genes and metabolism with model ensembles

Marco Grzegorczyk ORCID logo, Andrej Aderhold and Dirk Husmeier

Abstract

There has been much interest in reconstructing bi-directional regulatory networks linking the circadian clock to metabolism in plants. A variety of reverse engineering methods from machine learning and computational statistics have been proposed and evaluated. The emphasis of the present paper is on combining models in a model ensemble to boost the network reconstruction accuracy, and to explore various model combination strategies to maximize the improvement. Our results demonstrate that a rich ensemble of predictors outperforms the best individual model, even if the ensemble includes poor predictors with inferior individual reconstruction accuracy. For our application to metabolomic and transcriptomic time series from various mutagenesis plants grown in different light-dark cycles we also show how to determine the optimal time lag between interactions, and we identify significant interactions with a randomization test. Our study predicts new statistically significant interactions between circadian clock genes and metabolites in Arabidopsis thaliana, and thus provides independent statistical evidence that the regulation of metabolism by the circadian clock is not uni-directional, but that there is a statistically significant feedback mechanism aiming from metabolism back to the circadian clock.


Corresponding author: Marco Grzegorczyk, Johann Bernoulli Institute (JBI), Faculteit Wiskunde en Natuurwetenschappen (FWN), Groningen University, Nijenborgh 9, Postbus 407, 9747 AG Groningen 9700 AK, The Netherlands, e-mail:

Acknowledgments

The work described in the present article is part of the TiMet project on linking the circadian clock to metabolism in plants. TiMet is a collaborative project (Grant Agreement 245143) funded by the European Commission FP7, in response to call FP7-KBBE-2009-3. A.A. is supported by the TiMet project. Parts of the work were done while M.G. was supported by the German Research Foundation (DFG), research grant GR3853/1-1. We would like to thank Catherine Higham for helpful discussions.

Appendix

Marginal likelihood for time delays

In the hierarchical Bayesian regression (HBR) model, described in Subsection 2.5 of Aderhold et al. (2014), the target observations yg are modelled independently for each target g (g=1, …, G):

(13)yg|(Xπg,σg2,wg,πg)~N(XπgTwg,σg2I) (13)

where wg are the regression parameter vectors, determined by the sets of regulators (covariates) πg, Xπg is the regressor matrix implied by the regulator set πg, and σg2 is the node-specific noise variance parameter. Gaussian priors are imposed on the regression parameter vectors:

(14)wg|(σg2,δg,πg)~N(0,δgσg2I) (14)

where δg is the target-specific SNR-hyperparameter. The noise variance parameters σg2 and the SNR-hyperparameters δg are assumed to be inverse Gamma distributed with fixed (hyper-)hyperparameters (g=1, …, G). A more detailed model description can be found in Subsection 2.5 of Aderhold et al. (2014). Here, we introduce a new “time lag” parameter τ∈{0, …, τMAX}, which indicates the time lag between the target observations yg and the observations of the regulators in πg. The time lag parameter τ describes how to shift the observations of the regulators in eq. (4), i.e., τ describes how to build the regressor matrices Xπg from the data D. Although Xπg depends on τ, we do not make this explicit in our notation.

As the targets g=1, …, G are modelled independently and the overall network structure ℳ is determined by the individual regulator sets, symbolically ℳ=(π1, …, πG), the joint marginal likelihood has a modular form:

(15)p(D|τ)=g=1Gδgπg(wgσg2p(yg|Xπg,σg2,wg,πg,δg,τ)p(σg2)p(wg|σg2,δg,πg)dσg2dwg)p(πg)p(δg)dδg (15)

For a given regulator set πg and a fixed SNR-hyperparameter δg marginalizing the HBR likelihood over the regression parameters and the noise variances:

(16)p(yg|Xπg,πg,δg,τ)=σg2(wgp(yg|Xπg,σg2,wg,πg,δg,τ)p(wg|σg2,δg,πg)dwg)p(σg2)dσg2 (16)

yields a closed-form solution, see eq (15) in Aderhold et al. (2014).[6] It then follows from eqns. (15–16):

(17)p(D|τ)=g=1Gδg(πgp(yg|Xπg,πg,δg,τ)p(πg))p(δg)dδg (17)

The prior, p(πg), on the regulator sets, πg, in the HBR model is assumed to be a uniform distribution subject to a constraint on the maximal cardinality ℱ of the number of regulators [see Grzegorczyk and Husmeier (2012) and Aderhold et al. (2014)]. We thus obtain:

(18)p(D|τ)=g=1Gδg(1Tgπg:|πg|p(yg|Xπg,πg,δg,τ))p(δg)dδg (18)

where Tg is the number of all valid regulator sets. Hence, if there are N potential regulators for a target g, then the number of valid parent sets Tg is given by:

(19)Tg=f=0(fN)=O(N) (19)

so that Tg grows polynomially with the power of ℱ.[7] If the inner sums can be computed by full-enumeration, the marginal likelihood can be estimated by repeatedly sampling δg (g=1, …, G) from their inverse Gamma prior distributions and computing the following Monte-Carlo estimator:

(20)p(D|τ^)=g=1G(1Ii=1I(1Tgπg:|πg|p(yg|Xπg,πg,δg(i),τ))) (20)

where δg(i) (g=1, … G; i=1, …, I) is the i-th sample drawn for target g from an inverse Gamma prior.[8] Alternatively, and in particular if the inner sums in eq. (18) are not computationally feasible, one can resort to standard numerical procedures based on MCMC, like Chib’s method (Chib and Jeliazkov, 2001).

To get an idea of the approximation error of our marginal likelihood estimator, we consider J independent Monte-Carlo estimators ψ^τ,1,,ψ^τ,J of size I, each computed with eq. (20). For each lag τ we can then consider the distributional form of these estimators to get an impression of (i.e., to bound) the approximation error. For the HBR model we have provided the technical details above, and we note that the marginal likelihoods of the HBR-nl and the HBR-light model can also be approximated along these lines. As we found that the marginal likelihoods of these three models show the same trends and peak at the same lag (a time shift of τ=2 h, referring to one single data point shift), we show only the results for the HBR model in Figure 8.

In our application the data set D consists of a set of individual time series. When the network interactions are modelled subject to a time lag corresponding to τ data points, then the first τ target observations have to be withdrawn from each time series.[9] For a fair comparison among different time lags τ we first choose a maximal lag τMAX, and we withdraw the first τMAX observations from all time series. Subsequently, the remaining target observations yg (g=1, …, G) can be used for all lags τ∈{0, …, τMAX}, i.e., this approach ensures that the target observations do not differ from τ to τ and that exactly the same target observations have to be modelled for all lags τ.[10]

Detailed simulation results

For our performance evaluation on the simulated data described in Section 4.1, we were running hundreds of simulations for a variety of different settings, related to the observation status of the molecular components (mRNA only versus mRNAs and proteins), the method for derivative estimation (described in Section 3.1), the regulatory network structure (shown in Figure 1), and the method applied for learning this structure from data (reviewed in Sections 3.2 and 3.3). The results from these studies are shown in Figures 1214. These results are complex, and patterns are not easily discernible by visual inspection. This has motivated the application of the ANOVA scheme described in Section 3.5.

Figure 12 Detailed AUROC scores: coarse gradient.The boxplots show a fraction of the results obtained from the simulated data described in Section 4.1. The AUROC scores were obtained from the coarse response gradients with 2-h intervals. The corresponding results for the fine gradient and the Gaussian process interpolation are displayed in Figures 13, 14. Left panel: Incomplete data, with mRNA but no protein concentrations. Right panel: Complete data that include both protein and mRNA concentrations. Each panel contains six subpanels, representing the six different network topologies shown in Figure 1.

Figure 12

Detailed AUROC scores: coarse gradient.

The boxplots show a fraction of the results obtained from the simulated data described in Section 4.1. The AUROC scores were obtained from the coarse response gradients with 2-h intervals. The corresponding results for the fine gradient and the Gaussian process interpolation are displayed in Figures 13, 14. Left panel: Incomplete data, with mRNA but no protein concentrations. Right panel: Complete data that include both protein and mRNA concentrations. Each panel contains six subpanels, representing the six different network topologies shown in Figure 1.

Figure 13 Detailed AUROC scores: fine gradient.This figure corresponds to Figure 12, but shows the AUROC scores obtained with the fine gradient (24-min intervals) rather than the coarse gradient (2-h intervals). See Figure 12 for details.

Figure 13

Detailed AUROC scores: fine gradient.

This figure corresponds to Figure 12, but shows the AUROC scores obtained with the fine gradient (24-min intervals) rather than the coarse gradient (2-h intervals). See Figure 12 for details.

Figure 14 Detailed AUROC scores: fine gradient.This figure corresponds to Figure 12, but shows the AUROC scores obtained with the gradient from the Gaussian process interpolation rather than the finite difference method. See Figure 12 for details.

Figure 14

Detailed AUROC scores: fine gradient.

This figure corresponds to Figure 12, but shows the AUROC scores obtained with the gradient from the Gaussian process interpolation rather than the finite difference method. See Figure 12 for details.

References

Aderhold, A., D. Husmeier and M. Grzegorczyk (2014): “Statistical inference of regulatory networks for circadian regulation,” Stat. Appl. Genet. Mol. Biol., (SAGMB), 13, 227–273. Search in Google Scholar

Ahmed, A. and E. P. Xing (2009): “Recovering time-varying networks of dependencies in social and biological studies,” Proc. Natl. Acad. Sci., 106, 11878–11883. Search in Google Scholar

Äijö, T. and H. Lähdesmäki (2009): “Learning gene regulatory networks from gene expression measurements using non-parametric molecular kinetics,” Bioinformatics, 25, 2937–2944. Search in Google Scholar

Barenco, M., D. Tomescu, D. Brewer, R. Callard, J. Stark and M. Hubank (2006): “Ranked prediction of p53 targets using hidden variable dynamic modeling,” Genome Biol., 7, R25. Search in Google Scholar

Beal, M. (2003): Variational algorithms for approximate bayesian inference, Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, UK. Search in Google Scholar

Beal, M., F. Falciani, Z. Ghahramani, C. Rangel and D. Wild (2005): “A Bayesian approach to reconstructing genetic regulatory networks with hidden factors,” Bioinformatics, 21, 349–356. Search in Google Scholar

Bläsing, O. E., Y. Gibon, M. Günther, M. Höhne, R. Morcuende, D. Osuna, O. Thimm, B. Usadel, W.-R. Scheible and M. Stitt (2005): “Sugars and circadian regulation make major contributions to the global regulation of diurnal gene expression in arabidopsis,” The Plant Cell Online, 17, 3257–3281. Search in Google Scholar

Brandt, S. (1999): Data analysis: statistical and computational methods for scientists and engineers, Springer: New York, USA. Search in Google Scholar

Chevaleyre, Y., U. Endriss, J. Lang and N. Maudet (2007): A short introduction to computational social choice, Springer. Search in Google Scholar

Chib, S. and I. Jeliazkov (2001): “Marginal likelihood from the Metropolis-Hastings output,” J. Am. Stat. Assoc., 96, 270–281. Search in Google Scholar

Ciocchetta, F. and J. Hillston (2009): “Bio-PEPA: A framework for the modelling and analysis of biological systems,” Theor. Comput. Sci., 410, 3065–3084. Search in Google Scholar

Dalchau, N., S. J. Baek, H. M. Briggs, F. C. Robertson, A. N. Dodd, M. J. Gardner, M. A. Stancombe, M. J. Haydon, G.-B. Stan, J. M. Gonçalves, A. A. Webb (2011): “The circadian oscillator gene gigantea mediates a long-term response of the arabidopsis thaliana circadian clock to sucrose,” Proc. Natl. Acad. Sci., 108, 5104–5109. Search in Google Scholar

Davies, J. and M. Goadrich (2006): “The relationship between Precision-Recall and ROC curves,” Proc. 23rd Int. Conf. Mach. learn., 233–240. Search in Google Scholar

Feugier, F. and A. Satake (2012): “Dynamical feedback between circadian clock and sucrose availability explains adaptive response of starch metabolism to various photoperiods,” Frontiers in Plant Science, 3. Search in Google Scholar

Flis, A., P. Fernandez, T. Zielinski, R. Sulpice, A. Pokhilko, H. G. McWatters, A. J. Millar, M. Stitt and K. J. Halliday (2013): “Biological regulation identified by sharing timeseries data outside the ’omics,” Submitted. Search in Google Scholar

Friedman, N., M. Linial, I. Nachman and D. Pe’er (2000): “Using Bayesian networks to analyze expression data,” J. Comput. Biol., 7, 601–620. Search in Google Scholar

Fukushima, A., M. Kusano, N. Nakamichi, M. Kobayashi, N. Hayashi, H. Sakakibara, T. Mizuno and K. Saito (2009): “Impact of clock-associated arabidopsis pseudo-response regulators in metabolic coordination,” Proc. Natl. Acad. Sci., 106, 7251–7256. Search in Google Scholar

Geigenberger, P. (2011): “Regulation of starch biosynthesis in response to a fluctuating environment,” Plant Physiol., 155, 1566–1577. Search in Google Scholar

Geiger, D. and D. Heckerman (1994): “Learning gaussian networks,” In: International Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, pp. 235–243. Search in Google Scholar

Gille, S., K. Cheng, M. E. Skinner, A. H. Liepman, C. G. Wilkerson and M. Pauly (2011): “Deep sequencing of voodoo lily (Amorphophallus konjac): an approach to identify relevant genes involved in the synthesis of the hemicellulose glucomannan,” Planta, 234, 515–526. Search in Google Scholar

Graf, A., A. Schlereth, M. Stitt and A. M. Smith (2010): “Circadian control of carbohydrate availability for growth in Arabidopsis plants at night,” Proc. Natl. Acad. Sci., 107, 9458–9463. Search in Google Scholar

Grzegorczyk, M. and D. Husmeier (2012): “A non-homogeneous dynamic Bayesian network with sequentially coupled interaction parameters for applications in systems and synthetic biology,” Stat. Appl. Genet. Mol. Biol. (SAGMB), 11, article 7. Search in Google Scholar

Grzegorczyk, M. and D. Husmeier (2013): “Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models,” Mach. Learn., 1–50. Search in Google Scholar

Guerriero, M., A. Pokhilko, A. Fernández, K. Halliday, A. Millar and J. Hillston (2012): “Stochastic properties of the plant circadian clock,” J. R. Soc. Interf., 9, 744–756. Search in Google Scholar

Hanley, J. A. and B. J. McNeil (1982): “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, 143, 29–36. Search in Google Scholar

Hastie, T., R. Tibshirani and J. J. H. Friedman (2001): The elements of statistical learning, volume 1, Springer New York. Search in Google Scholar

Haydon, M. J., O. Mielczarek, F. C. Robertson, K. E. Hubbard and A. A. Webb (2013): “Photosynthetic entrainment of the arabidopsis thaliana circadian clock,” Nature, 502, 689–692. Search in Google Scholar

Herrero, E., E. Kolmos, N. Bujdoso, Y. Yuan, M. Wang, M. C. Berns, H. Uhlworm, G. Coupland, R. Saini, M. Jaskolski, A. Webb, J. Gonçalves and S. J. Davis (2012): “EARLY FLOWERING4 recruitment of EARLY FLOWERING3 in the nucleus sustains the Arabidopsis circadian clock,” Plant Cell Online, 24, 428–443. Search in Google Scholar

Kalaitzis, A. A., A. Honkela, P. Gao and N. D. Lawrence (2013): gptk: Gaussian processes tool-kit, URL http://CRAN.R-project.org/package=gptk, R package version 1.06. Search in Google Scholar

Kikis, E. A., R. Khanna and P. H. Quail (2005): “ELF4 is a phytochrome-regulated component of a negative-feedback loop involving the central oscillator components CCA1 and LHY,” The Plant Journal, 44, 300–313. Search in Google Scholar

Ko, Y., C. Zhai and S. Rodriguez-Zas (2009): “Inference of gene pathways using mixture Bayesian networks,” BMC Systems Biol., 3, 54. Search in Google Scholar

Kolmos, E., M. Nowak, M. Werner, K. Fischer, G. Schwarz, S. Mathews, H. Schoof, F. Nagy, J. M. Bujnicki and S. J. Davis (2009): “Integrating ELF4 into the circadian system through combined structural and functional studies,” HFSP J., 3, 350–366. Search in Google Scholar

Kuncheva, L. I. (2004): Combining pattern classifiers: methods and algorithms, John Wiley & Sons, Inc., Hoboken, NJ. Search in Google Scholar

Lawrence, N. D., M. Girolami, M. Rattray and G. Sanguinetti (2010): Learning and inference in computational systems biology, MIT Press, Cambridge, MA. Search in Google Scholar

Locke, J. C. W., L. Kozma-Bognár, P. D. Gould, B. Fehér, E. Kevei, F. Nagy, M. S. Turner, A. Hall and A. J. Millar (2006): “Experimental validation of a predicted feedback loop in the multi-oscillator clock of Arabidopsis thaliana,” Mol. Systems Biol., 2:59. Search in Google Scholar

Marbach, D., J. C. Costello, R. Küffner, N. M. Vega, R. J. Prill, D. M. Camacho, K. R. Allison, M. Kellis, J. J. Collins, G. Stolovitzky, et al. (2012): “Wisdom of crowds for robust gene network inference,” Nat. Methods, 9, 796–804. Search in Google Scholar

Margolin, A. A., I. Nemenman, K. Basso, C. Wiggins, G. Stolovitzky, R. Dalla-Favera and A. Califano (2006): “ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context,” BMC Bioinformatics, 7 (Suppl. 1). Search in Google Scholar

Morrissey, E. R., M. A. Juárez, K. J. Denby and N. J. Burroughs (2011): “Inferring the time-invariant topology of a nonlinear sparse gene regulatory network using fully Bayesian spline autoregression,” Biostatistics, 12, 682–694. Search in Google Scholar

Pokhilko, A., S. Hodge, K. Stratford, K. Knox, K. Edwards, A. Thomson, T. Mizuno and A. Millar (2010): “Data assimilation constrains new connections and components in a complex, eukaryotic circadian clock model,” Mol. Systems Biol., 6:416. Search in Google Scholar

Pokhilko, A., A. Fernández, K. Edwards, M. Southern, K. Halliday and A. Millar (2012): “The clock gene circuit in Arabidopsis includes a repressilator with additional feedback loops,” Mol. Systems Biol., 8, 574. Search in Google Scholar

Pokhilko, A., P. Mas and A. J. Millar (2013): “Modelling the widespread effects of TOC1 signalling on the plant circadian clock and its outputs,” BMC Systems Biol., 7, 1–12. Search in Google Scholar

Polikar, R. (2006): “Ensemble based systems in decision making,” Circuits and Systems Magazine, IEEE, 6, 21–45. Search in Google Scholar

Rasmussen, C. E. (1996): Evaluation of Gaussian processes and other methods for non-linear regression, Ph.D. thesis, Graduate Department of Computer Science, University of Toronto, Canada. Search in Google Scholar

Rasmussen, C. E., R. M. Neal, G. E. Hinton, D. van Camp, M. Revow, Z. Ghahramani, R. Kustra and R. Tibshirani (1996): “The DELVE manual,” URL http://www.cs.toronto.edu/delve. Search in Google Scholar

Rogers, S. and M. Girolami (2005): “A Bayesian regression approach to the inference of regulatory networks from gene expression data,” Bioinformatics, 21, 3131–3137. Search in Google Scholar

Schäfer, J. and K. Strimmer (2005): “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics,” Stat. Appl. Genom. Mol. Biol., 4(1), Article 32. Search in Google Scholar

Solak, E., R. Murray-Smith, W. E. Leithead, D. J. Leith and C. E. Rasmussen (2003): “Derivative observations in Gaussian process models of dynamic systems,” Advances in Neural Information Processing Systems, 15, (Becker, S., Thrun, S., and Obermayer, K. Eds.) MIT Press, 1033–1040. Search in Google Scholar

Tibshirani, R. (1995): “Regression shrinkage and selection via the Lasso,” J. R. Stat. Soc.. Series B (Methodological), 58, 267–288. Search in Google Scholar

Wang, R., K. Guegler, S. T. LaBrie and N. M. Crawford (2000): “Genomic analysis of a nutrient response in arabidopsis reveals diverse expression patterns and novel metabolic and potential regulatory genes induced by nitrate,” The Plant Cell Online, 12, 1491–1509. Search in Google Scholar

Published Online: 2015-2-14
Published in Print: 2015-4-1

©2015 by De Gruyter