Journal of Causal Inference

Ed. by Imai, Kosuke / Pearl, Judea / Petersen, Maya Liv / Sekhon, Jasjeet / van der Laan, Mark J.

Invariant Causal Prediction for Nonlinear Models

Christina Heinze-Deml / Jonas Peters / Nicolai Meinshausen
Published Online: 2018-09-18 | DOI: https://doi.org/10.1515/jci-2017-0016


An important problem in many domains is to predict how a system will respond to interventions. This task is inherently linked to estimating the system’s underlying causal structure. To this end, Invariant Causal Prediction (ICP) [1] has been proposed, which learns a causal model from data collected in different environments by exploiting the invariance of causal relations across those environments. For linear models, implementing ICP is relatively straightforward. The nonlinear case is more challenging because nonparametric tests for conditional independence are difficult to perform.

In this work, we present and evaluate an array of methods for nonlinear and nonparametric versions of ICP for learning the causal parents of given target variables. One approach proves quite robust across a large variety of simulation settings: it first fits a nonlinear model to data pooled over all environments and then tests for differences between the residual distributions across environments. We call this procedure the “invariant residual distribution test”. In general, the performance of all approaches depends critically on the true (unknown) causal structure, and achieving high power becomes challenging if the parental set includes more than two variables.
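The pooled-fit-then-compare-residuals idea can be illustrated with a minimal sketch. This is not the authors' implementation: the leave-one-out k-nearest-neighbour regression, the permutation test on a two-sample Kolmogorov-Smirnov statistic, the restriction to two environments, and all function names and the toy data-generating process are assumptions chosen so that the example runs with the Python standard library only.

```python
import math
import random


def loo_knn_residuals(x, y, k=10):
    """Residuals of a leave-one-out k-NN regression fitted on the pooled data.

    Assumption: k-NN stands in for whatever nonlinear regressor is preferred.
    """
    n = len(x)
    residuals = []
    for i in range(n):
        # Indices of the k points closest to x[i], excluding the point itself.
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(x[i], x[j]))[:k]
        residuals.append(y[i] - sum(y[j] for j in nearest) / len(nearest))
    return residuals


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (largest gap between empirical CDFs)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def invariance_p_value(x, y, env, k=10, n_perm=200, seed=0):
    """Permutation p-value for the null hypothesis that the residuals of the
    pooled fit have the same distribution in both environments.

    Assumption: exactly two environments; the permutation test is a heuristic,
    since estimated residuals are only approximately exchangeable.
    """
    residuals = loo_knn_residuals(x, y, k)
    first = env[0]

    def statistic(labels):
        in_first = [r for r, e in zip(residuals, labels) if e == first]
        in_other = [r for r, e in zip(residuals, labels) if e != first]
        return ks_statistic(in_first, in_other)

    observed = statistic(env)
    rng = random.Random(seed)
    shuffled = list(env)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def simulate(n_per_env=150, seed=1):
    """Hypothetical toy data with structure X1 -> Y -> X2; X1 is shifted in environment 1."""
    rng = random.Random(seed)
    x1, x2, y, env = [], [], [], []
    for e in (0, 1):
        for _ in range(n_per_env):
            parent = rng.gauss(2.0 * e, 1.0)  # intervention on X1 when e == 1
            target = math.tanh(parent) + 0.2 * rng.gauss(0.0, 1.0)
            child = target + rng.gauss(0.0, 1.0)
            x1.append((parent,))
            x2.append((child,))
            y.append(target)
            env.append(e)
    return x1, x2, y, env


if __name__ == "__main__":
    x1, x2, y, env = simulate()
    print("candidate set {X1}: p =", invariance_p_value(x1, y, env))
    print("candidate set {X2}: p =", invariance_p_value(x2, y, env))
```

In this toy setting, conditioning on the child X2 leaves a residual distribution that shifts with the environment, so that candidate set should be rejected. For the true parent X1 the residuals should look similar across environments, although a fit as crude as k-NN does not guarantee a large p-value in every run.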

As a real-world example, we consider fertility rate modeling which is central to world population projections. We explore predicting the effect of hypothetical interventions using the accepted models from nonlinear ICP. The results reaffirm the previously observed central causal role of child mortality rates.

Keywords: causal structure learning; structural equation models; invariance


References

1. Peters J, Bühlmann P, Meinshausen N. Causal inference using invariant prediction: identification and confidence intervals. J R Stat Soc, Ser B (with discussion). 2016;78(5):947–1012.
2. Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. New York, USA: Cambridge University Press; 2009.
3. Spirtes P, Glymour C, Scheines R. Causation, Prediction, and Search. 2nd ed. Cambridge, USA: MIT Press; 2000.
4. Peters J, Janzing D, Schölkopf B. Elements of Causal Inference: Foundations and Learning Algorithms. Cambridge, MA, USA: MIT Press; 2017.
5. Chickering DM. Optimal structure identification with greedy search. J Mach Learn Res. 2002;3:507–54.
6. Peters J, Mooij JM, Janzing D, Schölkopf B. Causal discovery with continuous additive noise models. J Mach Learn Res. 2014;15:2009–53.
7. Heckerman D. A Bayesian approach to causal discovery. Technical report, Microsoft Research (MSR-TR-97-05). 1997.
8. Hauser A, Bühlmann P. Jointly interventional and observational data: estimation of interventional Markov equivalence classes of directed acyclic graphs. J R Stat Soc B. 2015;77:291–318.
9. Pearl J. A constraint propagation approach to probabilistic reasoning. In: Proceedings of the 4th Annual Conference on Uncertainty in Artificial Intelligence (UAI). 1985. p. 31–42.
10. Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA: Morgan Kaufmann Publishers Inc.; 1988.
11. Heinze-Deml C, Maathuis MH, Meinshausen N. Causal structure learning. Annu Rev Stat App. 2018;5(1):371–91.
12. Maathuis M, Kalisch M, Bühlmann P. Estimating high-dimensional intervention effects from observational data. Ann Stat. 2009;37:3133–64.
13. Colombo D, Maathuis M, Kalisch M, Richardson T. Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann Stat. 2012;40:294–321.
14. Claassen T, Mooij JM, Heskes T. Learning sparse causal models is not NP-hard. In: Proceedings of the 29th Annual Conference on Uncertainty in Artificial Intelligence (UAI). 2013.
15. Shimizu S, Hoyer PO, Hyvärinen A, Kerminen AJ. A linear non-Gaussian acyclic model for causal discovery. J Mach Learn Res. 2006;7:2003–30.
16. Peters J, Bühlmann P. Identifiability of Gaussian structural equation models with equal error variances. Biometrika. 2014;101:219–28.
17. Hoyer PO, Janzing D, Mooij JM, Peters J, Schölkopf B. Nonlinear causal discovery with additive noise models. Adv Neural Inf Process Syst. 2009;21:689–96.
18. Haavelmo T. The probability approach in econometrics. Econometrica. 1944;12:S1–115 (supplement).
19. Aldrich J. Autonomy. Oxford Economic Papers. 1989;41:15–34.
20. Hoover KD. The logic of causal inference. Econ Philos. 1990;6:207–34.
21. Schölkopf B, Janzing D, Peters J, Sgouritsa E, Zhang K, Mooij J. On causal and anticausal learning. In: Proceedings of the 29th International Conference on Machine Learning (ICML). 2012. p. 1255–62.
22. Bergsma W, Dassios A. A consistent test of independence based on a sign covariance related to Kendall’s tau. Bernoulli. 2014;20:1006–28.
23. Hoeffding W. A non-parametric test of independence. Ann Math Stat. 1948;19:546–57.
24. Blum JR, Kiefer J, Rosenblatt M. Distribution free tests of independence based on the sample distribution function. Ann Math Stat. 1961;32:485–98.
25. Rényi A. On measures of dependence. Acta Math Acad Sci Hung. 1959;10:441–51. ISSN 1588-2632.
26. Székely GJ, Rizzo ML, Bakirov NK. Measuring and testing dependence by correlation of distances. Ann Stat. 2007;35:2769–94.
27. Zhang K, Peters J, Janzing D, Schölkopf B. Kernel-based conditional independence test and application in causal discovery. In: Proceedings of the 27th Annual Conference on Uncertainty in Artificial Intelligence (UAI). 2011. p. 804–13.
28. Goeman JJ, Solari A. Multiple testing for exploratory research. Statistical Science. 2011. 584–597.
29. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2017. https://www.R-project.org.
30. Hirschman C. Why fertility changes. Annu Rev Sociol. 1994;20(1):203–33.
31. Huinink J, Kohli M, Ehrhardt J. Explaining fertility: The potential for integrative approaches. Demogr Res Monogr. 2015;33:93.
32. Raftery A, Lewis S, Aghajanian A. Demand or ideation? Evidence from the Iranian marital fertility decline. Demography. 1995;32(2):159–82.
33. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91:444–55.
34. Imbens GW. Instrumental variables: An econometrician’s perspective. Stat Sci. 2014;29(3):323–58.
35. United Nations. World population prospects: The 2012 revision. Population Division, Department of Economic and Social Affairs. New York: United Nations; 2013. https://esa.un.org/unpd/wpp/Download/Standard/ASCII/.
36. Lauritzen SL. Graphical Models. New York, USA: Oxford University Press; 1996.
37. Ernest J, Rothenhäusler D, Bühlmann P. Causal inference in partially linear structural equation models. Ann Stat. 2018;46:2904–38.
38. Shah RD, Peters J. The hardness of conditional independence testing and the generalised covariance measure. ArXiv e-prints. 2018. 1804.07203.
39. Künsch HR. The jackknife and the bootstrap for general stationary observations. Annals of Statistics. 1989. 1217–1241.
40. Breiman L. Random forests. Mach Learn. 2001;45:5–32.
41. Fukumizu K, Gretton A, Sun X, Schölkopf B. Kernel measures of conditional dependence. In: Advances in Neural Information Processing Systems 20. 2008. p. 489–96.
42. Shah RD, Bühlmann P. Goodness-of-fit tests for high dimensional linear models. J R Stat Soc, Ser B, Stat Methodol. 2018;80(1):113–35. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12234.
43. Williams CKI, Seeger M. Using the Nyström method to speed up kernel machines. In: Advances in Neural Information Processing Systems 13. MIT Press; 2001. p. 682–8.
44. Rahimi A, Recht B. Random features for large-scale kernel machines. In: Advances in Neural Information Processing Systems 20. Curran Associates, Inc.; 2008. p. 1177–84.
45. Zeileis A, Hothorn T, Hornik K. Model-based recursive partitioning. J Comput Graph Stat. 2008;17(2):492–514.
46. Hothorn T, Zeileis A. partykit: A modular toolkit for recursive partytioning in R. J Mach Learn Res. 2015;16:3905–9.
47. Bühlmann P, Peters J, Ernest J. CAM: Causal additive models, high-dimensional order search and penalized regression. Ann Stat. 2014;42:2526–56.
48. Gretton A, Bousquet O, Smola A, Schölkopf B. Measuring statistical dependence with Hilbert-Schmidt norms. In: Algorithmic Learning Theory. Springer; 2005. p. 63–78.
49. Gretton A, Fukumizu K, Teo CH, Song L, Schölkopf B, Smola AJ. A kernel statistical test of independence. Proc Neural Inf Process Syst. 2007;20:1–8.
50. Pfister N, Bühlmann P, Schölkopf B, Peters J. Kernel-based tests for joint independence. J R Stat Soc, Ser B. 2017;80:5–31.
51. Wilson EB. Probable Inference, the Law of Succession, and Statistical Inference. J Am Stat Assoc. 1927;22:209–12.
52. Conover WJ. Practical nonparametric statistics. New York: John Wiley & Sons; 1971.
53. Levene H. Robust tests for equality of variances. In: Olkin I, editor. Contributions to Probability and Statistics. Palo Alto, CA: Stanford University Press; 1960. p. 278–92.
54. Gastwirth JL, Gel YR, Wallace Hui WL, Lyubchich V, Miao W, Noguchi K. lawstat: Tools for Biostatistics, Public Policy, and Law. 2015. https://CRAN.R-project.org/package=lawstat. R package version 3.0.
55. Meinshausen N. Quantile regression forests. J Mach Learn Res. 2006;7:983–99.

About the article

Received: 2017-01-07

Revised: 2018-05-09

Accepted: 2018-08-24

Published Online: 2018-09-18

Published in Print: 2018-09-25

Citation Information: Journal of Causal Inference, Volume 6, Issue 2, 20170016, ISSN (Online) 2193-3685, DOI: https://doi.org/10.1515/jci-2017-0016.

© 2018 Walter de Gruyter GmbH, Berlin/Boston.

Citing Articles

Peter Bühlmann
TEST, 2019, Volume 28, Number 2, Page 330