
Journal of Causal Inference

Ed. by Imai, Kosuke / Pearl, Judea / Petersen, Maya Liv / Sekhon, Jasjeet / van der Laan, Mark J.


Markov Boundary Discovery with Ridge Regularized Linear Models

Eric V. Strobl
  • Corresponding author
  • Center for Causal Discovery, Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA 15206, USA
/ Shyam Visweswaran
  • Center for Causal Discovery, Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA 15206, USA
Published Online: 2015-11-03 | DOI: https://doi.org/10.1515/jci-2015-0011


Ridge regularized linear models (RRLMs), such as ridge regression and the support vector machine (SVM), are a popular group of methods that are used in conjunction with coefficient hypothesis testing to discover explanatory variables with a significant multivariate association to a response. However, many investigators are reluctant to draw causal interpretations of the selected variables because the capabilities of RRLMs in causal inference are incompletely understood. Under reasonable assumptions, we show that a modified form of RRLMs can get “very close” to identifying a subset of the Markov boundary by providing a worst-case bound on the space of possible solutions. The results hold for any convex loss, even when the underlying functional relationship is nonlinear and the solution is not unique. Our approach combines ideas from Markov boundary theory and sufficient dimension reduction theory. Experimental results show that the modified RRLMs are competitive against state-of-the-art algorithms in discovering part of the Markov boundary from gene expression data.
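To make the general workflow concrete (fit a ridge regularized linear model, then test its coefficients), the following is a minimal illustrative sketch. It uses plain ridge regression with squared loss and a simple permutation test on synthetic data; it is not the modified RRLM proposed in the article, whose details are not given in this abstract. The function names `ridge_fit` and `permutation_pvalues`, the penalty `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data.

    Solves (Xc^T Xc + lam * I) w = Xc^T yc for the coefficient vector w.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def permutation_pvalues(X, y, lam, n_perm=200, seed=0):
    """Crude per-coefficient p-values (hypothetical helper).

    Permuting y breaks any association with X, so the refitted
    coefficient magnitudes form a null reference distribution.
    """
    rng = np.random.default_rng(seed)
    w_obs = np.abs(ridge_fit(X, y, lam))
    exceed = np.zeros_like(w_obs)
    for _ in range(n_perm):
        w_null = np.abs(ridge_fit(X, rng.permutation(y), lam))
        exceed += (w_null >= w_obs)
    # Add-one correction keeps the p-values strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Synthetic data (assumption): only the first two of five variables
# influence the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=500)

w = ridge_fit(X, y, lam=1.0)
pvals = permutation_pvalues(X, y, lam=1.0)
selected = np.flatnonzero(pvals < 0.05)  # indices of variables flagged as associated
```

A significant coefficient in this sketch indicates only a multivariate association with the response. Whether the selected variables belong to the Markov boundary is the question the article addresses under its own assumptions.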

Keywords: Markov boundary; ridge regularization; linear models


About the article

Published Online: 2015-11-03

Published in Print: 2016-03-01

Funding: Research reported in this publication was supported by grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge initiative. The research was also supported by the National Library of Medicine of the National Institutes of Health under award numbers T15LM007059 and R01LM012095.

Citation Information: Journal of Causal Inference, Volume 4, Issue 1, Pages 31–48, ISSN (Online) 2193-3685, ISSN (Print) 2193-3677, DOI: https://doi.org/10.1515/jci-2015-0011.

©2016 by De Gruyter.