
Statistical Applications in Genetics and Molecular Biology

Volume 12, Issue 6

Random forests on distance matrices for imaging genetics studies

Aaron Sim
  • Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, UK
Dimosthenis Tsagkrasoulis
Giovanni Montana
  • Corresponding author
  • Statistics Section, Department of Mathematics, Imperial College London, UK
  • Department of Biomedical Engineering, King’s College London, UK
Published Online: 2013-11-19 | DOI: https://doi.org/10.1515/sagmb-2013-0040


We propose a non-parametric regression methodology, Random Forests on Distance Matrices (RFDM), for detecting genetic variants associated with quantitative phenotypes that are obtained using neuroimaging techniques and represent the structure or function of the human brain. RFDM, an extension of decision forests, takes as its response a distance matrix that encodes all pairwise phenotypic distances in the random sample. We discuss how such distances can be learned directly from the data using manifold learning techniques, and how they can be defined when the phenotypes are non-vectorial objects such as brain connectivity networks. We also describe an extension of RFDM that detects epistatic effects while keeping the computational complexity low. Extensive simulation results and an application to an imaging genetics study of Alzheimer’s Disease are presented and discussed.
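The abstract does not spell out how a tree can be grown when the only response available is a matrix of pairwise distances. The following is a minimal sketch under an assumed split criterion, not necessarily the score RFDM actually uses. It uses a squared-distance node impurity: for Euclidean distances, the within-node sum of squared deviations from the centroid can be computed from distances alone, because sum_i ||y_i - ybar||^2 = (1/(2n)) sum_{i,j} d_ij^2. The function names `node_impurity` and `best_snp_split`, and the 0/1/2 genotype coding, are illustrative assumptions.

```python
import numpy as np


def node_impurity(D, idx):
    """Dispersion of the subjects in `idx`, computed from pairwise distances only.

    For Euclidean distances this equals the sum of squared deviations from the
    node centroid, because sum_i ||y_i - ybar||^2 = (1 / (2n)) * sum_{i,j} d_ij^2.
    """
    idx = np.asarray(idx)
    n = idx.size
    if n == 0:
        return 0.0
    sub = D[np.ix_(idx, idx)]  # distances among the node's subjects only
    return float((sub ** 2).sum() / (2.0 * n))


def best_snp_split(D, genotypes, idx):
    """Return (threshold, gain) for the best split of node `idx` on one SNP.

    A threshold t sends subjects with genotype <= t to the left child, so
    t = 0 gives {0} vs {1, 2} and t = 1 gives {0, 1} vs {2}.
    Returns (None, 0.0) if no split reduces the impurity.
    """
    idx = np.asarray(idx)
    g = np.asarray(genotypes)[idx]
    parent = node_impurity(D, idx)
    best_t, best_gain = None, 0.0
    for t in (0, 1):
        left, right = idx[g <= t], idx[g > t]
        if left.size == 0 or right.size == 0:
            continue  # degenerate split: one child would be empty
        gain = parent - node_impurity(D, left) - node_impurity(D, right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

Because only `D` enters the computation, the same code runs for any distance matrix. However, its interpretation as a reduction in squared error holds only when `D` is Euclidean.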

Keywords: genetic associations; random forests; quantitative traits; imaging genetics; Alzheimer’s Disease
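For non-vectorial phenotypes, the abstract does not specify which distance is used. One possible construction, offered here only as an assumption, summarises each subject's connectivity network as a symmetric positive-definite covariance or precision matrix. Subjects can then be compared with the metric of Förstner and Moonen, d(A, B) = sqrt(sum_i ln^2 lambda_i), where lambda_i are the generalized eigenvalues of the pair (A, B). The helper names below are hypothetical.

```python
import numpy as np


def forstner_distance(A, B):
    """Förstner-Moonen distance between two symmetric positive-definite matrices.

    d(A, B) = sqrt(sum_i ln(lambda_i)^2), where lambda_i are the generalized
    eigenvalues solving det(B - lambda * A) = 0, i.e. the eigenvalues of A^{-1} B.
    """
    lam = np.linalg.eigvals(np.linalg.solve(A, B))
    lam = np.real(lam)  # real and positive for SPD inputs; drop round-off imaginary parts
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def pairwise_distance_matrix(mats):
    """Symmetric n x n matrix of Förstner-Moonen distances between subjects' matrices."""
    n = len(mats)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = forstner_distance(mats[i], mats[j])
    return D
```

The resulting `D` has the form RFDM expects as a response: a symmetric matrix with a zero diagonal. The metric is also invariant under congruence transformations A -> M A M^T, so a common change of basis applied to every subject's matrix leaves `D` unchanged.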



About the article

Corresponding author: Giovanni Montana, Statistics Section, Department of Mathematics, Imperial College London, UK; and Department of Biomedical Engineering, King’s College London, UK

Published Online: 2013-11-19

Published in Print: 2013-12-01

The overall scaling factor and constant additive term arising from the omitted variance terms in (6) can be ignored, as only the relative values of $G_{n\alpha}$ are of interest.

Exceptions include responses with infinite degrees of freedom, where the forced vectorial representations are infinite-dimensional; e.g., functions represented by their infinite vectors of Fourier modes.

For example, identical measurements taken at different time points.

This value is chosen to ensure a measurable but weak signal from causal SNPs in the case-control set-up.

We expect that, for manifolds of dimension greater than 2, we would observe similar correspondences between the clustering of the SNPs with weaker marginal effects and their partitioning according to minor allele frequency (maf).

These additional effects may be linked to other, non-dementia-related neurological pathologies. However, they are irrelevant for the purpose of detecting disease-linked SNPs or SNP-SNP pairs.





The specific ranges selected are 0.195 < maf < 0.205 and 0.22 < maf < 0.24, respectively. The choice of 0.2 was made solely to identify loci with a clear difference between their major and minor allele frequencies and is otherwise arbitrary.
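The maf windows in the footnote above can be applied to a genotype matrix as shown below. This is a minimal sketch that assumes genotypes are coded as 0/1/2 copies of one allele per subject, with no missing calls. The function names are illustrative and are not taken from the paper.

```python
import numpy as np


def minor_allele_frequency(G):
    """Per-SNP minor allele frequency.

    G is an (n_subjects, n_snps) array of allele counts coded 0/1/2 with no
    missing values (an assumption of this sketch).
    """
    p = np.asarray(G, dtype=float).mean(axis=0) / 2.0  # frequency of the counted allele
    return np.minimum(p, 1.0 - p)                      # fold so that maf <= 0.5


def snps_in_maf_window(G, lo, hi):
    """Column indices of SNPs with lo < maf < hi (strict inequalities)."""
    maf = minor_allele_frequency(G)
    return np.flatnonzero((maf > lo) & (maf < hi))
```

With a genotype matrix `G`, the two windows correspond to `snps_in_maf_window(G, 0.195, 0.205)` and `snps_in_maf_window(G, 0.22, 0.24)`.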

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 6, Pages 757–786, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2013-0040.


©2013 by Walter de Gruyter Berlin Boston.

Citing Articles


Dimosthenis Tsagkrasoulis and Giovanni Montana
Pattern Recognition Letters, 2017
Jingyu Liu and Vince D. Calhoun
Frontiers in Neuroinformatics, 2014, Volume 8
