Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 13, Issue 1

Semi-automatic selection of summary statistics for ABC model choice

Dennis Prangle / Paul Fearnhead / Murray P. Cox
  • Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
  • Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
/ Patrick J. Biggs
  • Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
/ Nigel P. French
  • Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
  • Infectious Disease Research Centre, Institute of Veterinary, Animal and Biomedical Sciences, Massey University, Palmerston North, New Zealand
Published Online: 2013-12-10 | DOI: https://doi.org/10.1515/sagmb-2013-0012


A central statistical goal is to choose between alternative explanatory models of data. In many modern applications, such as population genetics, it is not possible to apply standard methods based on evaluating the likelihood functions of the models, as these are numerically intractable. Approximate Bayesian computation (ABC) is a commonly used alternative in such situations. ABC simulates data x for many parameter values under each model and compares each simulation with the observed data x_obs. More weight is placed on models under which S(x) is close to S(x_obs), where S maps data to a vector of summary statistics. Previous work has shown that the choice of S is crucial to the efficiency and accuracy of ABC. This paper provides a method for selecting good summary statistics for model choice. It uses a preliminary step that simulates many x values from all models and fits regressions to these simulations, with the model as the response. The resulting estimators of model weights are then used as S in an ABC analysis. Theoretical results are given that justify this approach as approximating low-dimensional sufficient statistics. A substantive application is presented: choosing between competing coalescent models of demographic growth for Campylobacter jejuni in New Zealand using multi-locus sequence typing data.
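The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two toy models (Normal versus Laplace data with an unknown location), the candidate features, the sample sizes and the 1% acceptance rate are all hypothetical choices, and a plain logistic regression fitted by Newton's method stands in for whatever regression the paper actually uses. The pilot regression's fitted log-odds of model 1 (a monotone transform of the estimated model weight) then serves as a one-dimensional S in a basic rejection-ABC step.

```python
import numpy as np

rng = np.random.default_rng(1)
N_OBS = 50  # hypothetical number of observations per dataset


def simulate(model, size):
    """Draw `size` datasets from toy model 0 (Normal) or 1 (Laplace); hypothetical models."""
    mu = rng.normal(0.0, 2.0, size=(size, 1))  # same prior on the location under both models
    if model == 0:
        return mu + rng.normal(0.0, 1.0, size=(size, N_OBS))
    # Laplace with scale 1/sqrt(2) has unit variance, so only the shape differs
    return mu + rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=(size, N_OBS))


def features(x):
    """Hypothetical candidate features f(x) for the regression (intercept column first)."""
    c = x - x.mean(axis=1, keepdims=True)
    var = (c ** 2).mean(axis=1)
    kurt = (c ** 4).mean(axis=1) / var ** 2
    mad = np.abs(c).mean(axis=1) / np.sqrt(var)
    return np.column_stack([np.ones(len(x)), x.mean(axis=1), np.log(var), kurt, mad])


def fit_logistic(F, y, iters=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson, with a tiny ridge term for numerical stability."""
    beta = np.zeros(F.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-F @ beta))
        w = p * (1.0 - p)
        H = F.T @ (F * w[:, None]) + ridge * np.eye(F.shape[1])
        beta += np.linalg.solve(H, F.T @ (y - p))
    return beta


# Step 1 (pilot): simulate from all models and regress the model index on the features.
n_pilot = 4000
x_pilot = np.vstack([simulate(0, n_pilot), simulate(1, n_pilot)])
y_pilot = np.repeat([0.0, 1.0], n_pilot)
beta = fit_logistic(features(x_pilot), y_pilot)


def S(x):
    """Summary statistic: fitted log-odds of model 1 for each dataset in x."""
    return features(x) @ beta


# Step 2: rejection ABC for model choice, using S as the summary statistic.
x_obs = simulate(1, 1)  # stand-in "observed" data for this illustration only
n_abc = 20000
models = rng.integers(0, 2, size=n_abc)  # uniform prior over the two models
s_sim = np.empty(n_abc)
for m in (0, 1):
    idx = np.flatnonzero(models == m)
    s_sim[idx] = S(simulate(m, idx.size))
dist = np.abs(s_sim - S(x_obs)[0])
keep = dist <= np.quantile(dist, 0.01)  # accept the closest 1% of simulations
print("estimated P(model 1 | x_obs):", models[keep].mean())
```

With two models the fitted log-odds is a single number, so the ABC distance reduces to an absolute difference. With more than two models, one natural extension (again an assumption here) is a multinomial regression that yields one estimated weight per model, with the resulting vector used as S.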

This article offers supplementary material which is provided at the end of the article.

Keywords: ABC; model selection; sufficiency; Campylobacter; MLST; coalescent

About the article

Corresponding author: Dennis Prangle, Department of Mathematics and Statistics, Lancaster University, UK, e-mail:

Published in Print: 2014-02-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 13, Issue 1, Pages 67–82, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2013-0012.

©2014 by Walter de Gruyter Berlin Boston.

Supplementary Article Materials
