Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

Semi-automatic selection of summary statistics for ABC model choice

Dennis Prangle
  • Corresponding author
  • Department of Mathematics and Statistics, Lancaster University, UK
/ Paul Fearnhead
  • Department of Mathematics and Statistics, Lancaster University, UK
/ Murray P. Cox
  • Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
  • Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
/ Patrick J. Biggs
  • Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
/ Nigel P. French
  • Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
  • Infectious Disease Research Centre, Institute of Veterinary, Animal and Biomedical Sciences, Massey University, Palmerston North, New Zealand
Published Online: 2013-12-10 | DOI: https://doi.org/10.1515/sagmb-2013-0012

Abstract

A central statistical goal is to choose between alternative explanatory models of data. In many modern applications, such as population genetics, standard methods based on evaluating the likelihood functions of the models cannot be applied, because these likelihoods are numerically intractable. Approximate Bayesian computation (ABC) is a commonly used alternative in such situations. ABC simulates data x for many parameter values under each model and compares each simulated data set to the observed data x_obs. More weight is placed on models under which S(x) is close to S(x_obs), where S maps data to a vector of summary statistics. Previous work has shown that the choice of S is crucial to the efficiency and accuracy of ABC. This paper provides a method to select good summary statistics for model choice. It uses a preliminary step that simulates many x values from all models and fits regressions to these simulations with the model as the response. The resulting model weight estimators are used as S in an ABC analysis. Theoretical results are given to justify this as approximating low-dimensional sufficient statistics. A substantive application is presented: choosing between competing coalescent models of demographic growth for Campylobacter jejuni in New Zealand using multi-locus sequence typing data.
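The two-stage procedure outlined in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the two toy models (exponential versus gamma with shape 2, both with mean theta), the uniform prior on theta, the hand-picked features, the use of logistic regression fitted by plain gradient ascent, and all function names. The pilot stage fits a regression with the model indicator as response; the fitted probability of model 1 then serves as the summary statistic S in a simple rejection-ABC step.

```python
import math
import random

def simulate(model, theta, n, rng):
    """Toy models (assumed): 0 = exponential, 1 = gamma(shape 2); both have mean theta."""
    if model == 0:
        return [rng.expovariate(1.0 / theta) for _ in range(n)]
    return [rng.gammavariate(2.0, theta / 2.0) for _ in range(n)]

def draw(n, rng):
    """Draw (model, data) with equal prior model weights and theta ~ Uniform(0.5, 2)."""
    model = rng.randrange(2)
    return model, simulate(model, rng.uniform(0.5, 2.0), n, rng)

def features(x):
    """Candidate features (assumed): intercept, coefficient of variation, log mean."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [1.0, sd / mean, math.log(mean)]

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(F, y, steps=300, lr=1.0):
    """Crude logistic regression by gradient ascent on the mean log-likelihood."""
    w = [0.0] * len(F[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for f, yi in zip(F, y):
            resid = yi - sigmoid(sum(wj * fj for wj, fj in zip(w, f)))
            for j, fj in enumerate(f):
                grad[j] += resid * fj
        w = [wj + lr * gj / len(F) for wj, gj in zip(w, grad)]
    return w

def abc_model_choice(x_obs, n_pilot=1000, n_abc=5000, keep=0.02, seed=1):
    """Return the rejection-ABC estimate of P(model 1 | x_obs)."""
    rng = random.Random(seed)
    n = len(x_obs)
    # Pilot stage: simulate from all models, regress the model indicator on features.
    pilot = [draw(n, rng) for _ in range(n_pilot)]
    w = fit_logistic([features(x) for _, x in pilot], [m for m, _ in pilot])

    def S(x):
        # Summary statistic: fitted probability of model 1 given the data.
        return sigmoid(sum(wj * fj for wj, fj in zip(w, features(x))))

    # ABC stage: keep the simulations whose summary is closest to the observed one.
    s_obs = S(x_obs)
    sims = [draw(n, rng) for _ in range(n_abc)]
    ranked = sorted((abs(S(x) - s_obs), m) for m, x in sims)
    accepted = ranked[: max(1, int(keep * n_abc))]
    return sum(m for _, m in accepted) / len(accepted)

if __name__ == "__main__":
    data_rng = random.Random(0)
    x_obs = simulate(1, 1.0, 200, data_rng)  # pretend-observed data from model 1
    print("Estimated P(model 1 | x_obs):", abc_model_choice(x_obs))
```

In this sketch the summary is one-dimensional because there are only two models; with more models one would presumably fit a multinomial regression and use the vector of fitted model weights. The gradient-ascent fit is deliberately simple and is not run to convergence, which is tolerable here because any monotone transform of a good discriminating score orders the simulations similarly.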

This article offers supplementary material which is provided at the end of the article.

Keywords: ABC; model selection; sufficiency; Campylobacter; MLST; coalescent

About the article

Corresponding author: Dennis Prangle, Department of Mathematics and Statistics, Lancaster University, UK


Published Online: 2013-12-10

Published in Print: 2014-02-01



Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2013-0012.
