Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


Volume 9, Issue 1

On Optimal Selection of Summary Statistics for Approximate Bayesian Computation

Matthew A Nunes / David J Balding
Published Online: 2010-09-06 | DOI: https://doi.org/10.2202/1544-6115.1576

How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary statistics, which in practice are typically chosen on the basis of the investigator's intuition and established practice in the field. We propose two algorithms for automated choice of efficient data summaries. Firstly, we motivate minimization of the estimated entropy of the posterior approximation as a heuristic for the selection of summary statistics. Secondly, we propose a two-stage procedure: the minimum-entropy algorithm is used to identify simulated datasets close to the observed dataset, and each of these is in turn treated as an observed dataset for which the mean root integrated squared error of the ABC posterior approximation is minimized over sets of summary statistics. In a simulation study, we inferred the scaled mutation and recombination parameters, both singly and jointly, from a population sample of DNA sequences. The computationally fast minimum-entropy algorithm showed a modest improvement over existing methods, while our two-stage procedure showed substantial and highly significant further improvement for both univariate and bivariate inferences. We found that the optimal set of summary statistics was highly dataset specific, suggesting that more generally there may be no globally optimal choice, which argues for a new selection for each dataset even if the model and target of inference are unchanged.
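
As an illustration of the minimum-entropy heuristic described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a toy model (i.i.d. Normal(theta, 1) data), a Uniform(-5, 5) prior, three hypothetical candidate statistics, rejection ABC that keeps the closest 5% of simulations, and a k-nearest-neighbour (Kozachenko-Leonenko) entropy estimator with k = 4. All function names and settings are chosen here for illustration only.

```python
import itertools
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

# Hypothetical candidate summaries for the toy model (not those used in the paper).
CANDIDATES = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "range": lambda x: max(x) - min(x),
}


def simulate(theta, n, rng):
    """Toy model (assumption): n i.i.d. Normal(theta, 1) observations."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def knn_entropy(sample, k=4):
    """k-nearest-neighbour entropy estimate for a one-dimensional sample."""
    n = len(sample)
    # Digamma at integer k: psi(k) = -gamma + sum_{j=1}^{k-1} 1/j.
    digamma_k = -EULER_GAMMA + sum(1.0 / j for j in range(1, k))
    total = 0.0
    for i, x in enumerate(sample):
        dists = sorted(abs(x - y) for j, y in enumerate(sample) if j != i)
        total += math.log(max(dists[k - 1], 1e-300))  # guard against ties
    # log(2) is the log-volume of the unit ball in one dimension.
    return math.log(2.0) - digamma_k + math.log(n) + total / n


def abc_accept(thetas, sim_stats, obs_stats, names, frac):
    """Rejection ABC: keep the draws whose chosen summaries are closest to the observed ones."""
    scales = {s: statistics.pstdev([row[s] for row in sim_stats]) or 1.0 for s in names}
    dist = [
        math.sqrt(sum(((row[s] - obs_stats[s]) / scales[s]) ** 2 for s in names))
        for row in sim_stats
    ]
    n_keep = max(1, int(frac * len(thetas)))
    keep = sorted(range(len(thetas)), key=dist.__getitem__)[:n_keep]
    return [thetas[i] for i in keep]


def select_summaries(observed, n_sims=2000, frac=0.05, seed=1):
    """Return the subset of CANDIDATES with the lowest estimated posterior entropy."""
    rng = random.Random(seed)
    thetas = [rng.uniform(-5.0, 5.0) for _ in range(n_sims)]  # prior draws (assumption)
    sim_stats = []
    for theta in thetas:
        data = simulate(theta, len(observed), rng)
        sim_stats.append({s: f(data) for s, f in CANDIDATES.items()})
    obs_stats = {s: f(observed) for s, f in CANDIDATES.items()}

    entropies = {}
    for size in range(1, len(CANDIDATES) + 1):
        for subset in itertools.combinations(CANDIDATES, size):
            accepted = abc_accept(thetas, sim_stats, obs_stats, subset, frac)
            entropies[subset] = knn_entropy(accepted)
    best = min(entropies, key=entropies.get)
    return best, entropies


if __name__ == "__main__":
    observed = simulate(1.5, 20, random.Random(0))
    best, entropies = select_summaries(observed)
    for subset, h in sorted(entropies.items(), key=lambda kv: kv[1]):
        print(f"{h:8.3f}  {', '.join(subset)}")
    print("selected:", best)
```

Exhaustive search over subsets is feasible here only because there are three candidates; with many candidate statistics the number of subsets grows exponentially and some restricted search would be needed. The two-stage procedure in the abstract would reuse this selection step to pick nearby simulated datasets, then score subsets by the error of their ABC posteriors against the known generating parameters.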

Keywords: data reduction; computational statistics; likelihood-free inference; entropy; sufficiency

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 9, Issue 1, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1576.

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.
