
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 17, Issue 3

Non-parametric estimation of population size changes from the site frequency spectrum

Berit Lindum Waltoft (corresponding author)
  • Bioinformatics Research Centre, Aarhus University, C.F. Møllers allé 8, 8000 Aarhus C, Denmark, Phone: +45 87165763
  • National Centre for Register-based Research, Department of Economics and Business, Aarhus University, Fuglesangs allé 26, 8210 Aarhus V, Denmark
  • The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
Asger Hobolth
Published Online: 2018-06-11 | DOI: https://doi.org/10.1515/sagmb-2017-0061


The change in population size over time is a useful quantity for understanding the evolutionary history of a species. Genetic variation within a species can be summarized by the site frequency spectrum (SFS). For a sample of size n, the SFS is a vector of length n − 1 whose entry i is the number of sites where the mutant base appears i times and the ancestral base appears n − i times. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from an observed SFS. First, we provide a straightforward proof of the expression for the expected SFS, which depends only on the population size. Our derivation is based on an eigenvalue decomposition of the instantaneous coalescent rate matrix. Second, we solve the inverse problem of determining the changes in population size from an observed SFS. Our solution models the population size as a cubic spline. The cubic spline is determined by minimizing a weighted average of two terms: (i) the goodness of fit to the observed SFS and (ii) a penalty term based on the smoothness of the changes. The weight is determined by cross-validation. The new method is validated on simulated demographic histories and applied to unfolded and folded SFS from 26 different human populations from the 1000 Genomes Project.
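To make the SFS definition concrete, here is a minimal Python/NumPy sketch. It assumes the data arrive as a 0/1 matrix with one row per sampled sequence and one column per site, with the ancestral state known (1 = mutant base, 0 = ancestral base). The function names `unfolded_sfs`, `fold` and `expected_sfs_constant` are hypothetical helpers, not part of CubSFS. Folding merges classes i and n − i, which is what one uses when the ancestral base is unknown. As a reference point only, the sketch also gives the standard constant-size expectation E[ξ_i] = θ/i (θ the population-scaled mutation rate); this is not the CubSFS expectation, which depends on a time-varying population size.

```python
import numpy as np


def unfolded_sfs(haplotypes: np.ndarray) -> np.ndarray:
    """Unfolded SFS from an n x L matrix of 0/1 states (hypothetical helper).

    Rows are the n sampled sequences and columns are sites; 1 marks the
    mutant (derived) base and 0 the ancestral base. Entry i - 1 of the
    result is the number of sites where the mutant base appears i times
    (and the ancestral base n - i times), for i = 1, ..., n - 1.
    """
    n = haplotypes.shape[0]
    counts = haplotypes.astype(np.int64).sum(axis=0)  # mutant count per site
    # Tally sites by mutant count 0..n, then drop the monomorphic classes 0 and n.
    full = np.bincount(counts, minlength=n + 1)
    return full[1:n]


def fold(sfs: np.ndarray) -> np.ndarray:
    """Fold an unfolded SFS (length n - 1) by minor-allele count 1..n//2."""
    n = len(sfs) + 1
    folded = np.zeros(n // 2, dtype=sfs.dtype)
    for i in range(1, n):
        # Classes i and n - i share the same minor-allele count.
        folded[min(i, n - i) - 1] += sfs[i - 1]
    return folded


def expected_sfs_constant(n: int, theta: float) -> np.ndarray:
    """Constant-size baseline E[xi_i] = theta / i, i = 1, ..., n - 1.

    Standard neutral-coalescent result for a population of constant size;
    it is NOT the CubSFS expectation for a varying population size.
    """
    i = np.arange(1, n)
    return theta / i


if __name__ == "__main__":
    # Toy data: n = 4 sequences, 5 sites (values chosen for illustration).
    H = np.array([
        [1, 0, 1, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 1, 0],
    ])
    xi = unfolded_sfs(H)
    print(xi)                             # [3 1 1]
    print(fold(xi))                       # [4 1]
    print(expected_sfs_constant(4, 6.0))  # [6. 3. 2.]
```

For the toy matrix, three sites carry the mutant base once, one site twice and one site three times, so the unfolded SFS is (3, 1, 1) and the folded SFS is (4, 1). A method such as CubSFS would compare an observed vector like this with an expectation computed from a spline-parametrized size history rather than with the constant-size baseline; that fitting step is not sketched here.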

This article offers supplementary material which is provided at the end of the article.

Keywords: Coalescent theory; population size; regularization; site frequency spectrum



About the article


Funding Source: Lundbeck Foundation

Award identifier / Grant number: R155–2014–1724

BLW is funded by The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Denmark. Grant number R155–2014–1724.

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 17, Issue 3, 20170061, ISSN (Online) 1544-6115, DOI: https://doi.org/10.1515/sagmb-2017-0061.

©2018 Walter de Gruyter GmbH, Berlin/Boston.

Citing Articles


Theresa L. Cole, Ludovic Dutoit, Nicolas Dussex, Tom Hart, Alana Alexander, Jane L. Younger, Gemma V. Clucas, María José Frugone, Yves Cherel, Richard Cuthbert, Ursula Ellenberg, Steven R. Fiddaman, Johanna Hiscock, David Houston, Pierre Jouventin, Thomas Mattern, Gary Miller, Colin Miskelly, Paul Nolan, Michael J. Polito, Petra Quillfeldt, Peter G. Ryan, Adrian Smith, Alan J. D. Tennyson, David Thompson, Barbara Wienecke, Juliana A. Vianna, and Jonathan M. Waters
Proceedings of the National Academy of Sciences, 2019, Page 201904048
Andrew Melfi and Divakar Viswanath
Theoretical Population Biology, 2018
Jeffrey P Spence, Matthias Steinrücken, Jonathan Terhorst, and Yun S Song
Current Opinion in Genetics & Development, 2018, Volume 53, Page 70
