Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 17, Issue 3

Bayesian inference of selection in the Wright-Fisher diffusion model

Jeffrey J. Gory / Radu Herbei / Laura S. Kubatko
  • Departments of Statistics and Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
Published Online: 2018-06-06 | DOI: https://doi.org/10.1515/sagmb-2017-0046

Abstract

The increasing availability of population-level allele frequency data across one or more related populations necessitates the development of methods that can efficiently estimate population genetics parameters, such as the strength of selection acting on the population(s), from such data. Existing methods for this problem in the setting of the Wright-Fisher diffusion model are primarily likelihood-based. They rely on numerical approximation for likelihood computation and on bootstrapping to assess variability in the resulting estimates, and therefore require extensive computation. Recent work has provided a method for obtaining exact samples from general Wright-Fisher diffusion processes, enabling the development of methods for Bayesian estimation in this setting. We develop and implement a Bayesian method for estimating the strength of selection based on the Wright-Fisher diffusion for data sampled at a single time point. The method uses the latest algorithms for exact sampling to devise a Markov chain Monte Carlo procedure that draws samples from the joint posterior distribution of the selection coefficient and the allele frequencies. We demonstrate that, when assumptions about the initial allele frequencies are accurate, the method performs well both for simulated data and for an empirical data set on hypoxia in flies, where we find evidence for strong positive selection in a previously identified region of chromosome 2L. We discuss possible extensions of our method to the more general settings commonly encountered in practice, highlighting the advantages of Bayesian approaches to inference in this setting.

Keywords: diffusion model; selection; Wright-Fisher model
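
For readers unfamiliar with the model named in the abstract, the following is a minimal illustrative sketch of how allele frequencies evolve under a Wright-Fisher diffusion with selection. It is not the authors' method: it uses a simple Euler-Maruyama discretisation, which is only an approximation, whereas the abstract refers to exact sampling algorithms. The parametrisation is also an assumption made here for illustration: the frequency X is taken to follow dX = (sigma/2) X(1-X) dt + sqrt(X(1-X)) dB, with `sigma` a scaled selection coefficient and time measured in diffusion units. The function name `simulate_wf_diffusion` and all of its parameters are hypothetical.

```python
import numpy as np


def simulate_wf_diffusion(x0, sigma, t_end, n_paths=1, n_steps=1000, seed=None):
    """Approximate final allele frequencies under a Wright-Fisher diffusion.

    Assumed model (illustrative parametrisation, not taken from the article):
        dX = (sigma / 2) * X * (1 - X) dt + sqrt(X * (1 - X)) dB
    Uses Euler-Maruyama steps, so the output is approximate, not exact.
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        var = x * (1.0 - x)                      # X(1 - X), zero at the boundaries
        noise = rng.standard_normal(n_paths)     # independent N(0, 1) increments
        x = x + 0.5 * sigma * var * dt + np.sqrt(var * dt) * noise
        x = np.clip(x, 0.0, 1.0)                 # keep frequencies inside [0, 1]
    return x


if __name__ == "__main__":
    # Compare neutral evolution with positive selection from the same start.
    neutral = simulate_wf_diffusion(0.2, 0.0, 0.5, n_paths=500, seed=1)
    selected = simulate_wf_diffusion(0.2, 20.0, 0.5, n_paths=500, seed=1)
    print("mean final frequency, neutral: ", neutral.mean())
    print("mean final frequency, selected:", selected.mean())
```

Because both the drift and the noise vanish at 0 and 1, a path that reaches either boundary stays there, which mimics loss or fixation of the allele. A Bayesian procedure of the kind described in the abstract would treat `sigma` as unknown and infer it from observed frequencies; that inference step is deliberately left out of this sketch.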

About the article

Funding Source: Division of Mathematical Sciences

Award identifier / Grant number: 1209142

Award identifier / Grant number: 1407604

Award identifier / Grant number: 1106706

This work was supported by National Science Foundation grants DMS-1209142 and DMS-1407604 (R.H.) and DMS-1106706 (L.S.K.).


Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 17, Issue 3, 20170046, ISSN (Online) 1544-6115, DOI: https://doi.org/10.1515/sagmb-2017-0046.

©2018 Walter de Gruyter GmbH, Berlin/Boston.