Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.


Sensitivity to prior specification in Bayesian genome-based prediction models

Christina Lehermeier
  • Plant Breeding, Technische Universität München, Emil-Ramann-Straße 4, 85354 Freising, Germany
Valentin Wimmer
  • Plant Breeding, Technische Universität München, Emil-Ramann-Straße 4, 85354 Freising, Germany
Theresa Albrecht
  • Plant Breeding, Technische Universität München, Emil-Ramann-Straße 4, 85354 Freising, Germany
  • Current address: Institute for Crop Production and Plant Breeding, Bavarian State Research Center for Agriculture, Am Gereuth 6, 85354 Freising, Germany
Hans-Jürgen Auinger
  • Plant Breeding, Technische Universität München, Emil-Ramann-Straße 4, 85354 Freising, Germany
Daniel Gianola
  • Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA; and Institute for Advanced Study, Technische Universität München, Lichtenbergstraße 2a, 85748 Garching, Germany
Volker J. Schmid
  • Department of Statistics, Ludwig-Maximilians-Universität München, Ludwigstraße 33, 80539 München, Germany
Chris-Carolin Schön
  • Corresponding author
  • Plant Breeding, Technische Universität München, Emil-Ramann-Straße 4, 85354 Freising, Germany
Published Online: 2013-04-24 | DOI: https://doi.org/10.1515/sagmb-2012-0042

Abstract

Different statistical models have been proposed for maximizing prediction accuracy in genome-based prediction of breeding values in plant and animal breeding. However, little is known about the sensitivity of these models to prior and hyperparameter specification, because comparisons of prediction performance are mainly based on a single set of hyperparameters. In this study, we focused on Bayesian prediction methods using a standard linear regression model with marker covariates coding additive effects at a large number of marker loci. By comparing different hyperparameter settings, we investigated the sensitivity of four methods frequently used in genome-based prediction (Bayesian Ridge, Bayesian Lasso, BayesA and BayesB) to the specification of the prior distribution of marker effects. We used datasets simulated according to a typical maize breeding program; these differed in the number of markers and in the number of simulated quantitative trait loci affecting the trait. Furthermore, we used an experimental maize dataset comprising 698 doubled haploid lines, each genotyped with 56,110 single nucleotide polymorphism markers and phenotyped as testcrosses for two quantitative traits, grain dry matter yield and grain dry matter content. The predictive ability of the different models was assessed by five-fold cross-validation. The extent of Bayesian learning was quantified by calculating the Hellinger distance between the prior and posterior densities of marker effects. Our results indicate that similar predictive abilities can be achieved with all methods, but that hyperparameter settings had a stronger effect on prediction performance with BayesA and BayesB than with the other two methods. Prediction performance of BayesA and BayesB suffered substantially from a non-optimal choice of hyperparameters.
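
The abstract does not state how the Hellinger distance was evaluated. As a rough illustration only, the sketch below uses the standard definition H(f, g) = sqrt(1 − ∫ sqrt(f(x) g(x)) dx) and approximates the integral with the trapezoidal rule. The normal prior and posterior densities, the integration range, and the names `normal_pdf` and `hellinger_distance` are assumptions made for this example and are not taken from the study.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (stand-in for a prior or posterior)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hellinger_distance(f, g, lower, upper, n_steps=20000):
    """Hellinger distance between densities f and g on [lower, upper].

    Uses H = sqrt(1 - BC), where BC is the integral of sqrt(f * g),
    approximated here with the trapezoidal rule.
    """
    step = (upper - lower) / n_steps
    terms = []
    for i in range(n_steps + 1):
        x = lower + i * step
        weight = 0.5 if i in (0, n_steps) else 1.0  # trapezoid end points
        terms.append(weight * math.sqrt(f(x) * g(x)))
    bc = math.fsum(terms) * step
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


# Hypothetical example: a wide prior and a more concentrated posterior.
def prior(x):
    return normal_pdf(x, 0.0, 1.0)


def posterior(x):
    return normal_pdf(x, 0.0, 0.5)


d_same = hellinger_distance(prior, prior, -10.0, 10.0)      # no learning
d_diff = hellinger_distance(prior, posterior, -10.0, 10.0)  # some learning
print(d_same, d_diff)
```

Under this definition the distance lies between 0 and 1. A value near 0 means the posterior is almost identical to the prior (little learning from the data), and larger values mean the data moved the posterior further away from the prior.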

Keywords: genome-based prediction; genomic selection; Bayesian learning; shrinkage prior; plant breeding

About the article

Corresponding author: Chris-Carolin Schön, Plant Breeding, Technische Universität München, Emil-Ramann-Straße 4, 85354 Freising, Germany


Published in Print: 2013-06-01



Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2012-0042.
