Published by De Gruyter, April 24, 2013

Sensitivity to prior specification in Bayesian genome-based prediction models

Christina Lehermeier, Valentin Wimmer, Theresa Albrecht, Hans-Jürgen Auinger, Daniel Gianola, Volker J. Schmid and Chris-Carolin Schön

Abstract

Different statistical models have been proposed to maximize prediction accuracy in genome-based prediction of breeding values in plant and animal breeding. However, little is known about how sensitive these models are to the specification of priors and hyperparameters, because comparisons of prediction performance are mainly based on a single set of hyperparameters. In this study, we focused on Bayesian prediction methods using a standard linear regression model with marker covariates coding additive effects at a large number of marker loci. By comparing different hyperparameter settings, we investigated the sensitivity of four methods frequently used in genome-based prediction (Bayesian Ridge, Bayesian Lasso, BayesA and BayesB) to the specification of the prior distribution of marker effects. We used simulated datasets, modeled on a typical maize breeding program, that differed in the number of markers and in the number of simulated quantitative trait loci affecting the trait. Furthermore, we used an experimental maize dataset comprising 698 doubled haploid lines, each genotyped with 56,110 single nucleotide polymorphism markers and phenotyped as testcrosses for two quantitative traits, grain dry matter yield and grain dry matter content. The predictive ability of the different models was assessed by five-fold cross-validation. The extent of Bayesian learning was quantified by the Hellinger distance between the prior and posterior densities of marker effects. Our results indicate that similar predictive abilities can be achieved with all methods, but hyperparameter settings had a stronger effect on prediction performance with BayesA and BayesB than with the other two methods. Prediction performance of BayesA and BayesB suffered substantially from a non-optimal choice of hyperparameters.
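
The abstract measures Bayesian learning as the Hellinger distance between the prior and posterior densities of a marker effect, H = sqrt(1 - ∫ sqrt(f(x) g(x)) dx), which ranges from 0 (identical densities) to 1 (no overlap). The following is a minimal sketch of one way to compute it, not the authors' implementation. It assumes that the posterior density is estimated from MCMC draws with a Gaussian kernel density estimate using Silverman's rule-of-thumb bandwidth, and that the integral is approximated with the trapezoidal rule on a fixed grid. The normal prior, the simulated "draws" and all function names are illustrative assumptions.

```python
import math
import random
import statistics


def normal_pdf(x, mu=0.0, sd=1.0):
    """Density of N(mu, sd^2); used here as an illustrative prior and as the KDE kernel."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def silverman_bandwidth(samples):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate of the samples as a callable."""
    def density(x):
        return sum(normal_pdf(x, s, bandwidth) for s in samples) / len(samples)
    return density


def hellinger(f, g, lower, upper, n_grid=2001):
    """Hellinger distance sqrt(1 - integral of sqrt(f*g)), integrated by the trapezoidal rule."""
    step = (upper - lower) / (n_grid - 1)
    values = [math.sqrt(f(lower + i * step) * g(lower + i * step)) for i in range(n_grid)]
    overlap = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    # Guard against tiny negative values caused by rounding when overlap is ~1.
    return math.sqrt(max(0.0, 1.0 - overlap))


if __name__ == "__main__":
    random.seed(1)
    prior = lambda x: normal_pdf(x, 0.0, 1.0)
    # Hypothetical stand-in for MCMC samples of a single marker effect.
    draws = [random.gauss(0.5, 0.3) for _ in range(400)]
    posterior = kde(draws, silverman_bandwidth(draws))
    print(round(hellinger(prior, posterior, -6.0, 6.0), 3))
```

A value close to 0 suggests that the data moved the posterior little away from the prior, so the prior largely determines that marker effect. A value close to 1 indicates strong learning from the data. The integration limits and grid size are arbitrary choices here. They must cover the bulk of both densities, and the grid must be fine relative to the kernel bandwidth.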


Corresponding author: Chris-Carolin Schön, Plant Breeding, Technische Universität München, Emil-Ramann-Straße 4, 85354 Freising, Germany

Author notes: This research was funded by the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed – Synergistic plant and animal breeding” (FKZ: 0315528A). This research was carried out with the support of the Technische Universität München – Institute for Advanced Study, funded by the German Excellence Initiative. We gratefully acknowledge KWS SAAT AG for providing the experimental data.

Published Online: 2013-04-24
Published in Print: 2013-06-01

©2013 by Walter de Gruyter Berlin Boston