Christina Lehermeier, Valentin Wimmer, Theresa Albrecht, Hans-Jürgen Auinger, Daniel Gianola, Volker J. Schmid, Chris-Carolin Schön
April 24, 2013
Different statistical models have been proposed for maximizing prediction accuracy in genome-based prediction of breeding values in plant and animal breeding. However, little is known about the sensitivity of these models with respect to prior and hyperparameter specification, because comparisons of prediction performance are mainly based on a single set of hyperparameters. In this study, we focused on Bayesian prediction methods using a standard linear regression model with marker covariates coding additive effects at a large number of marker loci. By comparing different hyperparameter settings, we investigated the sensitivity of four methods frequently used in genome-based prediction (Bayesian Ridge, Bayesian Lasso, BayesA and BayesB) to specification of the prior distribution of marker effects. We used datasets simulated according to a typical maize breeding program differing in the number of markers and the number of simulated quantitative trait loci affecting the trait. Furthermore, we used an experimental maize dataset, comprising 698 doubled haploid lines, each genotyped with 56110 single nucleotide polymorphism markers and phenotyped as testcrosses for the two quantitative traits grain dry matter yield and grain dry matter content. The predictive ability of the different models was assessed by five-fold cross-validation. The extent of Bayesian learning was quantified by calculation of the Hellinger distance between the prior and posterior densities of marker effects. Our results indicate that similar predictive abilities can be achieved with all methods, but with BayesA and BayesB hyperparameter settings had a stronger effect on prediction performance than with the other two methods. Prediction performance of BayesA and BayesB suffered substantially from a non-optimal choice of hyperparameters.