Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 12, Issue 4


Improving the efficiency of genomic selection

Marco Scutari / Ian Mackay / David Balding
Published Online: 2013-08-03 | DOI: https://doi.org/10.1515/sagmb-2013-0002


We investigate two approaches to increase the efficiency of phenotypic prediction from genome-wide markers, which is a key step for genomic selection (GS) in plant and animal breeding. The first approach is feature selection based on Markov blankets, which provide a theoretically sound framework for identifying non-informative markers. Fitting GS models using only the informative markers results in simpler models, which may allow cost savings from reduced genotyping. We show that this is accompanied by no loss, and possibly a small gain, in predictive power for four GS models: partial least squares (PLS), ridge regression, LASSO and elastic net. The second approach is the choice of kinship coefficients for genomic best linear unbiased prediction (GBLUP). We compare kinships based on different combinations of centring and scaling of marker genotypes, and a newly proposed kinship measure that adjusts for linkage disequilibrium (LD). We illustrate the use of both approaches and examine their performance using three real-world data sets with continuous phenotypic traits from plant and animal genetics. We find that elastic net with feature selection and GBLUP using LD-adjusted kinships performed similarly well, and were the best-performing methods in our study.

Keywords: genome-wide prediction; genomic selection; feature selection; Markov blanket; linkage disequilibrium; kinship
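
The abstract does not state the kinship formulas, so the following is only a minimal sketch of what "different combinations of centring and scaling of marker genotypes" can look like in practice. It assumes the common form K = ZZ'/m, where Z is the n × m matrix of allele counts after optional per-marker centring and scaling. The function name `kinship`, its `center` and `scale` flags, and the toy genotypes are illustrative assumptions, not the authors' code, and the LD-adjusted kinship measure is not attempted here. Plain Python is used to keep the example dependency-free; a matrix library would be the usual choice for real data.

```python
from statistics import fmean, pstdev


def kinship(genotypes, center=True, scale=True):
    """Marker-based kinship matrix K = Z Z' / m (illustrative sketch).

    genotypes: list of n individuals, each a list of m allele counts (0, 1, 2).
    center:    subtract each marker's mean allele count.
    scale:     divide each marker by its (population) standard deviation.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    columns = list(zip(*genotypes))  # one tuple of allele counts per marker
    means = [fmean(col) if center else 0.0 for col in columns]
    # Monomorphic markers have zero spread; leave them unscaled.
    sds = [(pstdev(col) or 1.0) if scale else 1.0 for col in columns]
    z = [[(row[j] - means[j]) / sds[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / m for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 4 individuals genotyped at 3 markers.
toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]

for center, scale in [(False, False), (True, False), (True, True)]:
    K = kinship(toy, center=center, scale=scale)
    print(f"center={center}, scale={scale}")
    for row in K:
        print("  " + "  ".join(f"{value:6.3f}" for value in row))
```

With centring, each row of K sums to zero, so kinships are expressed relative to the sample average. With centring and scaling together, every polymorphic marker contributes equally regardless of its allele frequency, and the mean of the diagonal equals one.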


About the article

Corresponding author: Marco Scutari, Genetics Institute, University College London (UCL), London WC1E 6BT, UK

Published in Print: 2013-08-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 4, Pages 517–527, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2013-0002.

©2013 by Walter de Gruyter Berlin Boston.