
Biometrical Letters

The Journal of Polish Biometric Society

Open Access

An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects

Sergio Arciniegas-Alarcón
  • Departamento de Ciências Exatas, Universidade de São Paulo/ESALQ, Cx.P.09, CEP.13418-900, Piracicaba, SP - Brasil
Marisol García-Peña
  • Departamento de Ciências Exatas, Universidade de São Paulo/ESALQ, Cx.P.09, CEP.13418-900, Piracicaba, SP - Brasil
Wojtek Janusz Krzanowski
  • College of Engineering, Mathematics and Physical Sciences, Harrison Building, University of Exeter, North Park Road, Exeter, EX4 4QF, United Kingdom
Carlos Tadeu dos Santos Dias
  • Departamento de Ciências Exatas, Universidade de São Paulo/ESALQ, Cx.P.09, CEP.13418-900, Piracicaba, SP - Brasil
Published Online: 2014-12-20 | DOI: https://doi.org/10.2478/bile-2014-0006


A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a data imputation method for estimating the missing values, whose computational algorithm combined regression with a lower-rank approximation of the matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology: including weights chosen by cross-validation, and allowing multiple as well as single imputation. The three methods are assessed and compared in a simulation study using a complete set of real data from which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices, and the normalised root mean squared error between the imputed values and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.
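The abstract describes the imputation algorithm and the evaluation measures only in words. As a rough illustration of those ingredients, the Python sketch below fills missing cells of a genotype × environment table by repeatedly replacing them with a rank-k SVD approximation of the completed table, then scores the result with the three measures named above. It is a generic EM-style low-rank scheme swapped in for illustration, not the authors' regression/SVD algorithm, and it omits the cross-validated weights and the multiple-imputation step. The function names (`svd_impute`, `nrmse`, `squared_correlation`, `procrustes_stat`), the column-mean starting values, the chosen rank, the toy table and the exact normalisations (Procrustes statistic computed after centring both matrices and scaling them to unit Frobenius norm; NRMSE divided by the standard deviation of the deleted true values) are all assumptions.

```python
import numpy as np


def svd_impute(X, rank=2, max_iter=1000, tol=1e-12):
    """Fill NaN cells by iterating a rank-`rank` SVD approximation.

    Hypothetical helper: a generic EM-style low-rank scheme, NOT the
    regression/SVD algorithm of the paper. Assumes every column
    (environment) has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    filled = X.copy()
    # Starting values: mean of the observed cells in each column.
    col_means = np.nanmean(X, axis=0)
    filled[miss] = col_means[np.nonzero(miss)[1]]
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        # Update only the missing cells; observed cells are never changed.
        change = np.sum((approx[miss] - filled[miss]) ** 2)
        filled[miss] = approx[miss]
        if change < tol:
            break
    return filled


def nrmse(true, imputed, miss):
    """RMSE over the deleted cells divided by the SD of their true values (assumed normalisation)."""
    diff = imputed[miss] - true[miss]
    return np.sqrt(np.mean(diff ** 2)) / np.std(true[miss])


def squared_correlation(A, B):
    """Squared Pearson correlation between the entries of two matrices."""
    return np.corrcoef(np.ravel(A), np.ravel(B))[0, 1] ** 2


def procrustes_stat(A, B):
    """Procrustes statistic: residual after the best rotation and scaling of B onto A.

    Both matrices are column-centred and scaled to unit Frobenius norm
    (assumed convention), so the statistic is 1 - (sum of singular values
    of A'B)^2 and lies in [0, 1]; 0 means a perfect match.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return 1.0 - s.sum() ** 2


# Synthetic toy table (not data from the paper): exactly rank 2,
# genotype effects plus environment effects.
geno = np.array([-3.0, -2.0, 0.2, 2.0, 3.0, -0.2])
env = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
full = geno[:, None] + env[None, :]

miss = np.zeros(full.shape, dtype=bool)
miss[[2, 5], [2, 1]] = True  # delete cells (2, 2) and (5, 1)
X = full.copy()
X[miss] = np.nan

filled = svd_impute(X, rank=2)
print("NRMSE:", nrmse(full, filled, miss))
print("r^2:", squared_correlation(full, filled))
print("Procrustes:", procrustes_stat(full, filled))
```

For this exactly rank-2 toy table the two deleted cells should be recovered almost exactly, so the NRMSE and the Procrustes statistic should be close to 0 and the squared correlation close to 1. With real trial data, where the table is only approximately low-rank, the rank itself would need to be chosen, for example by cross-validation.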

Keywords: cross-validation; singular value decomposition; imputation; genotype-by-environment interaction; weights; missing values

References

  • Arciniegas-Alarcón S., García-Peña M., Dias C.T.S. (2011): Data imputation in trials with genotype×environment interaction. Interciencia 36(6): 444-449.

  • Arciniegas-Alarcón S., García-Peña M., Dias C.T.S., Krzanowski W.J. (2010): An alternative methodology for imputing missing data in trials with genotype-by-environment interaction. Biometrical Letters 47(1): 1-14.

  • Bergamo G.C., Dias C.T.S., Krzanowski W.J. (2008): Distribution-free multiple imputation in an interaction matrix through singular value decomposition. Scientia Agricola 65(4): 422-427.

  • Calinski T., Czajka S., Kaczmarek Z., Krajewski P., Pilarczyk W. (2009): Analyzing the Genotype-by-Environment Interactions Under a Randomization-Derived Mixed Model. Journal of Agricultural, Biological and Environmental Statistics 14(2): 224-241.

  • Ching W., Li L., Tsing N., Tai C., Ng T. (2010): A weighted local least squares imputation method for missing value estimation in microarray gene expression data. International Journal of Data Mining and Bioinformatics 4(3): 331-347.

  • Denis J.B., Baril C.P. (1992): Sophisticated models with numerous missing values: the multiplicative interaction model as an example. Biuletyn Oceny Odmian 24-25: 33-45.

  • Di Ciaccio A. (2011): Bootstrap and nonparametric predictors to impute missing data. In: B. Fichet et al. (eds.), Classification and Multivariate Analysis for Complex Data Structures, Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Berlin Heidelberg.

  • Dias C.T.S., Krzanowski W.J. (2003): Model selection and cross validation in additive main effect and multiplicative interaction models. Crop Science 43: 865-873.

  • Gabriel K.R. (2002): Le biplot - outil d’exploration de données multidimensionnelles. Journal de la Société Française de Statistique 143(3-4): 5-55.

  • García-Peña M., Dias C.T.S. (2009): Analysis of bivariate additive models with multiplicative interaction (AMMI). Biometric Brazilian Journal 27(4): 586-602.

  • Gauch H.G. (2013): A simple protocol for AMMI analysis of yield trials. Crop Science 53: 1860-1869.

  • Gauch H.G., Zobel R.W. (1990): Imputing missing yield trial data. Theoretical and Applied Genetics 79: 753-761.

  • Josse J., Pagès J., Husson F. (2011): Multiple imputation in PCA. Advances in Data Analysis and Classification 5(3): 231-246.

  • Josse J., Husson F. (2012): Handling missing values in exploratory multivariate data analysis methods. Journal de la Société Française de Statistique 153(2): 79-99.

  • Krzanowski W.J. (1988): Missing value imputation in multivariate data using the singular value decomposition of a matrix. Biometrical Letters XXV(1-2): 31-39.

  • Krzanowski W.J. (2000): Principles of multivariate analysis: A user’s perspective. Oxford University Press, Oxford.

  • Kroonenberg P.M. (2008): Applied multiway data analysis. John Wiley & Sons.

  • Kumar A., Verulkar S.B., Mandal N.P., Variar M., Shukla V.D., Dwivedi J.L., Singh B.N., Singh O.N., Swain P., Mall A.K., Robin S., Chandrababu R., Jain A., Haefele S.M., Piepho H.P., Raman A. (2012): High-yielding, drought-tolerant, stable rice genotypes for the shallow rainfed lowland drought-prone ecosystem. Field Crops Research 133: 37-47.

  • Little R., Rubin D. (2002): Statistical analysis with missing data. 2nd ed. John Wiley & Sons, New York, NY.

  • Paderewski J., Rodrigues P.C. (2014): The usefulness of EM-AMMI to study the influence of missing data pattern and application to Polish post-registration winter wheat data. Australian Journal of Crop Science 8: 640-645.

  • Piepho H.P. (1995): Methods for estimating missing genotype-location combinations in multilocation trials - an empirical comparison. Informatik Biometrie und Epidemiologie in Medizin und Biologie 26: 335-349.

  • Piepho H.P., Möhring J. (2006): Selection in cultivar trials - Is it ignorable? Crop Science 46: 192-201.

  • R Development Core Team (2013): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/

  • Rodrigues P., Pereira D.G.S., Mexia J.T. (2011): A comparison between joint regression analysis and the additive main and multiplicative interaction model: the robustness with increasing amounts of missing data. Scientia Agricola 68(6): 679-686.

  • Rubin D.B. (1978): Multiple imputation in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Proceedings of the Survey Research Methods Section of the American Statistical Association: 20-34.

  • Sabaghnia N., Karimizadeh R., Mohammadi M. (2012): Model selection in additive main effect and multiplicative interaction model in durum wheat. Genetika 44(2): 325-339.

  • Schafer J.L., Graham J.W. (2002): Missing data: our view of the state of the art. Psychological Methods 7(2): 147-177.

  • van Buuren S. (2012): Flexible imputation of missing data. CRC Press.

  • Wright K. (2012): agridat: Agricultural datasets. R package version 1.4. http://CRAN.R-project.org/package=agridat

  • Yan W., Pageau D., Frégeau-Reid J., Durand J. (2011): Assessing the representativeness and repeatability of test locations for genotype evaluation. Crop Science 51: 1603-1610.

  • Yan W. (2013): Biplot analysis of incomplete two-way data. Crop Science 53(1): 48-57.

About the article

Published in Print: 2014-12-01

Citation Information: Biometrical Letters, Volume 51, Issue 2, Pages 75–88, ISSN (Online) 1896-3811, DOI: https://doi.org/10.2478/bile-2014-0006.

© by Sergio Arciniegas-Alarcón. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. BY-NC-ND 3.0