Journal of Official Statistics

The Journal of Statistics Sweden

4 Issues per year


IMPACT FACTOR 2016: 0.411
5-year IMPACT FACTOR: 0.776

CiteScore 2016: 0.63

SCImago Journal Rank (SJR) 2016: 0.710
Source Normalized Impact per Paper (SNIP) 2016: 0.975

Open Access
Online ISSN: 2001-7367

Synthetic Multiple-Imputation Procedure for Multistage Complex Samples

Hanzhi Zhou / Michael R. Elliott
  • Dept. of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI USA 48109; Survey Methodology Program, Institute for Social Research, University of Michigan, 426 Thompson St., Ann Arbor, MI 48109, USA.
/ Trivellore E. Raghunathan
  • Dept. of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI USA 48109; Survey Methodology Program, Institute for Social Research, University of Michigan, 426 Thompson St., Ann Arbor, MI 48109, USA.
Published Online: 2016-03-10 | DOI: https://doi.org/10.1515/jos-2016-0011

Abstract

Multiple imputation (MI) is commonly used when item-level missing data are present. However, MI requires that survey design information be built into the imputation models. For multistage stratified clustered designs, this requires dummy variables to represent strata as well as primary sampling units (PSUs) nested within each stratum in the imputation model. Such a modeling strategy is not only operationally burdensome but also inferentially inefficient when there are many strata in the sample design. Complexity only increases when sampling weights need to be modeled. This article develops a general-purpose analytic strategy for population inference from complex sample designs with item-level missingness. In a simulation study, the proposed procedures demonstrate efficient estimation and good coverage properties. We also consider an application to accommodate missing body mass index (BMI) data in the analysis of BMI percentiles using National Health and Nutrition Examination Survey (NHANES) III data. We argue that the proposed methods offer an easy-to-implement solution to problems that are not well-handled by current MI techniques. Note that, while the proposed method borrows from the MI framework to develop its inferential methods, it is not designed as an alternative strategy to release multiply imputed datasets for complex sample design data, but rather as an analytic strategy in and of itself.

Keywords: Finite population Bayesian bootstrap; Haldane prior; stratified sample; clustered sample; sample weights
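
To make the abstract's point about design dummies concrete, the following is a minimal sketch of the conventional setup that the abstract describes as burdensome, not the procedure proposed in the article. It builds reference-coded indicator columns for strata and for PSUs nested within strata. The function name `design_dummies`, the column naming scheme, and the toy design (4 strata, 2 PSUs per stratum, 3 units per PSU) are all assumptions made for illustration.

```python
def design_dummies(strata, psus):
    """Build reference-coded indicators for strata and PSUs nested in strata.

    strata, psus: equal-length sequences of labels, one pair per sampled unit.
    Returns (column_names, rows), where rows[i] is a list of 0/1 values.
    """
    stratum_levels = sorted(set(strata))
    # PSU labels are only meaningful within a stratum, so key them by (stratum, psu).
    nested_levels = sorted(set(zip(strata, psus)))

    # Drop the first stratum as the reference level.
    stratum_cols = stratum_levels[1:]

    # Within each stratum, drop the first PSU as the reference level.
    first_psu = {}
    for h, c in nested_levels:
        first_psu.setdefault(h, c)
    psu_cols = [(h, c) for h, c in nested_levels if c != first_psu[h]]

    names = [f"stratum_{h}" for h in stratum_cols]
    names += [f"psu_{h}_{c}" for h, c in psu_cols]
    rows = [
        [int(h == s) for s in stratum_cols] + [int((h, c) == p) for p in psu_cols]
        for h, c in zip(strata, psus)
    ]
    return names, rows


# Hypothetical toy design: 4 strata, 2 PSUs per stratum, 3 units per PSU.
strata = [h for h in range(1, 5) for _ in range(2) for _ in range(3)]
psus = [c for _ in range(1, 5) for c in (1, 2) for _ in range(3)]

names, rows = design_dummies(strata, psus)
print(len(names), "design columns for", len(rows), "sampled units")
```

With H strata and two PSUs per stratum, this coding adds 2H - 1 columns to the imputation model before any substantive covariate or weight is included, which illustrates why the approach scales poorly as the number of strata grows.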

About the article

Received: 2014-06-01

Revised: 2015-04-01

Accepted: 2015-04-01

Published Online: 2016-03-10

Published in Print: 2016-03-01


Citation Information: Journal of Official Statistics, ISSN (Online) 2001-7367, DOI: https://doi.org/10.1515/jos-2016-0011.

© by Hanzhi Zhou. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
