This paper focuses on econometrics pedagogy. It demonstrates the importance of including probability weights in regression analysis when the data come from surveys that do not use simple random samples (SRS). We use concrete, numerical examples and simulation to show how to teach this difficult material effectively to a student audience. We relax the assumption of simple random sampling and show how unequal probability of selection can lead to biased, inconsistent OLS slope estimates. We then explain and apply probability-weighted least squares, showing how weighting each observation by the reciprocal of its probability of inclusion in the sample improves performance. The exposition is non-mathematical and relies heavily on intuitive, visual displays to make the content accessible to students. This paper will enable professors to incorporate unequal probability of selection into their courses and allow students to use best-practice techniques in analyzing data from complex surveys. The primary delivery vehicle is Microsoft Excel®. Two user-defined array functions, SAMPLE and LINESTW, are included in a prepared Excel workbook. We replicate all results in Stata® and offer a do-file for easy analysis in Stata. Documented code in Excel and Stata allows users to see each step in the sampling and probability-weighted least squares algorithms. All files and code are available at www.depauw.edu/learn/stata.
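The weighting idea described above can be illustrated with a minimal simulation sketch. It is written in Python rather than Excel or Stata, and it is not the paper's SAMPLE or LINESTW code. Every design choice here is an assumption made for illustration: the population model (y = 2 + 3x + noise), the rule that units with y above 17 are selected with probability 0.9 and all others with probability 0.1, and the helper names `fit_line` and `simulate`.

```python
import random


def fit_line(x, y, w=None):
    """Return (intercept, slope) minimizing sum of w_i * (y_i - a - b * x_i)**2.

    With w=None every observation gets weight 1, which is ordinary OLS.
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def simulate(seed=0, pop_size=50_000):
    """Draw one unequal-probability sample and return (OLS slope, weighted slope)."""
    rng = random.Random(seed)
    xs, ys, probs = [], [], []
    for _ in range(pop_size):
        x = rng.uniform(0, 10)
        y = 2 + 3 * x + rng.gauss(0, 10)   # assumed population model, slope 3
        p = 0.9 if y > 17 else 0.1         # assumed selection rule, depends on y
        if rng.random() < p:               # unit enters the sample with probability p
            xs.append(x)
            ys.append(y)
            probs.append(p)
    weights = [1.0 / p for p in probs]     # reciprocal of inclusion probability
    _, ols_slope = fit_line(xs, ys)
    _, pwls_slope = fit_line(xs, ys, weights)
    return ols_slope, pwls_slope


ols_slope, pwls_slope = simulate()
print(f"unweighted OLS slope:        {ols_slope:.3f}")
print(f"probability-weighted slope:  {pwls_slope:.3f}")
```

Because selection in this sketch depends on y, high-y units are over-represented and the unweighted slope is pulled away from the population value of 3, while each low-probability unit, weighted by 1/0.1 = 10, stands in for the ten similar population units it represents. The selection rule was chosen to make the effect easy to see; a real survey design would supply its own inclusion probabilities.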
We thank the participants at the 2011 AEA Teaching Conference, especially our discussant Roisin O’Sullivan, Frank M. Howland, Atsushi Inoue, and an anonymous referee for helpful criticisms and suggestions.
©2013 by Walter de Gruyter Berlin Boston