# Understanding and teaching unequal probability of selection¹

• Humberto Barreto and Manu Raghav

## Abstract

This paper focuses on econometrics pedagogy. It demonstrates the importance of including probability weights in regression analysis using data from surveys that do not use simple random samples (SRS). We use concrete, numerical examples and simulation to show how to effectively teach this difficult material to a student audience. We relax the assumption of simple random sampling and show how unequal probability of selection can lead to biased, inconsistent OLS slope estimates. We then explain and apply probability weighted least squares, showing how weighting the observations by the reciprocal of the probability of inclusion in the sample improves performance. The exposition is non-mathematical and relies heavily on intuitive, visual displays to make the content accessible to students. This paper will enable professors to incorporate unequal probability of selection into their courses and allow students to use best practice techniques in analyzing data from complex surveys. The primary delivery vehicle is Microsoft Excel®. Two user-defined array functions, SAMPLE and LINESTW, are included in a prepared Excel workbook. We replicate all results in Stata® and offer a do-file for easy analysis in Stata. Documented code in Excel and Stata allows users to see each step in the sampling and probability weighted least squares algorithms. All files and code are available at www.depauw.edu/learn/stata.
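The claim that unweighted OLS can miss the population slope under unequal probability of selection, while weighting by the reciprocal of the inclusion probability corrects it, can be illustrated with a small simulation. The following is a minimal sketch in Python with NumPy, not the authors' Excel workbook, their SAMPLE and LINESTW functions, or their Stata do-file. The two-group population, the slopes, the inclusion probabilities, and the helper `weighted_slope` are all hypothetical choices made for illustration, and the sampling scheme (independent draws per unit) is an assumption rather than the design used in the paper.

```python
import numpy as np


def weighted_slope(x, y, w):
    """Slope of the line that minimizes sum(w * residual**2)."""
    x_bar = np.average(x, weights=w)
    y_bar = np.average(y, weights=w)
    return np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)


rng = np.random.default_rng(0)

# Hypothetical population: two equally likely groups with different slopes.
N = 100_000
group = rng.integers(0, 2, size=N)            # group label, 0 or 1
x = rng.uniform(0, 10, size=N)
true_slope = np.where(group == 1, 3.0, 1.0)   # assumed slopes per group
y = 5 + true_slope * x + rng.normal(0, 2, size=N)

# Target: the slope from regressing y on x over the whole population.
pop_slope = weighted_slope(x, y, np.ones(N))

# Unequal probability of selection: group 1 is five times as likely
# to be sampled as group 0 (assumed probabilities).
p = np.where(group == 1, 0.05, 0.01)

ols_slopes, pwls_slopes = [], []
for _ in range(200):
    keep = rng.random(N) < p                  # independent draw per unit
    xs, ys = x[keep], y[keep]
    ws = 1.0 / p[keep]                        # reciprocal of inclusion probability
    ols_slopes.append(weighted_slope(xs, ys, np.ones(xs.size)))
    pwls_slopes.append(weighted_slope(xs, ys, ws))

mean_ols = float(np.mean(ols_slopes))
mean_pwls = float(np.mean(pwls_slopes))

print(f"population slope:          {pop_slope:.3f}")
print(f"mean unweighted OLS slope: {mean_ols:.3f}")
print(f"mean weighted slope:       {mean_pwls:.3f}")
```

Under these assumptions the population slope is close to 2, the average of the two group slopes. The unweighted estimate should be pulled toward the oversampled group's slope of 3, while the weighted estimate should center near the population value. The sketch covers point estimates only and says nothing about standard errors, which require separate treatment in complex surveys.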

Corresponding author: Manu Raghav, DePauw University, 7 E. Larabee St, Harrison Hall 202, Greencastle, IN 46135, USA. Phone: +1-765-225-9591, Fax: +1-765-658-1044

¹ We thank the participants at the 2011 AEA Teaching Conference, especially our discussant Roisin O’Sullivan, Frank M. Howland, Atsushi Inoue, and an anonymous referee for helpful criticisms and suggestions.

Published Online: 2013-03-16
Published in Print: 2013-07-01

©2013 by Walter de Gruyter Berlin Boston