
Journal of Official Statistics

The Journal of Statistics Sweden

4 Issues per year

Open Access

Small Area Model-Based Estimators Using Big Data Sources

Stefano Marchetti
  • Department of Economics and Management – University of Pisa, Via Ridolfi 10, 56124 Pisa, Italy
Caterina Giusti
  • Department of Economics and Management – University of Pisa, Via Ridolfi 10, 56124 Pisa, Italy
Monica Pratesi
  • Department of Economics and Management – University of Pisa, Via Ridolfi 10, 56124 Pisa, Italy
Nicola Salvati
  • Department of Economics and Management – University of Pisa, Via Ridolfi 10, 56124 Pisa, Italy
Fosca Giannotti
  • KDD Lab – ISTI – National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy
Dino Pedreschi
  • Department of Computer Science – University of Pisa, Largo B. Pontecorvo 3, 56127 Pisa, Italy
Salvatore Rinzivillo
  • KDD Lab – ISTI – National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy
Luca Pappalardo
  • KDD Lab – ISTI – National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy
  • Department of Computer Science – University of Pisa, Largo B. Pontecorvo 3, 56127 Pisa, Italy
Lorenzo Gabrielli
  • KDD Lab – ISTI – National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy
  • Department of Information Engineering – University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy
Published Online: 2015-06-27 | DOI: https://doi.org/10.1515/jos-2015-0017


The timely, accurate monitoring of social indicators, such as poverty or inequality, on a fine-grained spatial and temporal scale is a crucial tool for understanding social phenomena and for policymaking, but it poses a great challenge to official statistics. This article argues that an interdisciplinary approach, combining the body of statistical research in small area estimation with the body of research in social data mining based on Big Data, can provide novel means to tackle this problem successfully. Big Data derived from the digital crumbs that humans leave behind in their daily activities are in fact providing ever more accurate proxies of social life. Social data mining from these data, coupled with advanced model-based techniques for fine-grained estimates, has the potential to provide a novel microscope through which to view and understand social complexity. This article suggests three ways to use Big Data together with small area estimation techniques, and shows how Big Data has the potential to mirror aspects of well-being and other socioeconomic phenomena.

Keywords: Social mining; auxiliary information; poverty measures


About the article

Received: 2013-07-01

Revised: 2015-02-01

Accepted: 2015-02-01

Published Online: 2015-06-27

Published in Print: 2015-06-01

Citation Information: Journal of Official Statistics, ISSN (Online) 2001-7367, DOI: https://doi.org/10.1515/jos-2015-0017.

© by Stefano Marchetti. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. (CC BY-NC-ND 3.0)
