
Journal of Official Statistics

The Journal of Statistics Sweden

4 Issues per year


Open Access

Small Area Model-Based Estimators Using Big Data Sources

Stefano Marchetti / Caterina Giusti / Monica Pratesi / Nicola Salvati / Fosca Giannotti / Dino Pedreschi / Salvatore Rinzivillo / Luca Pappalardo
  • KDD Lab - ISTI - National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy
  • Department of Computer Science – University of Pisa, Largo B. Pontecorvo 3, 56127 Pisa, Italy
/ Lorenzo Gabrielli
  • KDD Lab - ISTI - National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy
  • Department of Information Engineering – University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy.
Published Online: 2015-06-27 | DOI: https://doi.org/10.1515/jos-2015-0017


The timely, accurate monitoring of social indicators, such as poverty or inequality, on a fine-grained spatial and temporal scale is a crucial tool for understanding social phenomena and for policymaking, but it poses a great challenge to official statistics. This article argues that an interdisciplinary approach, combining the body of statistical research in small area estimation with the body of research in social data mining based on Big Data, can provide novel means to tackle this problem successfully. Big Data derived from the digital crumbs that humans leave behind in their daily activities are in fact providing ever more accurate proxies of social life. Social data mining from these data, coupled with advanced model-based techniques for fine-grained estimates, has the potential to provide a novel microscope through which to view and understand social complexity. This article suggests three ways to use Big Data together with small area estimation techniques, and shows how Big Data have the potential to mirror aspects of well-being and other socioeconomic phenomena.

Keywords: Social mining; auxiliary information; poverty measures



About the article

Received: 2013-07-01

Revised: 2015-02-01

Accepted: 2015-02-01

Published Online: 2015-06-27

Published in Print: 2015-06-01

Citation Information: Journal of Official Statistics, Volume 31, Issue 2, Pages 263–281, ISSN (Online) 2001-7367, DOI: https://doi.org/10.1515/jos-2015-0017.


© by Stefano Marchetti. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (CC BY-NC-ND 3.0).

Citing Articles


Luca Pappalardo and Filippo Simini
Data Mining and Knowledge Discovery, 2017
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale
ACM Transactions on Intelligent Systems and Technology, 2017, Volume 9, Number 3, Page 1
Mary E. Thompson
Canadian Journal of Statistics, 2017
Enrico di Bella, Lucia Leporatti, and Filomena Maggino
Social Indicators Research, 2016
Luca Pappalardo, Maarten Vanhoof, Lorenzo Gabrielli, Zbigniew Smoreda, Dino Pedreschi, and Fosca Giannotti
International Journal of Data Science and Analytics, 2016, Volume 2, Number 1-2, Page 75
