Journal of Official Statistics

The Journal of Statistics Sweden

4 Issues per year


IMPACT FACTOR 2016: 0.411
5-year IMPACT FACTOR: 0.776

CiteScore 2016: 0.63

SCImago Journal Rank (SJR) 2016: 0.710
Source Normalized Impact per Paper (SNIP) 2016: 0.975

Open Access
Online
ISSN
2001-7367

Big Data as a Source for Official Statistics

Piet J.H. Daas
  • Corresponding author
  • Statistics Netherlands, Division of Process development, IT and methodology, P.O. Box 4481, 6401 CZ, Heerlen, The Netherlands.
Marco J. Puts
  • Statistics Netherlands, Division of Process development, IT and methodology, P.O. Box 4481, 6401 CZ, Heerlen, The Netherlands.
Bart Buelens
  • Statistics Netherlands, Division of Process development, IT and methodology, P.O. Box 4481, 6401 CZ, Heerlen, The Netherlands.
Paul A.M. van den Hurk
  • Statistics Netherlands, Division of Process development, IT and methodology, P.O. Box 4481, 6401 CZ, Heerlen, The Netherlands.
Published Online: 2015-06-27 | DOI: https://doi.org/10.1515/jos-2015-0016

Abstract

More and more data are being produced by a growing number of electronic devices, both in our physical surroundings and on the internet. The large volume of these data and the high frequency at which they are produced have given rise to the term ‘Big Data’. Because these data reflect many different aspects of our daily lives, and because they are abundant and readily available, Big Data sources are of considerable interest for official statistics. This article explores both the opportunities and the challenges that applying Big Data presents for official statistics. Experiences gained from analysing large amounts of Dutch traffic loop detection records and Dutch social media messages illustrate the issues characteristic of the statistical analysis and use of Big Data.

Keywords: Large data sets; traffic data; social media

About the article

Received: 2013-08-01

Revised: 2013-08-01

Accepted: 2014-09-01

Published Online: 2015-06-27

Published in Print: 2015-06-01


Citation Information: Journal of Official Statistics, ISSN (Online) 2001-7367, DOI: https://doi.org/10.1515/jos-2015-0016.

© by Piet J.H. Daas. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (CC BY-NC-ND 3.0).