Abstract
We use 23M Tweets related to the EU referendum in the UK to predict the Brexit vote. In particular, we use user-generated labels known as hashtags to build training sets for the Leave and Remain campaigns. Next, we train support vector machines (SVMs) to classify Tweets. Finally, we compare our results to Internet and telephone polls. This approach not only reduces the time spent hand-coding data to create a training set, but also achieves a high level of correlation with Internet polls. Our results suggest that Twitter data may be a suitable substitute for Internet polls and a useful complement to telephone polls. We also discuss the reach and limitations of this method.
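The hashtag-based labeling step described above can be illustrated with a minimal sketch. It assumes that a Tweet carrying hashtags from exactly one campaign inherits that campaign's label, and that Tweets with no campaign hashtag or with hashtags from both sides are left out of the training set. The hashtag lists and the function names `label_from_hashtags` and `build_training_set` are hypothetical and are not taken from the paper.

```python
import re

# Illustrative hashtag lists (an assumption); the paper's own lists are not given here.
LEAVE_TAGS = {"voteleave", "leaveeu"}
REMAIN_TAGS = {"strongerin", "voteremain"}

HASHTAG_RE = re.compile(r"#(\w+)")


def label_from_hashtags(text):
    """Return 'leave', 'remain', or None when the Tweet is unlabeled or mixed."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    has_leave = bool(tags & LEAVE_TAGS)
    has_remain = bool(tags & REMAIN_TAGS)
    if has_leave == has_remain:
        # Either no campaign hashtag at all, or hashtags from both sides.
        return None
    return "leave" if has_leave else "remain"


def build_training_set(tweets):
    """Keep only Tweets with an unambiguous label, as (text, label) pairs."""
    rows = []
    for text in tweets:
        label = label_from_hashtags(text)
        if label is not None:
            rows.append((text, label))
    return rows
```

The resulting pairs could then be vectorized and passed to an SVM classifier (for example, scikit-learn's `LinearSVC`), and the classifier's daily Leave share compared against poll series; those later steps are not shown here.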
©2017 Walter de Gruyter GmbH, Berlin/Boston