Optimising crowdsourcing efficiency: Amplifying human computation with validation

Jon Chamberlain 1, Udo Kruschwitz 1 and Massimo Poesio 2
  • 1 University of Essex, Wivenhoe Park, Colchester, UK
  • 2 Queen Mary University, Mile End Rd, London, UK
Jon Chamberlain
  • Corresponding author
  • University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK
  • Dr Jon Chamberlain is a web developer and lecturer in Human-Computer Interaction at the University of Essex with experience of industrial and academic computer applications (language processing, game design, social network analysis) in the domains of citizen science, marine ecology, and human rights observation. He has been the lead developer of the Phrase Detectives project since its inception in 2007 and has continued investigating crowdsourcing using games and social networks for almost a decade.
Udo Kruschwitz
  • University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK
  • Professor Udo Kruschwitz’s research interests are in natural language processing (NLP), information retrieval (IR) and the implementation of such techniques in real applications. He is developing techniques that allow the extraction of conceptual information from document collections and access logs and the utilization of such knowledge in search and navigation contexts. Professor Kruschwitz was Co-PI in the original EPSRC project that developed Phrase Detectives.
Massimo Poesio
  • Queen Mary University, Mile End Rd, London, E1 4NS, UK
  • Professor Massimo Poesio is a computational linguist. His work on anaphora is driven by the analysis of corpora and of disagreements in corpus annotation, most recently using the Phrase Detectives game-with-a-purpose to collect such data. He is also a PI of the DALI project, an Advanced ERC grant; a supervisor in the IGGI Doctoral Training Centre in Intelligent Games and Game Intelligence; and a PI in the Centre for Human Rights and Information Technology in the Era of Big Data.


Crowdsourcing has revolutionised the way tasks can be completed, but the process is frequently inefficient, costing practitioners time and money. This research investigates whether crowdsourcing can be optimised with a validation process, as measured by four criteria: quality, cost, noise and speed. A validation model is described, simulated and tested on real data from an online crowdsourcing game that collects data about human language. Results show that when an agreement validation (or like/upvote) step is added, fewer annotations are required, noise and collection time are reduced, and quality may be improved.


it - Information Technology is a strictly peer-reviewed scientific journal. It is the oldest German journal in the field of information technology. Today, the major aim of it - Information Technology is to highlight issues in ongoing, newsworthy areas of information technology and informatics and their application. It aims to present these topics with a holistic view.