European Journal of Applied Linguistics

Founded by Knapp, Karlfried

Editor-in-Chief: Bührig, Kristin / ten Thije, Jan D.

Agenzia Nazionale di Valutazione del Sistema Universitario e della Ricerca: Classe A

https://www.mocoda2.de: a database and web-based editing environment for collecting and refining a corpus of mobile messaging interactions

Michael Beißwenger / Wolfgang Imo / Marcel Fladrich / Evelyn Ziegler
Published Online: 2019-09-12 | DOI: https://doi.org/10.1515/eujal-2019-0004


This paper reports on findings from the MoCoDa2 project, which is creating a corpus of private CMC interactions from smartphone apps based on donations by their users. Unlike other projects in the field, the project involves users not only as donors but also as editors of their data: a web-based editing environment gives users access to their raw data and supports them in pseudonymising the data and enhancing them with rich metadata on the interactional context, on the interlocutors and their relations, and on embedded media files. The resulting corpus will be a useful resource not only for quantitative but also for qualitative CMC research. For representation and annotation of the data, the project builds on best practices from previous projects in the field and cooperates with a language technology partner.

Keywords: computer-mediated communication; CMC; corpora; data collection

About the article

Published Online: 2019-09-12

Published in Print: 2019-09-02

Citation Information: European Journal of Applied Linguistics, Volume 7, Issue 2, Pages 333–344, ISSN (Online) 2192-953X, ISSN (Print) 2192-9521, DOI: https://doi.org/10.1515/eujal-2019-0004.

© 2019 Walter de Gruyter GmbH, Berlin/Boston.
