Published by De Gruyter Mouton, January 9, 2016

DART – The dialogue annotation and research tool

Martin Weisser

Abstract

Corpus-based research into pragmatics suffers from a distinct lack of suitably annotated corpora. This shortage has so far generally forced researchers in corpus-based pragmatics to focus on well-known fixed expressions (e.g., discourse markers and politeness formulae) instead of investigating interaction at the level of speech acts and other pragmatically relevant features on a larger scale. This article describes a research environment that aims to remedy this problem (currently for English only) by making large-scale annotation of, and research into, speech acts and other linguistic levels possible in an efficient manner. It also discusses the difficulties and complexities inherent in such an endeavour. It then uses a brief case study to illustrate the efficiency of the approach and to show how the resulting annotations improve on existing models. The case study discusses the tool's performance in annotating a subset of 100 files from the Switchboard corpus and compares the automatically annotated version of one of these files in more detail with its original, manually annotated version.


Published Online: 2016-1-9
Published in Print: 2016-10-1

©2016 by De Gruyter Mouton