January 9, 2016
Corpus-based research into pragmatics suffers from a distinct lack of suitably annotated corpora. This shortage has so far generally forced researchers in corpus-based pragmatics to focus on well-known fixed expressions (e.g. discourse markers and politeness formulae) rather than investigating interaction at the level of speech acts and other pragmatics-relevant features on a larger scale. This article describes a research environment that aims to remedy this problem (currently for English only) by making large-scale annotation of, and research into, speech acts and other linguistic levels possible in an efficient manner, while also discussing the difficulties and complexities inherent in such an endeavour. A brief case study then illustrates the efficiency of the approach and shows how the resulting annotations represent an improvement over existing models. The case study includes an illustrative discussion of the tool's performance in annotating a subset of 100 files from the Switchboard corpus, plus a more detailed comparison of the automatically annotated version of one of these files with its original, manually annotated version.