Abstract
This article presents a discourse annotation methodology based on Rhetorical Structure Theory (RST) and an empirical study in which a corpus of specialized medical texts in Basque is annotated. The annotation process has two phases: segmentation and annotation of rhetorical relations. The first phase entails an initial study that establishes linguistic criteria for sentence-based segmentation; the second focuses on annotating rhetorical relations. Once discourse segments and rhetorical relations are established, the annotation process is analyzed and evaluated using the method commonly used in RST (Marcu 2000). Inconsistencies detected in that evaluation method lead the authors to redefine some of its criteria. As a result of this work, a small annotated Basque-language corpus is made available to the scientific community.
About the authors
Mikel Iruskieta is a lecturer in Basque language and literature at the University of the Basque Country. His methodological interests include text parsing and knowledge and discourse representation. He has worked mainly on text analysis applications such as machine translation, text summarization and knowledge extraction.
Arantza Diaz de Ilarraza is a professor of computer languages and systems at the University of the Basque Country. She received her PhD in Computer Science from the University of the Basque Country in 1990. She is a researcher in the field of Natural Language Processing. Her research interests include the development of natural language processing resources, machine translation and linguistic annotation.
Mikel Lersundi received his PhD from the University of the Basque Country; in his dissertation he carried out a syntactic and semantic analysis of a Basque dictionary to extract lexical-semantic relations between words and to build a database containing these relations. He teaches Basque language for scientific purposes at the University of the Basque Country and specializes in lexical-semantic relations, terminology, and machine translation.
©2015 by De Gruyter Mouton