Abstract
In existing research on syntactic alternations such as the dative alternation (give her the apple vs. give the apple to her), the linguistic data is often analysed with the help of logistic regression models. In this article, we evaluate the use of logistic regression for this type of research, and present two alternative approaches: Bayesian networks and memory-based learning. For the Bayesian network, we use the higher-level semantic features suggested in the literature, while we limit ourselves to lexical items in the memory-based approach. We evaluate the suitability of the three approaches by applying them to a large data set (>11,000 instances) extracted from the British National Corpus, and comparing them in terms of their classification accuracy, their interpretability in the context of linguistic research, and their actual classification of individual cases. Our main finding is that the classifications are very similar across the three approaches, even when lexical items are employed instead of the higher-level features, because most of the alternation is determined by the verb and the length of the two objects (here: her and the apple).
About the authors
Daphne Theijssen is a former PhD student at the Centre for Language Studies at Radboud University Nijmegen. Her doctoral thesis concerns the use of computational models to explain the dative alternation in English.
Louis ten Bosch is assistant professor at the Centre for Language Studies at Radboud University Nijmegen. His research interests include speech technology and computational models of first language acquisition.
Lou Boves is professor at the Centre for Language Studies at Radboud University Nijmegen. His research interests include speech technology and computational models of human learning.
Bert Cranen is assistant professor at the Centre for Language Studies at Radboud University Nijmegen. His research interests include speech technology and computational modelling of human speech production and perception.
Hans van Halteren is assistant professor in the Department of Linguistics at Radboud University Nijmegen. His research interests include computational linguistics, social media corpora and machine learning.
© 2013 by Walter de Gruyter Berlin Boston