This paper presents a multimodel inference approach to linguistic variation, expanding on prior work by Kuperman and Bresnan (2012). We argue that corpus data often present the analyst with high model selection uncertainty. This uncertainty is inevitable given that language is highly redundant: every feature is predictable from multiple other features. However, the uncertainty involved in model selection is ignored by the standard method of selecting a single best model and inferring the effects of the predictors under the assumption that the best model is true. Multimodel inference avoids committing to a single model. Instead, we base predictions on the entire set of plausible models, weighting each model's contribution by its predictive value. We argue that multimodel inference is superior to model selection both for the I-Language goal of inferring the mental grammars that generated the corpus and for the E-Language goal of predicting characteristics of future speech samples from the community the corpus represents. Applying multimodel inference to the classic problem of English auxiliary contraction, we show that the choice between multimodel inference and model selection matters in practice: the best model may contain predictors that are not significant when the full set of plausible models is considered, and may omit predictors that are significant across the full set of models. We also contribute to the study of English auxiliary contraction itself: we document effects of priming, contextual predictability, and specific syntactic constructions, and provide evidence against effects of phonological context.
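The abstract does not specify the weighting scheme, but a standard instantiation of multimodel inference weights each candidate model by its Akaike weight (derived from AIC, an estimate of predictive value) and averages the models' predictions. The sketch below illustrates that idea only; the AIC scores and per-model predicted contraction probabilities are hypothetical, not drawn from the paper.

```python
import math

def akaike_weights(aics):
    """Convert a list of AIC scores into normalized Akaike weights.
    Lower AIC (better expected predictive value) yields a higher weight."""
    best = min(aics)
    rel_likelihoods = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel_likelihoods)
    return [r / total for r in rel_likelihoods]

# Hypothetical AIC scores for three plausible models of auxiliary contraction
aics = [100.0, 101.2, 104.5]
weights = akaike_weights(aics)

# Hypothetical per-model predicted probabilities of contraction for one token;
# the multimodel prediction averages them, weighted by each model's weight,
# rather than trusting the single best model's prediction alone.
preds = [0.62, 0.55, 0.70]
averaged = sum(w * p for w, p in zip(weights, preds))
```

Note that a predictor's averaged effect can shrink toward zero if it appears only in poorly weighted models, which is how a predictor significant in the best model may fail to be significant across the full model set.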
About the authors
Danielle Barth is currently pursuing her PhD at the University of Oregon. She is primarily interested in patterns of reduction, in how they are acquired, and in how they interact with information statistics. She also works with speakers in Matukar, Papua New Guinea, on language documentation and revitalization efforts. In part because of this, she is interested in both the analytic and the archiving aspects of corpus linguistics.
Vsevolod Kapatsinski received his MA in Linguistics from the University of New Mexico in 2005 and his PhD in Linguistics and Cognitive Science from Indiana University in 2009. He is currently an assistant professor in the Department of Linguistics at the University of Oregon. His research interests lie in experimental and corpus linguistics. He is mainly concerned with describing and explaining which statistical patterns in linguistic data are tracked by language learners, and what units and generalizations they acquire and use in processing.
© 2017 Walter de Gruyter GmbH, Berlin/Boston