Published by De Gruyter Mouton November 4, 2005

Language is never, ever, ever, random

Adam Kilgarriff

Abstract

Language users never choose words randomly, and language is essentially non-random. Statistical hypothesis testing uses a null hypothesis, which posits randomness. Hence, when we look at linguistic phenomena in corpora, the null hypothesis will never be true. Moreover, where there is enough data, we shall (almost) always be able to establish that it is not true. In corpus studies, we frequently do have enough data, so the fact that a relation between two phenomena is demonstrably non-random does not support the inference that it is not arbitrary. We present experimental evidence of how arbitrary associations between word frequencies and corpora are systematically non-random. We review literature in which hypothesis testing has been used, and show how it has often led to unhelpful or misleading results.
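
The point about data volume can be illustrated with a small sketch. It is not taken from the paper: the word rates below are invented for illustration, and the Pearson chi-square test on a 2x2 table is assumed here as one common choice of hypothesis test for comparing a word's frequency in two corpora. The relative difference between the two corpora is held fixed while both corpora grow.

```python
# Illustrative sketch only: the counts are hypothetical, not data from the paper.

CRITICAL_05 = 3.841  # chi-square critical value for 1 degree of freedom at p = 0.05


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def word_statistic(scale):
    """Chi-square statistic for a hypothetical word occurring 105 times per
    100,000 tokens in corpus A and 100 times per 100,000 tokens in corpus B,
    with both corpora holding 100,000 * scale tokens."""
    tokens = 100_000 * scale
    in_a, in_b = 105 * scale, 100 * scale
    # Rows: word present / word absent; columns: corpus A / corpus B.
    return chi_square_2x2(in_a, in_b, tokens - in_a, tokens - in_b)


if __name__ == "__main__":
    for scale in (1, 10, 100, 1000):
        stat = word_statistic(scale)
        verdict = "reject" if stat > CRITICAL_05 else "do not reject"
        print(f"{100_000 * scale:>11,} tokens per corpus: "
              f"chi2 = {stat:8.3f} ({verdict} the null hypothesis)")
```

With the rates held fixed, the statistic grows in proportion to corpus size, so the same small difference that is not significant at 100,000 tokens per corpus becomes significant once the corpora are large enough. The rejection then reflects the amount of data rather than any change in the size of the difference.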

Published Online: 2005-11-04
Published in Print: 2005-11-18

Walter de Gruyter GmbH & Co. KG
