This paper focuses on detecting and measuring traces of "formulaic language". To this end, we test a number of computational formulae that quantify the degree to which a text type relies on inflexible word sequences. We assess these candidate indices against several reference corpora representing a wide variety of text types, both routine and creative. We adopt the concept of the "phrase-frame" proposed by Fletcher (2002–2007) as a means of exploring variability in phraseological patterns. To date, few studies have explicitly addressed this issue, a notable exception being Roemer (2010). We examine ten productivity indices, including Roemer's VPR, the Herfindahl-Hirschman index, Simpson's diversity index and relative Shannon entropy. We report that a novel measure, which we term Hapaxity, best meets our criteria, and we show how this index of micro-productivity (within phrase-frames) can be used to assess macro-productivity (across text registers), thus quantifying an important aspect of a register's reliance on formulaic subsequences.
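To make the notions of phrase-frame and productivity index more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a phrase-frame is an n-gram with one position replaced by a wildcard slot, and that each index is computed over the frequency distribution of the words filling that slot. The formulas are common textbook forms and are assumptions here: VPR is taken as distinct variants divided by phrase-frame tokens, the Herfindahl-Hirschman index as the sum of squared shares, Simpson's diversity as 1 − Σn(n−1)/N(N−1), and relative Shannon entropy as H/log k over k distinct fillers. All function names are hypothetical. Hapaxity is not reproduced, since its definition is not given in this abstract.

```python
from collections import Counter, defaultdict
import math


def phrase_frames(tokens, n=4, slot=None):
    """Map each phrase-frame (an n-gram with one position replaced by '*')
    to a Counter of the words observed in that slot.
    If slot is None, every position of each n-gram is tried as the slot."""
    frames = defaultdict(Counter)
    positions = range(n) if slot is None else [slot]
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for pos in positions:
            frame = tuple(gram[:pos]) + ("*",) + tuple(gram[pos + 1:])
            frames[frame][gram[pos]] += 1
    return frames


def vpr(fillers):
    # Assumed form: number of distinct variants / number of phrase-frame tokens.
    return len(fillers) / sum(fillers.values())


def hhi(fillers):
    # Herfindahl-Hirschman index: sum of squared relative frequencies (concentration).
    total = sum(fillers.values())
    return sum((c / total) ** 2 for c in fillers.values())


def simpson_diversity(fillers):
    # Assumed form: 1 - sum n(n-1) / (N(N-1)), i.e. the probability that two
    # fillers drawn without replacement differ. Undefined for N < 2; return 0.0.
    total = sum(fillers.values())
    if total < 2:
        return 0.0
    same = sum(c * (c - 1) for c in fillers.values())
    return 1.0 - same / (total * (total - 1))


def relative_entropy(fillers):
    # Shannon entropy of the filler distribution divided by its maximum, log(k).
    k = len(fillers)
    if k < 2:
        return 0.0
    total = sum(fillers.values())
    h = -sum((c / total) * math.log(c / total) for c in fillers.values())
    return h / math.log(k)


if __name__ == "__main__":
    text = "at the end of the day at the end of the week at the start of the day"
    frames = phrase_frames(text.split(), n=4, slot=2)
    fillers = frames[("at", "the", "*", "of")]  # Counter({'end': 2, 'start': 1})
    print(vpr(fillers), hhi(fillers), simpson_diversity(fillers), relative_entropy(fillers))
```

For the frame "at the * of" in this toy text, the slot is filled twice by "end" and once by "start", so VPR is 2/3, the HHI is 5/9, Simpson's diversity is 2/3 and relative entropy is about 0.918. Lower values of the diversity-type indices (and higher HHI) would indicate a more fixed, formulaic frame.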
Poznan Studies in Contemporary Linguistics publishes high-quality articles representative of theory-based empirical research in contemporary synchronic linguistics and interdisciplinary studies of language from various perspectives. The journal serves as a forum for modern developments and trends in linguistics, with contributions from the world’s leading linguistic labs.