Is there a formula for formulaic language?

Richard S. Forsyth¹ and Łukasz Grabowski²
  • 1 Independent researcher
  • 2 Opole University, Opole, Poland


This paper focuses on detecting and measuring traces of "formulaic language". To this end, we test a number of computational formulae that quantify the degree to which a text type incorporates inflexible sequences of words, and we assess these candidate indices against several reference corpora representing a wide variety of text types, both routine and creative. We adopt the concept of "phrase-frame" (a set of n-grams identical except for one word) proposed by Fletcher (2002–2007) as a means of exploring phraseological pattern variability. To date, few studies have explicitly addressed this issue, the notable exception being Roemer (2010). We examine ten productivity indices, including Roemer's VPR, the Herfindahl-Hirschman index, Simpson's diversity index and relative Shannon entropy. We report that a novel measure, which we term Hapaxity, best meets our criteria, and we show how this index of micro-productivity (in phrase-frames) may be used to assess macro-productivity (in text registers), thus quantifying an important aspect of a register's reliance on formulaic subsequences.
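To make the idea of a phrase-frame index concrete, the sketch below groups word 4-grams into phrase-frames and computes four of the named indices over the words that fill a frame's variable slot. It is a minimal illustration under stated assumptions, not the authors' implementation: the variable slot may fall at any position; VPR is read as distinct variants divided by frame tokens (a common reading of Roemer 2010); Simpson's index is given in the Gini-Simpson form 1 - Σp²; and all function names are hypothetical. Hapaxity is omitted because the abstract does not define it.

```python
import math
from collections import Counter, defaultdict


def phrase_frames(tokens, n=4):
    """Map each phrase-frame (an n-gram with one slot replaced by '*')
    to a Counter of the words observed filling that slot."""
    frames = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for slot in range(n):
            frame = tuple(gram[:slot]) + ("*",) + tuple(gram[slot + 1:])
            frames[frame][gram[slot]] += 1
    return frames


def shares(fillers):
    """Relative frequency of each filler type."""
    total = sum(fillers.values())
    return [c / total for c in fillers.values()]


def vpr(fillers):
    """Variant/p-frame ratio (assumed reading): distinct fillers / frame tokens."""
    return len(fillers) / sum(fillers.values())


def herfindahl(fillers):
    """Herfindahl-Hirschman index: sum of squared shares (1.0 = a single filler)."""
    return sum(p * p for p in shares(fillers))


def gini_simpson(fillers):
    """Simpson's diversity in the Gini-Simpson form: 1 - sum of squared shares."""
    return 1.0 - herfindahl(fillers)


def relative_entropy(fillers):
    """Shannon entropy of the filler distribution divided by its maximum, log(k)."""
    k = len(fillers)
    if k < 2:
        return 0.0  # a frame with only one filler type is completely fixed
    h = -sum(p * math.log(p) for p in shares(fillers))
    return h / math.log(k)


if __name__ == "__main__":
    text = ("at the end of the day at the end of the week "
            "at the start of the day")
    frames = phrase_frames(text.split())
    fillers = frames[("at", "the", "*", "of")]
    print(fillers)  # Counter({'end': 2, 'start': 1})
    for name, fn in [("VPR", vpr), ("HHI", herfindahl),
                     ("Gini-Simpson", gini_simpson),
                     ("relative entropy", relative_entropy)]:
        print(f"{name}: {fn(fillers):.4f}")
```

For the frame "at the * of" in this toy text, the fillers are "end" (twice) and "start" (once), giving VPR 0.6667, HHI 0.5556, Gini-Simpson 0.4444 and relative entropy 0.9183. Lower concentration (HHI) and higher diversity or entropy indicate a more productive, less formulaic frame; aggregating such per-frame values over a corpus is one way a micro-level index could feed a register-level comparison.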

Poznan Studies in Contemporary Linguistics publishes high-quality articles representative of theory-based empirical research in contemporary synchronic linguistics and interdisciplinary studies of language from various perspectives. The journal serves as a forum for modern developments and trends in linguistics, with contributions from the world’s leading linguistic labs.