Published by De Gruyter Mouton, December 14, 2015

Is there a formula for formulaic language?

Richard S. Forsyth and Łukasz Grabowski


This paper focuses on detecting and measuring traces of "formulaic language". For this purpose, we test a number of computational formulae that quantify the degree to which a text type incorporates inflexible sequences of words. We assess these candidate indices using a number of reference corpora representing a wide variety of text types, both routine and creative. We adopt the concept of "phrase-frame" proposed by Fletcher (2002–2007) as a means of exploring phraseological pattern variability. To date, there have been few studies explicitly addressing this issue, with the exception of Roemer (2010). We examine ten productivity indices, including Roemer's VPR, the Herfindahl-Hirschman index, Simpson's diversity index and relative Shannon entropy. We report that a novel measure, which we term Hapaxity, best meets our criteria, and show how this index of micro-productivity (in phrase-frames) may be used to assess macro-productivity (in text registers), thus quantifying an important aspect of a register’s reliance on formulaic subsequences.
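As a rough illustration of the kind of quantities the abstract names, the sketch below extracts phrase-frames from a token list and computes three of the standard concentration and diversity indices over the fillers of a frame's variable slot. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the choice of 4-word frames, the decision to allow the slot in any position, and the use of the Gini-Simpson form of Simpson's index are all assumptions made here. Roemer's VPR and the Hapaxity measure are omitted because the abstract does not define them.

```python
import math
from collections import Counter, defaultdict


def phrase_frames(tokens, n=4):
    """Map each phrase-frame to a Counter of the words filling its slot.

    A phrase-frame is taken here to be an n-gram with exactly one
    position replaced by '*'. Allowing the slot in any position is an
    assumption of this sketch.
    """
    frames = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = list(tokens[i:i + n])
        for slot in range(n):
            frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
            frames[frame][gram[slot]] += 1
    return frames


def herfindahl(counts):
    """Herfindahl-Hirschman index: sum of squared filler proportions."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def simpson_diversity(counts):
    """Simpson's diversity in the Gini-Simpson form, 1 - sum(p^2).

    Other forms of Simpson's index exist; this choice is an assumption.
    """
    return 1.0 - herfindahl(counts)


def relative_entropy(counts):
    """Shannon entropy divided by its maximum, log(number of fillers)."""
    k = len(counts)
    if k < 2:
        return 0.0  # a frame with a single filler has no variability
    total = sum(counts)
    h = -sum((c / total) * math.log(c / total) for c in counts)
    return h / math.log(k)


# Toy example (invented text, not data from the paper).
tokens = ("at the end of the day at the top of the hill "
          "at the end of the road").split()
frames = phrase_frames(tokens, n=4)
fillers = frames[("at", "the", "*", "of")]  # 'end' twice, 'top' once
counts = list(fillers.values())
print(fillers)
print(herfindahl(counts), simpson_diversity(counts), relative_entropy(counts))
```

On this reading, a frame whose slot is dominated by one filler yields a Herfindahl-Hirschman value near 1 and a relative entropy near 0, which corresponds to the inflexible, formulaic end of the scale; a frame with many equally frequent fillers moves both indices in the opposite direction.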

Received: 2015-04-22
Revised: 2015-09-23
Accepted: 2015-10-13
Published Online: 2015-12-14

©2015 by Walter de Gruyter Berlin/Boston