
Corpus Linguistics and Linguistic Theory

Founded by Gries, Stefan Th. / Stefanowitsch, Anatol

Ed. by Wulff, Stefanie

IMPACT FACTOR 2017: 1.200
5-year IMPACT FACTOR: 1.386

CiteScore 2017: 0.80

SCImago Journal Rank (SJR) 2017: 0.288
Source Normalized Impact per Paper (SNIP) 2017: 0.930


ΔP as a measure of collocation strength

Considerations based on analyses of hesitation placement in spontaneous speech

Ulrike Schneider
  • Corresponding author
  • Department of English and Linguistics, University of Mainz, Philosophicum, Jakob-Welder-Weg 18, 55128 Mainz, Germany
Published Online: 2018-03-29 | DOI: https://doi.org/10.1515/cllt-2017-0036


This paper explores the proposed benefits of ΔP (delta P) as a measure of collocation strength. Its focus is on contrasting ΔP with other, more commonly used, association measures, particularly transitional probabilities, but also mutual information and Lexical Gravity G. To this end, first the strong correlation between ΔP and transitional probability is illustrated with the help of two exemplary corpora. This is followed by an analysis of hesitation placement in spontaneous spoken English, based on the assumption that hesitations will not be placed within strong collocations. Results show that, despite their strong similarity, in some contexts ΔP is more predictive of hesitation placement than transitional probability. Yet neither ΔP nor any of the other association measures emerges as the universally best predictor. On the basis of these results, it is suggested that studies should always rely on several association measures.

Keywords: collocations; association measures; ΔP (delta P); bigrams; hesitations
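
The relationship between forward transitional probability and ΔP that the abstract refers to can be illustrated with a small sketch. It assumes the standard textbook definitions over a 2×2 bigram contingency table (a = w1 followed by w2, b = w1 followed by another word, c = another word followed by w2, d = all remaining bigrams); the abstract does not spell out the paper's exact operationalization, so the function name `bigram_measures`, the toy token list, and the choice of pointwise mutual information with base-2 logarithm are illustrative assumptions, not the author's implementation.

```python
from collections import Counter
from math import log2


def bigram_measures(tokens, w1, w2):
    """Return (forward TP, forward delta P, pointwise MI) for the bigram (w1, w2).

    Assumed definitions (standard, not taken from the paper itself):
      TP(w2|w1)      = a / (a + b)
      delta P(w2|w1) = a / (a + b) - c / (c + d)
      MI             = log2(a * N / ((a + b) * (a + c)))
    A measure that is undefined for the given counts is returned as None.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    counts = Counter(bigrams)

    a = counts[(w1, w2)]                            # w1 followed by w2
    first = sum(1 for x, _ in bigrams if x == w1)   # a + b
    second = sum(1 for _, y in bigrams if y == w2)  # a + c
    b = first - a                                   # w1 followed by another word
    c = second - a                                  # another word followed by w2
    d = n - a - b - c                               # neither w1 first nor w2 second

    tp = a / first if first else None
    # delta P subtracts the probability of w2 when w1 is absent from the TP.
    if tp is not None and (c + d) > 0:
        delta_p = tp - c / (c + d)
    else:
        delta_p = None
    mi = log2(a * n / (first * second)) if a > 0 else None
    return tp, delta_p, mi


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the corpora used in the paper.
    toy = "you know i mean you know you see i know".split()
    print(bigram_measures(toy, "you", "know"))
```

In this sketch ΔP differs from transitional probability only by the subtracted term c / (c + d), which is small whenever w2 is rare outside the bigram; that is one way to see why the two measures tend to correlate strongly.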



About the article

Citation Information: Corpus Linguistics and Linguistic Theory, ISSN (Online) 1613-7035, ISSN (Print) 1613-7027, DOI: https://doi.org/10.1515/cllt-2017-0036.


© 2018 Walter de Gruyter GmbH, Berlin/Boston.
