Published by De Gruyter Mouton, March 29, 2018

ΔP as a measure of collocation strength

Considerations based on analyses of hesitation placement in spontaneous speech

Ulrike Schneider


This paper explores the proposed benefits of ΔP (delta P) as a measure of collocation strength. It contrasts ΔP with other, more commonly used association measures, particularly transitional probability, but also mutual information and Lexical Gravity G. To this end, the strong correlation between ΔP and transitional probability is first illustrated using two example corpora. This is followed by an analysis of hesitation placement in spontaneous spoken English, based on the assumption that hesitations will not be placed within strong collocations. Results show that, despite their strong similarity, ΔP is more predictive of hesitation placement than transitional probability in some contexts. Yet neither ΔP nor any of the other association measures emerges as the universally best predictor. On the basis of these results, it is suggested that studies should always rely on several association measures.
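To make the contrast between the two most closely related measures concrete, the following is a minimal sketch of how forward transitional probability and forward ΔP could be computed for a bigram from a 2x2 contingency table. It assumes the common textbook definitions, TP(w2|w1) = a/(a+b) and ΔP(w2|w1) = a/(a+b) - c/(c+d); the function names, the toy token list, and the choice of the forward direction are illustrative assumptions, not the operationalization used in the paper.

```python
from collections import Counter


def bigram_table(tokens, w1, w2):
    """Return the 2x2 contingency counts (a, b, c, d) for the bigram 'w1 w2'.

    a: w1 followed by w2          b: w1 followed by another word
    c: another word before w2     d: neither w1 first nor w2 second
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    a = bigrams[(w1, w2)]
    w1_first = sum(n for (x, _), n in bigrams.items() if x == w1)
    w2_second = sum(n for (_, y), n in bigrams.items() if y == w2)
    b = w1_first - a
    c = w2_second - a
    d = total - a - b - c
    return a, b, c, d


def forward_tp(a, b, c, d):
    """Forward transitional probability P(w2 | w1); assumes w1 occurs (a + b > 0)."""
    return a / (a + b)


def delta_p_forward(a, b, c, d):
    """Forward delta P: P(w2 | w1) - P(w2 | not w1).

    Assumes both rows of the table are non-empty (a + b > 0 and c + d > 0).
    """
    return a / (a + b) - c / (c + d)


# Hypothetical toy data, only for illustration.
tokens = "you know i mean you know you see i know".split()
table = bigram_table(tokens, "you", "know")
print(table, forward_tp(*table), delta_p_forward(*table))
```

In this toy example the table for "you know" is (2, 1, 1, 5), so the forward transitional probability is 2/3, while ΔP subtracts the rate at which "know" follows any other word (1/6) and comes out at 0.5. The two measures differ only in that correction term, which is one way to see why they tend to correlate strongly.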


Published Online: 2018-03-29
Published in Print: 2020-10-25

© 2020 Walter de Gruyter GmbH, Berlin/Boston