
Folia Linguistica

Acta Societatis Linguisticae Europaeae

Editor-in-Chief: Fischer, Olga / Norde, Muriel

IMPACT FACTOR 2018: 0.463
5-year IMPACT FACTOR: 0.647

CiteScore 2018: 0.59

SCImago Journal Rank (SJR) 2018: 0.284
Source Normalized Impact per Paper (SNIP) 2018: 0.971

Online
ISSN
1614-7308
Volume 40, Issue 1

Incremental word processing influences the evolution of phonotactic patterns

Andrew Wedel
Published Online: 2019-07-28 | DOI: https://doi.org/10.1515/flih-2019-0011

Abstract

Listeners incrementally process words as they hear them, progressively updating inferences about what word is intended as the phonetic signal unfolds in time. As a consequence, phonetic cues positioned early in the signal for a word are on average more informative about word-identity because they disambiguate the intended word from more lexical alternatives than cues late in the word. In this contribution, we review two new findings about structure in lexicons and phonological grammars, and argue that both arise through the same biases on phonetic reduction and enhancement resulting from incremental processing.

(i) Languages optimize their lexicons over time with respect to the amount of signal allocated to words relative to their predictability: words that are on average less predictable in context tend to be longer, while those that are on average more predictable tend to be shorter. However, the fact that phonetic material earlier in the word plays a larger role in word identification suggests that languages should also optimize the distribution of that information across the word. In this contribution we review recent work on a range of different languages that supports this hypothesis: less frequent words are not only on average longer, but also contain more highly informative segments early in the word.

(ii) All languages are characterized by phonological grammars of rules describing predictable modifications of pronunciation in context. Because speakers appear to pronounce informative phonetic cues more carefully than less informative cues, it has been predicted that languages should be less likely to evolve phonological rules that reduce lexical contrast at word beginnings. A recent investigation through a statistical analysis of a cross-linguistic dataset of phonological rules strongly supports this hypothesis. Taken together, we argue that these findings suggest that the incrementality of lexical processing has wide-ranging effects on the evolution of phonotactic patterns.

1 Introduction

We speak in order to be understood (Jakobson et al. 1965).

Human languages are message transmission systems: a speaker encodes an intended message into a signal which is transmitted over a channel to a recipient, who then decodes it back into a message. A guiding principle for the work we describe here is that human language should be subject to the same theoretical constraints that apply to message transmission systems in general (Shannon 1948). From that point, we are in a position to ask whether these constraints appear to meaningfully shape the diachronic evolution of linguistic systems (see e.g. Hockett 1960; Piantadosi et al. 2009; Hall et al. 2018). Here we focus on a particular property of the information transmission channel in human language: the incremental processing of phonetic cues in the course of lexical access. We review results from two independent research areas that suggest languages are in fact significantly shaped by constraints on information flow imposed by incremental processing in the forms of their lexicons and in their phonological grammars. We argue that both of these patterns can be understood in the context of the established trade-off between effort-reduction and maintenance of message transmission accuracy.

In the remainder of the introduction, we start by reviewing evidence for incremental processing, and show how it imposes constraints on the amount of information different parts of the word can contribute to lexical access. We then turn to Zipf’s Principle of Least Effort and the Law of Abbreviation (1935) and review what they predict for optimization of word forms under competing pressures to minimize effort while maintaining message transmission accuracy. Finally, we outline hypotheses about how individual words, the lexicon as a whole, and the phonological grammar might be expected to be optimized over diachronic time in response to incremental processing. In the sections subsequent to the introduction, we synthesize our recent work that investigates these hypotheses, and argue that they are strongly consistent with the proposal that linguistic systems are optimized with respect to incremental processing. We end with a discussion about how other typologically common properties of languages may arise through diachronic changes grounded in incremental processing, and suggest future directions.

1.1 Incremental processing in lexical access

Figure 1:

Segmental information by position in the word.

If languages evolve over diachronic time under pressure to maintain accurate message transmission, we can predict that languages will develop in ways that maintain higher segmental information for low predictability words, particularly at the beginnings relative to their ends. To provide context for this prediction, we turn to Zipf’s Law of Abbreviation.

1.2 Word shape and Zipf’s Law of Abbreviation

Human languages allow for messages with variable amounts and types of phonetic material. It follows from information theory (Shannon 1948) that it will be most efficient for languages to allocate more phonetic information to messages that themselves convey more information. Zipf famously identified this ‘law of abbreviation’ (1935) in language, showing that more frequent words tend to be short, and conversely, less frequent words tend to be longer. Later work by Piantadosi et al. (2011) showed that word length is in fact best predicted by its average contextual predictability, rather than frequency per se. This relationship has been subsequently shown to hold in a large number of languages, suggesting that the diachronic optimization of word length with respect to predictability is a general phenomenon. Experimental work has also provided insight into possible pathways for word length modulation to occur in response to predictability. Mahowald et al. (2013) showed that in a sentence completion task which requires participants to choose a short or long form of a word (e.g. lab ~ laboratory), predictable contexts prompted preferential use of short forms. Similarly, Kanwal et al. (2017) used an artificial language usage paradigm to show that participants tended to develop shorter forms for more predictable messages in the language, and maintained longer forms for less predictable messages.

What is it about an inverse word length ~ word predictability relationship that is ‘more efficient’? Assuming that a greater amount of phonetic information in a signal is correlated with greater effort for the speaker (e.g. in terms of the number of articulatory gestures or their degree of hyperarticulation), then having the most frequent words in the lexicon be short minimizes speaker effort. However, Zipf’s law of abbreviation also follows from the definition of an effective communication system for the listener (Shannon 1948, Shannon 1949). Words that are less frequent are also likely to be less predictable from context, and so the listener must rely correspondingly more on the phonetic signal to identify the word. A longer word contains more material, which on average creates a larger perceptual distance from other alternatives in the lexicon, aiding the listener in lexical processing. This results in two separate pressures to form an efficient lexicon which are often at odds. Words should be both minimally effortful and maximally perceptually distinct from others in the lexicon. These two often contrasting pressures should cause the lexicon to evolve to a state where both are relatively satisfied (Köhler 1993).

Although longer words are on average more distinct from each other, increasing the length of a word is not the only way to create usefully greater perceptual distance from other lexical alternatives. Words of the same length can, depending on their segmental composition, be in denser or sparser regions of the lexicon. The three segment word cat, for example, is in a dense lexical neighborhood with very many neighboring words differing by just one segment at any position. The same-length word imp in contrast, is in a very low density neighborhood with very few words that differ by one segment in any position. As a consequence, the particular composition and order of the segments in the word imp results in a more informative signal than those in the word cat. A hypothesis that arises from this line of thinking is that less predictable words may not only evolve to have or retain more segments than more predictable words, but also that they may evolve to have or retain more contextually informative segments, particularly at word-beginnings where phonetic cues are most informative in general. In the next section, we review work that tests this hypothesis.
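The notion of neighborhood density used above can be illustrated with a minimal sketch. The lexicon below is a small, made-up set of transcriptions (one character per segment), and the function name `neighbors` is hypothetical. Following the wording above, a neighbor is taken to be a same-length word differing in exactly one segment; definitions that also count additions and deletions would give different counts.

```python
def neighbors(word, lexicon):
    """Same-length words that differ from `word` in exactly one segment."""
    return sorted(
        other for other in lexicon
        if other != word
        and len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
    )

# Toy lexicon (an assumption for illustration, not data from the study).
toy_lexicon = {"kæt", "bæt", "hæt", "kɪt", "kʌt", "kæp", "kæb", "ɪmp", "ɪŋk"}

cat_neighbors = neighbors("kæt", toy_lexicon)  # dense neighborhood
imp_neighbors = neighbors("ɪmp", toy_lexicon)  # sparse neighborhood
```

In this toy set, cat has a neighbor at every segment position while imp has none, so each segment of imp excludes more of the competing words.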

2 Incremental processing and the structure of the lexicon

In King and Wedel (submitted), we report a set of tests of the hypothesis that languages tend to compensate lower word predictability with contextually higher information segments independently of word length. This work was carried out for a genetically and areally diverse set of languages, but here we will use English as a running example. The information contributed by a segment (or any sublexical unit such as a feature, biphone, etc.) is calculated as the negative logarithm of the probability of that segment appearing in the speech signal, given the previous segmental context in the word (van Son and Pols 2003; Cohen Priva 2015, Cohen Priva 2017). Segments that are relatively improbable given the cohort of possible words defined by the previous segmental context contribute more information than those that are relatively probable, because they contribute to reducing the cohort of possible alternatives further. The information of a particular segment in a word can be estimated from the set of words and their frequencies in a sample of the language, such as a corpus, as in eq. (1).

$SI(s) = -\log_2 p(s \mid \text{previous}) = -\log_2 \frac{\text{count}(s \cap \text{previous})}{\text{count}(\text{previous})}$ (1)

Figure 1 shows the average segment information values at each segment position in the set of six-segment monomorphemic words in English. Note that the highest segment information is found at the initial position, at which point no disambiguating information has yet been introduced from the segmental material in the word itself and so the cohort of alternatives is the entire word set. As successive segments are incorporated into the context, the set of lexical alternatives is reduced with the result that each new segment can contribute, on average, a smaller and smaller amount of disambiguating information.
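As a rough sketch of how eq. (1) and the per-position averages of the kind shown in Figure 1 might be computed, the following assumes a lexicon given as a mapping from words (strings with one character per segment) to corpus frequencies, and assumes that counts are frequency-weighted. The function names and the toy frequencies are invented for illustration and are not taken from King and Wedel (submitted).

```python
import math
from collections import defaultdict

def segment_information(lexicon):
    """Return {word: [SI of each segment]} following eq. (1).

    count(previous) is the summed frequency of all words that begin with
    the preceding segments; the empty prefix covers the whole lexicon.
    """
    prefix_count = defaultdict(float)
    for word, freq in lexicon.items():
        for i in range(len(word) + 1):
            prefix_count[word[:i]] += freq
    return {
        word: [
            -math.log2(prefix_count[word[:i + 1]] / prefix_count[word[:i]])
            for i in range(len(word))
        ]
        for word in lexicon
    }

def mean_information_by_position(info):
    """Average segment information at each position across words."""
    by_position = defaultdict(list)
    for values in info.values():
        for pos, value in enumerate(values):
            by_position[pos].append(value)
    return [sum(by_position[p]) / len(by_position[p]) for p in sorted(by_position)]

# Toy lexicon with made-up frequencies (illustration only).
toy_lexicon = {"kat": 4, "kab": 2, "kit": 2, "dog": 8}
toy_info = segment_information(toy_lexicon)
toy_means = mean_information_by_position(toy_info)
```

When no word in the lexicon is a prefix of another, the segment information values of a word sum to that word's surprisal in the lexicon, which is one way to see why segment information and word frequency cannot be fully independent.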

Once we have created a dataset with segment information values for each position in each word, we can ask whether the segment information in a given word is predicted by the predictability of the word itself. In parallel with the way that segment information is calculated, we will frame word predictability in terms of the information it contributes based on its probability in the corpus. More specifically, we will use word information as equivalent to its surprisal, which is the negative logarithm (base 2) of the word’s probability in the corpus. If the lexicon is optimized to support low predictability (i.e. high information) words by allocating higher information segments to them, we expect word and segment information to show a positive correlation. However, because the calculation of contextual segment information is based in part on word frequency in the corpus, segmental information is, necessarily, partially correlated with word information, to degrees that differ depending on segment position in the word (see King and Wedel submitted for discussion). As a consequence, even in the absence of any evolutionary optimization of the lexicon, some degree of positive correlation between word information and segment information should exist. To test for optimization above and beyond this existing correlation between word information and segment information, we first find a baseline for the expected relationship in a given lexicon in the absence of any optimization. We do this by shuffling the frequencies within each length class in the corpus of words to create a new pseudo-lexicon in which frequency-based optimization between word information and the segmental content of words has been removed. This method leaves the relationship between word length and word predictability unchanged, as well as the lexical network structure in the lexicon. We then carry out a linear regression to measure the predictive relationship of word information (i.e. the negative base 2 logarithm of its probability in the corpus) on segment information at each position in the word, and store the model estimate for this relationship. We repeat the shuffling and regression 1000 times to generate a distribution of model estimates for this relationship for each position in the word when the word frequencies in the lexicon are shuffled. Finally, we inspect the model estimates for the predictive relationship between word information and segmental information at each position for the real lexicon and ask if they fall significantly outside the distribution of estimates for the shuffled lexicons, and in which direction.
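The shuffling procedure described above can be sketched as follows. This is a simplified illustration under several assumptions: the lexicon is a word-to-frequency mapping with one character per segment, the regression is a plain least-squares slope pooled over all words that have a segment at the given position, and all function names and the toy frequencies are hypothetical. The actual models in King and Wedel (submitted) may differ in their structure and covariates.

```python
import math
import random
from collections import defaultdict

def word_information(lexicon):
    """Surprisal of each word: -log2 of its relative frequency."""
    total = sum(lexicon.values())
    return {w: -math.log2(f / total) for w, f in lexicon.items()}

def segment_information(lexicon):
    """Segment information per eq. (1), with frequency-weighted counts."""
    prefix = defaultdict(float)
    for w, f in lexicon.items():
        for i in range(len(w) + 1):
            prefix[w[:i]] += f
    return {w: [-math.log2(prefix[w[:i + 1]] / prefix[w[:i]])
                for i in range(len(w))] for w in lexicon}

def slope(xs, ys):
    """Least-squares slope of ys on xs (0.0 if xs has no variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def slopes_by_position(lexicon, n_positions):
    """Slope of segment information on word information at each position."""
    w_info = word_information(lexicon)
    s_info = segment_information(lexicon)
    result = []
    for p in range(n_positions):
        words = [w for w in lexicon if len(w) > p]
        result.append(slope([w_info[w] for w in words],
                            [s_info[w][p] for w in words]))
    return result

def shuffle_within_length(lexicon, rng):
    """Reassign frequencies at random among words of the same length."""
    by_length = defaultdict(list)
    for w in lexicon:
        by_length[len(w)].append(w)
    shuffled = {}
    for words in by_length.values():
        freqs = [lexicon[w] for w in words]
        rng.shuffle(freqs)
        shuffled.update(zip(words, freqs))
    return shuffled

def baseline(lexicon, n_positions, n_shuffles=1000, seed=0):
    """Slopes by position for each of n_shuffles frequency-shuffled lexicons."""
    rng = random.Random(seed)
    return [slopes_by_position(shuffle_within_length(lexicon, rng), n_positions)
            for _ in range(n_shuffles)]

# Toy lexicon with made-up frequencies (illustration only).
toy = {"kat": 4, "kab": 2, "kit": 2, "dog": 8, "dig": 1, "kata": 3, "doga": 5}
real = slopes_by_position(toy, 3)
shuffled = baseline(toy, 3, n_shuffles=200)
mean_shuffled = [sum(run[p] for run in shuffled) / len(shuffled) for p in range(3)]
normalized = [r - m for r, m in zip(real, mean_shuffled)]
```

Segment information is recomputed for every shuffled lexicon because it depends on the frequencies. The `normalized` values correspond to the real-lexicon estimates with the average shuffled estimate subtracted at each position; a toy lexicon of this size cannot, of course, show the pattern reported for real lexicons.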

In Figure 2, we show the model estimates for the predictive value of word information on segment information at each segment position over the real lexicon, normalized by subtracting from each position the average model estimates for the shuffled lexicons. The model estimates for the word information factor are higher at early positions in the word than the distribution of model estimates in the shuffled lexicons, suggesting that the real lexicon is optimized through the allocation of higher information segments early in higher information words.

Figure 2:

Word information predicts segment information at early positions in English. The y-axis shows the regression model estimate for the word information factor predicting segment information relative to the baseline calculated over frequency-shuffled lexicons.

Figure 2 shows that lower word predictability significantly correlates with a higher than expected segment information early in the word. This is consistent with the hypothesis that the lexicon evolves under a bias based in incremental cue processing favoring high information cues where they can contribute the most to message communication: at the beginnings of words that are less predictable in context. A result of this pattern is that words that are less predictable tend to begin with segments that immediately reduce the set of lexical competitors to a greater than average degree, and then end with segments that are more predictable given the preceding segments. We can illustrate this general pattern of longer, more predictable tails for low-predictability words by showing that all else being equal, low-predictability words have an earlier uniqueness point, that is, a point at which the word is fully disambiguated from alternatives. Figure 3 shows the relationship between word predictability and the relative position of the uniqueness point in the word in English. In order to remove the effect of word-length, we divide the position of the uniqueness point by the length of the word to create a measure that is comparable across words of varying lengths. As seen in Figure 3, in less predictable words, the uniqueness point tends to be located relatively earlier in the word, with a correspondingly longer post-uniqueness point, or redundant tail. Redundancy can be defined as additional information in the signal above and beyond that which would be needed for accurate transmission under error-free conditions (Shannon 1948). The likelihood of accurate message transmission in noise can be increased by providing additional, redundant opportunities to recover the same information, which effectively increases the distance between signals (see King and Wedel submitted for more discussion).

Figure 3:

The relationship between lexical uniqueness point and word predictability.
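The relative uniqueness point described above can be sketched in a few lines. This is an illustrative reading rather than the procedure used for Figure 3: the lexicon is a made-up set of strings with one character per segment, positions are counted from 1, and a word that is a prefix of another word is treated as having no uniqueness point (other conventions place it just past the final segment). The function names are hypothetical.

```python
def uniqueness_point(word, lexicon):
    """1-based position of the first segment at which `word` has diverged
    from every other word in the lexicon; None if it never does."""
    others = [w for w in lexicon if w != word]
    for i in range(1, len(word) + 1):
        if not any(w.startswith(word[:i]) for w in others):
            return i
    return None

def relative_uniqueness_point(word, lexicon):
    """Uniqueness point divided by word length, comparable across lengths."""
    up = uniqueness_point(word, lexicon)
    return None if up is None else up / len(word)

# Toy lexicon (illustration only).
toy_lexicon = {"kat", "kab", "kit", "dog", "elefant"}

late = relative_uniqueness_point("kat", toy_lexicon)       # unique only at the last segment
early = relative_uniqueness_point("elefant", toy_lexicon)  # unique at the first segment
```

A word like the toy form `elefant` has a long redundant tail after its uniqueness point, whereas `kat` has none.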

King and Wedel (submitted) show that the results shown in Figures 2 and 3 for English hold for a genetically and areally diverse set of languages, suggesting that this pattern is general. If these results hold, we can update Zipf in the following way: lexicons tend to evolve to not only allocate more segmental material to less predictable words, but also to allocate that segmental material so that it is more informative (see Köhler 1987). On that basis, in the next section we review results from a study asking whether phonological grammars in turn evolve to protect phonemic information where it is often most informative, near the beginnings of words.

3 Incremental processing and the phonological grammar

A broad body of research on the evolution of phonological patterns suggests that consistent biases on word form in usage can, over time, shift long-term word representations (Seyfarth 2014; Sóskuthy and Hay 2017), and that these shifts can in turn spread to words with similar structures (Nielsen 2011; Levi 2015; see also Wang 1969; Bybee 2002; Phillips 2006), setting the stage for the eventual development of phonologized patterns which indirectly reflect these biases (e.g. Cohen Priva 2017). If, as suggested by the work reviewed in the previous section, words change under a bias to maintain more informative material, and/or to reduce less informative material, can we find any reflex of this bias in phonological grammars?

Over the last 40 years, a number of phonologists working in different traditions have proposed that the observed range of phonological rules may be in part explained by the positional asymmetry in the amount of information provided by phonetic cues across the word. Houlihan (1975) proposed that because of the greater information provided by early cues, word-initial position should tend to host a greater number of contrasts, and conversely, that contrast neutralizing rules should be limited to word-final position. Related proposals were published by Houlihan and Iverson (1979), Nooteboom (1981) and Taft (1984).

To investigate this pattern at a larger scale, we recently began a project to test these predictions statistically by analyzing grammars from genetically and areally diverse sets of languages. As an initial exploration of the larger hypothesis that grammatical systems tend to favor preservation of lexically initial phonological contrasts, we tested a relatively narrow and conservatively defined formulation of the asymmetry: that phonemically neutralizing rules (i.e. those which can potentially create surface homophony) are less likely to target the beginnings of lexical domains than the ends (Wedel et al. in press). Within a dataset of 50 languages from 37 top-level families and 21 linguistic areas, we identified all phonological rules which were defined as targeting either the beginning or end of a lexical domain, i.e. a root, stem, word, phrase or utterance. Examples of familiar languages within the dataset include Navajo/North America, Quechua/South America, Armenian/Europe, Chichewa/Africa, Mongolian/Asia, and Bardi/Australia. We further coded each rule for whether it was neutralizing, operationally defined as potentially creating surface homophony. Examples of neutralizing rules include a word-final obstruent devoicing rule in a language with a phonemic voicing distinction in obstruents, or a word-final vowel deletion rule in a language which allows word-final codas. Rules classified as non-neutralizing in this approach are a heterogeneous set, including processes that might be expected to enhance cue perceptibility, such as epenthesis or fortition rules, as well as those that could reduce cue perceptibility, such as a word-final sonorant devoicing rule in a language without a phonemic sonorant voicing contrast.
Note that this method of binning into two contrasting sets imposes an over-coarse division on the data relative to our overarching hypothesis that phonological rules that reduce disambiguating information will be more common toward word ends. For example, rules that arguably reduce the salience of a phonological contrast but fail to result in a phonological neutralization, such as vowel devoicing, are classified as non-neutralizing. On the other hand, rules that are phonologically neutralizing, but that phonetically can show incomplete neutralization will be classified as neutralizing. This method nonetheless serves to segregate more information-reducing rules from less information-reducing rules, while being simple enough that it can be applied clearly using any sufficiently detailed grammatical description of a language.

As can be seen by visual inspection of the data in Figure 4 below, fewer rules overall appear to target word beginnings relative to word ends, and this tendency seems to be amplified for neutralizing rules. We statistically tested whether these visually apparent trends are significant in this dataset using a logistic mixed effects regression model, with Language, Top-level family, and Area as random intercepts. In this model, we tested whether rule-type (neutralizing versus non-neutralizing) significantly predicted the lexical domain edge at which the rule was defined (initial versus final). We found that all rule types were significantly more likely to target final domain edges than initial ones, but that above and beyond this, neutralizing rules were significantly more likely than non-neutralizing rules to target final domain edges.

Figure 4:

Edge-bias in neutralizing versus non-neutralizing rules.
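One way to write down the logistic mixed effects model described above is given here as a sketch. The symbols and the coding of the predictor (1 for neutralizing, 0 for non-neutralizing) are assumptions made for illustration; the source specifies only the outcome, the predictor, and the three random intercepts. For rule $i$:

$$\log\frac{P(\text{edge}_i = \text{final})}{P(\text{edge}_i = \text{initial})} = \beta_0 + \beta_1\,\text{neutralizing}_i + u_{\text{language}[i]} + u_{\text{family}[i]} + u_{\text{area}[i]}$$

where each random intercept $u$ is assumed to be normally distributed with mean zero. Under this coding, a positive $\beta_0$ corresponds to an overall preference for final edges among non-neutralizing rules, and a positive $\beta_1$ corresponds to neutralizing rules favoring final edges over and above that baseline.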

We further asked whether this end-effect could be due simply to a tendency for syllable codas to be targeted by phonologically reductive rules. To do so, we coded every rule in the dataset for whether it modified or eliminated a segment that would otherwise have surfaced as a coda. We then repeated our analysis on a dataset in which all of these rules were removed. The resulting dataset contained all of the rules in the original dataset that targeted beginnings of lexical domains, while those targeting ends were essentially only those that modified domain-final syllable nuclei. Even within this more limited dataset, neutralizing rules remain significantly more likely to target the ends than the beginnings of lexical domains, suggesting that this result cannot be simply accounted for as a syllable coda-driven phenomenon. Finally, we asked if this could be accounted for in some way as deriving from the well-known typological tendency for suffixation over prefixation. We coded all languages in the dataset in several different ways for their affix preference, but could identify no significant predictive effect of affix preference on the tendency for neutralizing rules to target lexical domain ends (see Wedel et al. in press for additional discussion). Figure 5 shows one possible division of the data which divides the set of languages into exclusively prefixing (8 languages), exclusively suffixing (13 languages), and mixed languages (29 languages). If affixation is related in any way to the development or identification of neutralizing rules, we should expect exclusively prefixing and suffixing languages to show opposite patterns. In Figure 5, we see that although suffixing languages show a stronger end-favoring pattern for neutralizing rules, the pattern is not reversed for prefixing languages (see Figure 5 legend for more details). 
We conclude that the observed significant tendency for neutralizing rules to target word-ends in this dataset cannot be fully explained as a bias toward neutralization in coda position, or as arising through a suffixing-bias in the languages within the dataset.

Figure 5:

Edge-bias in neutralizing rules by affix-preference.

4 Diachronic phonotactics

An increasingly well-developed body of evidence suggests that many kinds of language change operating at many different levels of description and timescales can be usefully understood as evolutionary processes (for arguments, see Ritt 2004; Blevins 2004). A striking result of work conducted over the last few decades which fits neatly with this framework is that many types of phonological language change appear to proceed from the particular to the general: usage-level biases influence the type of variants that arise, where frequent variants can come to shift individual linguistic representations in memory, which then at longer time-scales can shift broader lexical and phonological patterns (e.g. Sóskuthy and Hay 2017, see also Blevins 2004; Phillips 2006). As we steadily learn more about the mechanisms of language production, perception, and categorization in usage, we simultaneously gain new hypotheses about the biases that may prime these evolutionary changes, or alternatively, gain hypotheses about what linguistic structures might be produced.

Here we have argued that incremental processing of phonetic cues in lexical access exerts an influence on the evolution of individual word forms, and more broadly the evolution of grammaticalized phonological patterns. To do so, we draw on the fact that incremental processing imposes constraints on the amount of information provided at different temporal positions in the signal. But how might communicative factors like the information a segment contributes to lexical access influence the evolution of linguistic structures? Existing evidence provides at least a partial answer, suggesting that the observed patterns can be accounted for through a greater rate of phonetic reduction in more predictable words in non-initial positions.

A variety of studies have shown that reduction in phonological and phonetic material in lexical outputs is correlated with greater predictability. As reviewed above, speakers are more likely to choose truncated forms for predictable lexical items (e.g. lab ~ laboratory, Mahowald et al. 2013; Kanwal et al. 2017). At the more gradient phonetic level, contextually more predictable segments are produced with a faster speech rate (e.g. Aylett and Turk 2004; Van Son and Van Santen 2005; Arnon and Cohen Priva 2013; Gahl et al. 2012) and less vowel dispersion (e.g. Aylett and Turk 2006). Van Son and colleagues showed that segments which contribute less information to lexical disambiguation in context are reduced in duration and in a variety of measures of articulatory extent (Van Son and Pols 2003; Van Son and Van Santen 2005). When sufficiently consistent across usage events and speakers, these predictability-driven reductions in duration and articulatory extent can eventually drive community-wide shifts in lexical representation (see Cohen Priva 2015, Cohen Priva 2017 for discussion; see also Blevins 2004). If low information segments are more often phonetically reduced in usage, we expect them to undergo more categorical changes such as deletion or phonemic neutralization more rapidly at a diachronic timescale. The segments that on average convey the least information are those that are toward the end of high predictability words, leaving us with the prediction that higher predictability words should evolve more quickly to lose material through deletion, and lose contrast through neutralization. Finally, all else being equal, these processes should be faster at later positions in the word.
(Note that while this usage-based pathway is supported by the idea that change is more rapid for more frequent forms, it is unlikely to be solely a practice effect (Bybee 2002) because differential practice for different words cannot straightforwardly explain differential rates of reduction at the beginning versus the ends of words.) As an illustrative example of a possible predictability-driven change in progress, many undergraduate students in the first author’s introductory linguistics classes report that their primary lexical representation for the high frequency word memory has two syllables (/mɛmɹi/), rather than the prescriptively correct three syllables (/mɛməɹi/). As a comparison, these same students all report that their lexical representation for the nearly identical, but low frequency form mammary (/mæməɹi/) has three syllables. Because the frequent memory is more likely to be reduced, repeated reduced tokens eventually cause the stored form of the word to be reduced itself (see Pierrehumbert 2002 for discussion).

This pathway for differential rates of reduction does not exclude the possibility that there are also language change pathways that can lead to an increase in segmental information in positions that support it, such as word initial positions. Wedel et al. (2018) showed that in a corpus of natural speech, word-initial stop VOT was hyperarticulated to be more contrastive if the voicing value of the stop distinguished the word from a minimal pair competitor (e.g. pat ~ bat). Likewise, initial syllable vowels were found to be hyperarticulated away from vowel competitors in F1-F2 space if that competitor created a minimal pair (e.g. lift ~ left; see also Buz et al. 2016; Seyfarth et al. 2016). These contrastive hyperarticulation effects serve to increase perceptual distance to close lexical competitors and may provide a diachronic pathway for high information segments to not only resist reduction, but also to become more informative through phonological processes such as fortition and epenthesis (see Hall et al. 2016).

Finally, all of these potential pathways to modify the segmental content of individual words have been argued to prime the development of grammaticalized phonological patterns (see Hall et al. 2016). For example, Cohen Priva (2017) has shown that the average information (‘informativity’) contributed by word-final stops in a variety of languages successfully predicts whether they are subject to phonological deletion processes: in English, word-final /t/ contributes, on average, the least information to word disambiguation and it is also the stop that is most likely to be deleted word-finally. In Indonesian, it is word-final /k/ that on average contributes least, and it is the stop that is most likely to be deleted.

5 Conclusions

Here, we reviewed and synthesized the results of two research projects which both sprang from the observation that lexical processing is incremental, which makes the strong prediction that early cues can in principle provide more information to the listener than later cues. Given evidence that language structures appear to evolve under constraints to maintain communicative efficiency (e.g. Piantadosi et al. 2011), we looked for and found evidence in a diverse set of languages that (i) less predictable words tend to have more informative early segments, and (ii) phonological rules that reduce phonemic contrast between words are significantly more likely to target lexical domain ends, where segmental identity carries less information, than lexical domain beginnings. These findings fit neatly into a larger body of research that shows that communicative factors do predict sound-level variation at different levels of description (reviewed in Hall et al. 2018), and that more broadly, language evolves under pressure to maintain communicative efficiency.

References

• Allopenna, Paul, James Magnuson & Michael Tanenhaus. 1998. Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language 38(4). 419–439.

• Arnon, Inbal & Uriel Cohen Priva. 2013. More than words: The effect of multi-word frequency and constituency on phonetic duration. Language and Speech 56(3). 349–371.

• Aylett, Matthew & Alice Turk. 2004. The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47(1). 31–56.

• Aylett, Matthew & Alice Turk. 2006. Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. The Journal of the Acoustical Society of America 119(5). 3048–3058.

• Baayen, R. Harald, Richard Piepenbrock & Léon Gulikers. 1995. The CELEX lexical database (release 2). Distributed by the Linguistic Data Consortium, University of Pennsylvania.

• Blevins, Juliette. 2004. Evolutionary phonology: The emergence of sound patterns. Cambridge: Cambridge University Press.

• Buz, Esteban, Michael Tanenhaus & T. Florian Jaeger. 2016. Dynamically adapted context-specific hyper-articulation: Feedback from interlocutors affects speakers’ subsequent pronunciations. Journal of Memory and Language 89. 68–86.

• Bybee, Joan. 2002. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change 14. 261–290.

• Cohen Priva, Uriel. 2015. Informativity affects consonant duration and deletion rates. Laboratory Phonology 6(2). 243–278.

• Cohen Priva, Uriel. 2017. Informativity and the actuation of lenition. Language 93(3). 569–597.

• Dahan, Delphine & James Magnuson. 2006. Spoken word recognition. In Matthew Traxler & Morton A. Gernsbacher (eds.), Handbook of psycholinguistics, Second edn., 249–283. Cambridge, Massachusetts: Academic Press.

• Dahan, Delphine, James Magnuson, Michael Tanenhaus & Ellen Hogan. 2001. Subcategorical mismatches and the time course of lexical access: Evidence for lexical competition. Language and Cognitive Processes 16(5–6). 507–534.

• Fernald, Anne, Daniel Swingley & John Pinto. 2001. When half a word is enough: Infants can recognize spoken words using partial phonetic information. Child Development 72(4). 1003–1015.

• Gahl, Susanne, Yao Yao & Keith Johnson. 2012. Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech. Journal of Memory and Language 66(4). 789.

• Hall, Kathleen C., Elizabeth Hume, T. Florian Jaeger & Andrew Wedel. 2016. The message shapes phonology. Manuscript. University of British Columbia, University of Canterbury, University of Rochester & University of Arizona. PsyArXiv. psyarxiv.com/sbyqk.

• Hall, Kathleen C., Elizabeth Hume, T. Florian Jaeger & Andrew Wedel. 2018. The role of predictability in shaping phonological patterns. Linguistics Vanguard 4(S2). 1–15.

• Hockett, Charles F. 1960. The origin of speech. Scientific American 203. 88–96.

• Houlihan, Kathleen. 1975. The role of word boundary in phonological processes. Austin: University of Texas PhD dissertation. https://repositories.lib.utexas.edu/handle/2152/68636

• Houlihan, Kathleen & Gregory Iverson. 1979. Functionally-constrained phonology. In Daniel A. Dinnsen (ed.), Current approaches to phonological theory, 50–73. Bloomington: Indiana University Press.

• Jakobson, Roman, Gunnar Fant & Morris Halle. 1965. Preliminaries to speech analysis: The distinctive features and their correlates. Cambridge, Massachusetts: MIT Press.

• Kanwal, Jasmeen, Kenny Smith, Jennifer Culbertson & Simon Kirby. 2017. Zipf’s law of abbreviation and the principle of least effort: Language users optimise a miniature lexicon for efficient communication. Cognition 165. 45–52.

• King, Adam & Andrew Wedel (under review). Early disambiguating information for low predictability words: The lexicon is shaped by incremental processing.

• Köhler, Reinhard. 1987. System theoretical linguistics. Theoretical Linguistics 14(2–3). 241–258.

• Köhler, Reinhard. 1993. Synergetic linguistics. In Reinhard Köhler & Burghard Rieger (eds.), Contributions to quantitative linguistics, 41–51. Dordrecht: Springer.

• Levi, Susannah V. 2015. Generalization of phonetic detail: Cross-segmental, within-category priming of VOT. Language and Speech 58(4). 549–562.

• Mahowald, Kyle, Evelina Fedorenko, Steven T. Piantadosi & Edward Gibson. 2013. Info/information theory: Speakers choose shorter words in predictive contexts. Cognition 126. 313–318.

• Nielsen, Kuniko. 2011. Specificity and abstractness of VOT imitation. Journal of Phonetics 39(2). 132–142.

• Nooteboom, Sieb G. 1981. Lexical retrieval from fragments of spoken words: Beginnings vs. endings. Journal of Phonetics 9(4). 407–424.

• Phillips, Betty. 2006. Word frequency and lexical diffusion. New York: Palgrave Macmillan.

• Piantadosi, Steven T., Harry Tily & Edward Gibson. 2009. The communicative lexicon hypothesis. The 31st Annual Meeting of the Cognitive Science Society (CogSci09), 2582–2587. Austin, TX: Cognitive Science Society.

• Piantadosi, Steven T., Harry Tily & Edward Gibson. 2011. Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences 108(9). 3526–3529.

• Pierrehumbert, Janet B. 2002. Word-specific phonetics. In Carlos Gussenhoven & Natasha Warner (eds.), Laboratory phonology VII, 101–139. Berlin: Mouton de Gruyter.

• Ritt, Nikolaus. 2004. Selfish sounds and linguistic evolution: A Darwinian approach to language change. Cambridge: Cambridge University Press. https://doi.org/10.1017/cbo9780511486449

• Seyfarth, Seth. 2014. Word informativity influences acoustic duration: Effects of contextual predictability on lexical representation. Cognition 133(1). 140–155.

• Seyfarth, Seth, Esteban Buz & T. Florian Jaeger. 2016. Dynamic hyperarticulation of coda voicing contrasts. Journal of the Acoustical Society of America 139(2). EL31–37.

• Shannon, Claude E. 1948. A mathematical theory of communication. Bell System Technical Journal 27. 623–656.

• Shannon, Claude E. 1949. Communication in the presence of noise. Proceedings of the IRE 37. 10–21.

• Sóskuthy, Marton & Jennifer Hay. 2017. Changing word usage predicts changing word durations in New Zealand English. Cognition 166. 298–313.

• Taft, Lori A. 1984. Prosodic constraints and lexical parsing strategies. Amherst: University of Massachusetts PhD dissertation.

• Van Son, R. J. J. H. & P. H. Jan Van Santen. 2005. Duration and spectral balance of intervocalic consonants: A case for efficient communication. Speech Communication 47(1–2). 100–123.

• van Son, R. J. J. H. & Louis C. W. Pols. 2003. How efficient is speech. In Proceedings of the Institute of Phonetic Sciences 25. 171–184.

• Wang, William. 1969. Competing changes as a cause of residue. Language 45(1). 9–25.

• Wedel, Andrew, Adam Ussishkin & Adam King (in press). Crosslinguistic evidence of a strong statistical universal: phonological neutralization rules target word ends. Language.

• Zipf, George K. 1935. The psychobiology of language. Boston: Houghton-Mifflin.

• Zwitserlood, Pienie. 1989. The locus of the effects of sentential-semantic context in spoken-word processing. Cognition 32(1). 25–64.

Revised: 2019-01-02

Accepted: 2019-02-01

Published Online: 2019-07-28

Published in Print: 2019-07-26

Citation Information: Folia Linguistica, Volume 40, Issue 1, Pages 231–248, ISSN (Online) 1614-7308, ISSN (Print) 0165-4004.

© 2019 Walter de Gruyter GmbH, Berlin/Boston.