Publicly Available Published by De Gruyter Mouton October 14, 2015

Lexically conditioned phonetic variation in motherese: age-of-acquisition and other word-specific factors in infant- and adult-directed speech

  • Georgia Zellou and Rebecca Scarborough
From the journal Laboratory Phonology

Abstract

Words produced to infants exhibit phonetic modifications relative to speech to adult interlocutors, such as longer, more canonical segments and prosodic enhancement. Meanwhile, within speech directed towards adults, phonetic variation is conditioned by word properties: lower word frequency and higher phonological neighborhood density (ND) correlate with increased hyperarticulation and degree of coarticulation. Both of these types of findings have interpretations that recruit listener-directed motivations, suggesting that talkers modify their speech in an effort to enhance the perceptibility of the speech signal. In that vein, the present study examines lexically-conditioned variation in infant-directed speech. Specifically, we predict that the adult-reported age at which a word was learned – lexical age-of-acquisition (AoA) – conditions phonetic variation in infant-directed speech. This prediction is indeed borne out in spontaneous infant-directed speech: later-acquired words are produced with more hyperarticulated vowels and a greater degree of nasal coarticulation. Meanwhile, ND predicts phonetic variation in data from spontaneous adult-directed speech, while AoA does not independently influence production. The patterns of findings in the current study support the stance that evaluation of the need for clarity is tuned to the listener. Lexical difficulty is evaluated by AoA in infant-directed speech, while ND is most relevant in adult-directed speech.

1 Introduction

Speech signals are highly variable – not just between speakers, who may differ in their physical characteristics as well as their linguistic backgrounds, but also between communicative contexts (who is speaking to whom and under what conditions), between phonetic contexts, and even between individual words. This variability is viewed alternately as ‘noise’ that must be factored out by listeners (Liberman et al. 1967) or (at least for some of it) as potentially useful information that constitutes part of an adaptive communicative system (Lindblom 1990). In this study, we focus on a specialized context, namely speech directed to infants (IDS), which has various phonetic characteristics that differ from those of speech directed to adults (ADS, which we take to be the default). Within this context we investigate word-by-word variability in particular. We consider that the systematic phonetic properties characterizing this type of variability are conditioned by a general goal of enhancing the perceptibility of the speech signal.

1.1 Phonetic variation in infant-directed speech

Speech directed toward infants is characterized by differences in pronunciation relative to speech to adults. Overall, mothers talking to their infants tend to produce slower rates of speech (Fernald 1992), higher and broader pitch ranges (Fernald 1984; Smith and Trainor 2008), hyperarticulated vowels (Kuhl et al. 1997, Kuhl et al. 2008; Rattanasone et al. 2013), and more canonical allophonic consonant variants (Dilley et al. 2014), compared to speech directed toward adults. Such infant-directed effects have been found in a number of languages (Ferguson 1964; Fernald et al. 1989), though the details of the effects are not always identical (e.g., Benders 2013; Igarashi et al. 2013).

The speech adjustments observed in infant-directed speech are often assumed to be made for the ‘benefit’ of a specific kind of interlocutor – an infant language learner – though the exact nature of this benefit is not entirely agreed upon. IDS modifications, particularly prosodic effects such as higher F0 and expanded pitch range (Fernald 1984), seem to serve to regulate the attention of the infant (Fernald 1991, Fernald 1993; Trainor and Desjardins 2002), as well as to communicate speaker affect (Bryant and Barrett 2007). Indeed, it has been demonstrated that speech exhibiting these infant-directed speech properties is preferred by infants over adult-directed speech (Fernald 1985; Fernald and Kuhl 1987). (See, e.g., Cristia 2013; Soderstrom 2007 for reviews of IDS properties.)

Some researchers have suggested that IDS modifications occur to promote language acquisition (Werker et al. 1994), for example by highlighting phonetic parameters that carry phonemic distinctions in a language (Kuhl et al. 1997, Kuhl et al. 2008) or by facilitating word recognition and word segmentation by infants (Thiessen et al. 2005; Singh et al. 2009; Song et al. 2010). In any case, the stronger attention that infants pay to IDS could increase the opportunity for the features of IDS to influence language learning, even beyond what might be expected from the relative frequency with which infants hear this type of speech (de Boer 2005). And it has been demonstrated computationally, in some studies at least, that IDS is more learnable than ADS, at least as far as the identification of vowel qualities is concerned (de Boer and Kuhl 2003; de Boer 2005).

Other researchers have suggested that the phonetic enhancement features of IDS in particular are not goal-directed at all, but are rather a by-product of the prosody and rate features of IDS (e.g., McMurray et al. 2013; Martin et al. 2014). For instance, because IDS is typically slower with more stressed monosyllabic words, it may simply result in less formant undershoot and greater prosodically-conditioned hyperarticulation (McMurray et al. 2013). So, while some features of IDS may be produced for the benefit of the infant, others may not be a specifically encoded part of the signal at all. In fact, other recent research even finds evidence for non-hyperarticulatory patterns in IDS. For example, VOT increased for both voiced and voiceless stops in infant-directed speech, such that the voicing contrast is not actually enhanced (McMurray et al. 2013). Furthermore, although the vowels /i/, /a/, and /u/ may be more peripheral in IDS (Kuhl et al. 1997), other vowels are not (McMurray et al. 2013; Cristia and Seidl 2014); and even the vowels reported to be hyperarticulated show broader distributions and greater overlap in IDS (Kirchhoff and Schimmel 2005; Soderstrom 2007; Cristia and Seidl 2014; Martin et al. 2014). This variability results in a tendency for contrasts in IDS to be less clear and to yield worse performance for automatic speech recognition when algorithms are trained on IDS input (Kirchhoff and Schimmel 2005; Martin et al. 2015).

However, whether infant-directed speech effects promote, intentionally or not, language acquisition, or whether they are simply a coincidence, they are part of a listener-specific speech style. Speech to all kinds of listeners (e.g., listeners in a noisy environment, hearing-impaired adults, non-native listeners) includes specialized features, many of which overlap with the features of IDS. Such speech is known to enhance clarity or intelligibility, without any didactic intent. (See a brief review of such ‘clear speech’ effects below.) It would be somewhat surprising, in fact, if speech to infants did not attend to clarity in some manner as well.

1.2 Clear speech

As just noted, in laboratory speech more generally and in speech directed toward (particular groups of) adult interlocutors, there is also systematic variation in acoustic features of pronunciation, which is generally assumed to occur for the benefit of a listener. Communicative context leads speakers to adjust properties of their speech. For example, speakers talk louder and slower in noisy environments (Lombard 1911; Lane and Tranel 1971). Speech directed toward listeners with a hearing impairment is louder and slower (Picheny et al. 1986; Bradlow et al. 2003). In fact, explicitly instructing talkers to speak ‘clearly’, relative to ‘conversational’ or ‘citation’ speech, produces a range of acoustic-phonetic adjustments, including increased intensity (Picheny et al. 1986), longer segment durations and slower speech rate (e.g., Picheny et al. 1986; Bradlow et al. 2003), larger F0 range (Bradlow et al. 2003), and enhancement of higher frequency (>1000 Hz) components in the long-term spectrum (Krause and Braida 2003). Prior studies have also shown more segmentally-focused spectral effects in clear speech in the form of vowel hyperarticulation or vowel space expansion (Picheny et al. 1986; Moon and Lindblom 1989, Moon and Lindblom 1994; Bradlow et al. 2003; Krause and Braida 2003; Scarborough and Zellou 2013), as well as maintained or increased consonant-to-vowel coarticulation (Mathies et al. 2001; Bradlow 2002; Scarborough and Zellou 2013). Indeed, these clear speech modifications yield speech that is more intelligible than ‘conversational’ speech (Picheny et al. 1986; Bradlow et al. 1996).

Context also conditions speech adjustments at the level of individual words. Less predictable words are produced more clearly than highly predictable words (Lieberman 1963; Scarborough 2010). Similarly, the first mention of a word in a narrative is produced more clearly than the second mention of that word (Fowler and Housum 1987). Thus, factors that influence the predictability of a word within a communicative context condition speech clarity at the word level. In other words, clarity is increased where listeners might be less able to rely on inferences from context to provide top-down cues to aid them in lexical perception.

1.3 Lexical factors and clarity

Other word-level factors, in particular lexical frequency and phonological neighborhood density, have been shown to have systematic effects on phonetic realization in adult-directed speech as well. With respect to lexical frequency, it has been shown repeatedly that high-frequency words are reduced relative to low-frequency words (e.g., Zipf 1935): they are produced with temporal reduction (Fidelholz 1975; Jurafsky et al. 2000; Bell et al. 2009), more contracted vowel spaces (Munson and Solomon 2004), and greater final segment deletion (Raymond et al. 2006). Phonological neighborhood density is a measure of the number of phonologically similar words (‘neighbors’) there are for a given word (Luce et al. 1990). Words with more neighbors, i.e., those from dense phonological neighborhoods, exhibit more hyperarticulated vowel spaces (Wright 2003; Munson and Solomon 2004), as well as a greater degree of nasal and vowel-to-vowel coarticulation (Scarborough 2013), than words from sparser neighborhoods.

These word-level properties have systematic influences on lexical perception as well. Specifically, high-frequency and low-ND words are perceived more quickly and accurately than low-frequency and high-ND words across a range of tasks (Howes 1957; Luce 1986; Goldinger et al. 1989; Vitevitch 1997, Vitevitch 2002; Vitevitch and Luce 1998; Luce and Pisoni 1998), suggesting a processing advantage for more frequently occurring words and words with more unique phonological structures. Such effects can be understood in terms of the process of spoken word recognition, which involves picking out a target word from among its competitors in the lexicon, all of whose lexical representations are simultaneously activated by an acoustic-phonetic input (Luce 1986; Marslen-Wilson and Zwitserlood 1989; Luce and Pisoni 1998; Norris et al. 2000). More frequent words have stronger representations, due to previous repeated activation, and are therefore more likely to be correctly picked than their less frequent competitors. And words from sparser neighborhoods are simply subject to less competition, and are therefore more likely to be correctly picked.

An explicit connection between lexically conditioned production patterns and perceptual difficulty in the low-frequency and high-ND words has been made by some researchers (though cf. Gahl et al. 2012). For instance, the hyperarticulation in high-ND words has been interpreted as serving to mitigate lexical difficulty, since the hyperarticulated vowels would render the high-ND words more distinguishable from their competitors (Wright 2003; Scarborough 2013; Scarborough and Zellou 2013). The increase in coarticulation that co-occurs with increased hyperarticulation in high ND words suggests that the coarticulation too may serve to improve the perceptibility of these words (Scarborough 2013). Since coarticulation is the overlapping production of discrete sounds, it is a source of robust, if redundant, information about the phonemes present in a word (Beddor 2009). For example, the nasalized vowel [ʌ̃] in ‘bun’ provides information about the upcoming nasal consonant, even before that segment occurs. Such information can improve word perception by delivering cues to two phonemes at once and thus allowing early prediction of the nasal (Ali et al. 1971; Lahiri and Marslen-Wilson 1991).

Indeed, recent empirical evidence indicates that both hyperarticulation and increased nasal coarticulation facilitate lexical perception. Speech exhibiting hyperarticulated vowels has been shown to be more intelligible than speech with less hyperarticulated vowels (Bradlow et al. 1996). Furthermore, words produced with hyperarticulation and the greatest increased nasal coarticulation, as found in listener-directed clear speech, are perceived more quickly than words produced with hyperarticulation but decreased nasal coarticulation (Scarborough and Zellou 2013). Also, earlier onset of anticipatory nasality in a vowel preceding a nasal consonant speeds lexical identification (Beddor et al. 2013). These findings suggest that modifications made by speakers in clear speech, including increasing degree of nasal coarticulation and/or increasing vowel hyperarticulation, enhance the intelligibility of the speech signal (intentionally or not).

1.4 Lexical factors in infant-directed speech

If lexical factors like neighborhood density and word frequency are factors that condition accommodative adjustments in lab speech or speech directed to adults, we might predict that they should have similar effects in infant-directed speech, which is fairly explicitly accommodative. However, an infant lexicon is quite different from an adult lexicon – both in size and in organization. Not only do infants know fewer words, but they also have less experience with each word, and they know less about relations between words. Thus, neither lexical frequency nor neighborhood density, as calculated for adults, is a good representation of lexical difficulty for infants. Consequently, if frequency and neighborhood density condition effects in speech production because they represent lexical difficulty for a listener that speakers try to compensate for, they would seem not to be relevant factors in IDS after all. Rather, a more age-appropriate lexical measure should be a better basis for lexical adjustments in infant-directed speech.

One such alternate approach might be to consider frequency and neighborhood density calculated over child speech. In fact, such lexical calculations have been made in work by Storkel and colleagues (e.g., Storkel and Hoover 2010, Storkel and Hoover 2011; Storkel et al. 2013), based on corpora of speech from kindergarteners and first-graders. However, even such norms are not fully appropriate for the current study, which investigates infant interlocutors (from whom speech corpora are obviously not available). Furthermore, although child-derived lexical statistics might be very useful in explaining child productions, word-learning in children, or lexical perception by children, they are possibly not relevant in the domain of speech produced by adults, even if produced to infants, since it is not clear how adults would have access to statistics derived from a real child lexicon in any case.

Lexical age-of-acquisition (AoA), or the adult-reported age at which a word was learned (Carroll and White 1973; Gilhooly and Logie 1981), is a better candidate for an appropriate measure of lexical difficulty for infants. First, such ratings make explicit reference to the developing lexicon, and insofar as children acquire easier words earlier, they refer directly to lexical difficulty for children. Further, the fact that AoA is based on subjective ratings from adults (Kuperman et al. 2012) means that AoA represents an adult’s assessment of how hard a given word might be for a child – precisely the sort of information that is accessible to an adult and that could influence the adult’s production to a child word-by-word.

Even though AoA measures are subjective ratings by adults, numerous studies report evidence that these measures do indeed reflect lexical difficulty for young children, both in producing and perceiving words. For example, elementary school-age children are faster at producing words with a lower AoA, both in picture-naming (Gilhooly and Gilhooly 1979; Gilhooly and Watson 1981) and word repetition (Garlock et al. 2001). Likewise, earlier AoA accurately predicts faster spoken word recognition (auditory gating task: Garlock et al. 2001; and auditory lexical decision: Cirrin 1984) and more accurate mispronunciation detection (Walley and Metsala 1992) in elementary school-age children. (Note that similar perceptual effects of AoA from word naming and lexical decision have been demonstrated for adults, as well; Brown and Watson 1987; Izura et al. 2011; Kuperman et al. 2012; inter alia). These effects have been explained as effects of lifetime cumulative frequency or a result of order of acquisition on representational activation (Kuperman et al. 2012). But since AoA measures are inherently subjective, they might also reflect frequency, ND, or any number of other linguistic, cognitive, or social factors (e.g., word length, semantic content, concept complexity, observational experience, etc.) that contribute to an adult’s evaluation of the difficulty of a given word in acquisition.

1.5 Current study

In this study, then, we investigate lexically conditioned variation in infant-directed speech, comparing it to lexically conditioned variation in adult-directed speech. The goal of the language-learning infant is not just to acquire the phonological inventory of the ambient language but also its lexicon, just as the goal of linguistic communication is to express not just sounds, but meaningful words (and phrases). Thus, we predict that infant-directed speech, like adult-directed speech, should contain lexically conditioned phonetic variation.

To that end, we investigate phonetic variation as a function of lexical factors in spontaneous infant-directed speech. To address the questions of (1) whether the lexically conditioned patterns of production in infant-directed speech, if they occur, are similar to those found in adult-directed speech, and (2) whether the factors that condition any such modifications are special to infant lexicons (or to assumed or deduced infant lexicons), we compare the patterns in infant-directed speech to those in spontaneous adult-directed speech. We consider these patterns in light of their hypothesized goal of enhancing the perceptibility of the speech signal for a given interlocutor. Thus, we predict specifically that lexical age-of-acquisition will condition phonetic variation in infant-directed speech, above and beyond any contribution of ND or lexical frequency, which condition variation in adult-directed speech.

2 Methods

2.1 Infant-directed speech

The infant-directed speech data for this study come from the Brent corpus (Brent and Siskind 2001). This corpus is a collection of spontaneous speech recordings of 16 English-speaking mothers talking to their preverbal infants, ranging in age from 9 to 15 months. The speakers were recruited and recorded in and around Baltimore, Maryland. Recording was done in the speakers’ homes, without an experimenter present, over more than a dozen sessions of 1.5–2 hours each. The middle 75 minutes of each session were extracted and transcribed by Brent and colleagues. (See Brent and Siskind 2001 for more detailed information about the data-collection methods for this corpus.) We automatically force-aligned the sound files to the transcripts at both the word and phoneme level using the FAVE-align component of the Forced Alignment and Vowel Extraction (FAVE) suite (Rosenfelder et al. 2011), based on the Penn Forced Aligner (Yuan and Liberman 2008).

2.2 Target words

The current study focuses on monosyllabic, monomorphemic content words containing exactly one nasal segment (either a pre-vocalic nasal segment (NV), e.g., nap, or a post-vocalic nasal segment (VN), e.g., hand) extracted from the recordings of these 16 mothers talking to their infants. As is discussed in Section 2.3.1, high vowels elude accurate acoustic nasality measurement, so words containing high vowels were excluded. The full data set consists of 109 unique word types containing a nasal segment, represented by 8,127 tokens. Each mother contributed 507 tokens on average, with a range of 179–1,135.

2.3 Acoustic measurements

2.3.1 Acoustic nasality (A1-P0 dB)

Lowering of the velum acoustically couples the nasal passages with the oral cavity, introducing nasal resonances in addition to the oral ones during vowel production. These nasal formants fall in relatively predictable and stable frequency ranges, with the lowest nasal formant around 250 Hz and the second nasal formant around 900 Hz (Chen 1997). As nasality increases, the relative amplitude (in dB) of these nasal formant peaks tends to increase, while the amplitude of the oral formant peaks, especially F1, tends to decrease. Thus the difference in amplitude between one of the nasal formants and F1 gives us a relative measure of nasalization: A1-P0 (where A1 is the amplitude of the F1 harmonic peak and P0 is the amplitude of the lowest nasal peak). The low F1 of high vowels can overlap with and thus obscure the lower nasal peak (P0) (Chen 1997); therefore, only words with non-high vowels were targeted in the current study.

The spectral characteristics of oral and nasalized vowels are illustrated in Figure 1, which compares vowels from the English words bad and hand, as spoken by one mother in this corpus. In bad (top spectrum), the amplitude of the first formant peak (A1) is greater than the peak corresponding to the nasal formant frequency (P0). Meanwhile, in a nasalized vowel from the word hand (bottom spectrum), the first formant peak has decreased in amplitude while the nasal formant peak has increased in amplitude. Because A1 decreases and P0 increases as nasality increases, smaller A1-P0 values indicate greater vowel nasality.

Figure 1:

Spectra for an oral vowel, from bad (a), and for a nasalized vowel, from hand (b), from the speech of a mother in this study.

For each measurement, the boundary between the nasal and a vowel segment, which was placed automatically during forced alignment, was verified and hand-corrected as necessary. The accurate boundary was taken to be the point at which there was an abrupt reduction in amplitude of the higher formant frequencies in the spectrogram, accompanied by an abrupt change in amplitude in the waveform and simplification of waveform cycles. These criteria were used for both VN and NV sequences (but with the waveform cues in reverse order for the latter). A1-P0 measurements were made at the midpoint of each vowel via a Praat script which automatically identified A1 and P0. The frequencies of P0 and F1 were verified to ensure that they were appropriate for a given speaker and a given vowel quality.
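To make the A1-P0 measure concrete, the following Python sketch computes it from a list of harmonic peaks. It is only an illustration and not the Praat script used in the study: the function name a1_p0, the 250 Hz target with a 150 Hz search window for P0, the nearest-harmonic rule for A1, and the example amplitudes are all assumptions introduced here.

```python
def a1_p0(harmonic_freqs_hz, harmonic_amps_db, f1_hz,
          p0_target_hz=250.0, p0_window_hz=150.0):
    """Return A1-P0 (dB) from harmonic peak frequencies and amplitudes.

    A1 is taken as the amplitude of the harmonic closest to F1, and P0 as
    the strongest harmonic within p0_window_hz of p0_target_hz. Both search
    rules are illustrative assumptions, not the study's exact procedure.
    """
    if not harmonic_freqs_hz or len(harmonic_freqs_hz) != len(harmonic_amps_db):
        raise ValueError("need equally long, non-empty frequency and amplitude lists")
    peaks = list(zip(harmonic_freqs_hz, harmonic_amps_db))
    # A1: amplitude of the harmonic nearest the first formant frequency.
    a1 = min(peaks, key=lambda peak: abs(peak[0] - f1_hz))[1]
    # P0: strongest harmonic in the region of the lowest nasal formant.
    nasal_region = [amp for freq, amp in peaks
                    if abs(freq - p0_target_hz) <= p0_window_hz]
    if not nasal_region:
        raise ValueError("no harmonic found near the assumed nasal formant")
    return a1 - max(nasal_region)


# Hypothetical harmonic peaks (F0 = 200 Hz); the amplitudes are invented.
freqs = [200.0, 400.0, 600.0, 800.0, 1000.0]
oral = a1_p0(freqs, [40.0, 42.0, 50.0, 52.0, 45.0], f1_hz=750.0)
nasalized = a1_p0(freqs, [48.0, 46.0, 44.0, 45.0, 40.0], f1_hz=750.0)
print(oral, nasalized)  # the nasalized vowel yields the smaller A1-P0 value
```

With these invented values, the nasalized vowel has the lower A1-P0, in line with the interpretation that smaller values indicate greater nasality.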

2.3.2 Hyperarticulation

Hyperarticulation is measured as acoustic distance in F1-F2 space from vowel space center (e.g., Bradlow et al. 1996; Wright 2003). Vowel space center was calculated for each speaker by taking the F1 and F2 means of the high front vowel /i/ and the low back vowel /a/. For this purpose (because the test word set does not include high vowels), words containing /i/ were extracted from the Brent corpus in the same manner as the test words (i.e., monosyllabic content words with a vowel-adjacent nasal consonant); 931 tokens representing /i/ and 309 tokens representing /a/ (taken from the test words) were used to calculate vowel space centers.

F1 and F2 measurements were taken automatically via script in Praat at the midpoint of each target vowel, based on default Burg formant tracking analyses (5 formants in 5,500 Hz; 25 ms window) and verified by visual examination of wide band spectrograms. The Euclidean distance from vowel space center was calculated for each midpoint measurement, in bark-transformed F1-F2 space.
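The distance measure can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The source does not say which Bark conversion was used, so the Traunmüller (1990) approximation is assumed here. The center is assumed to be the midpoint between the mean /i/ and the mean /a/, computed after conversion to Bark. All function names and formant values are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Traunmüller (1990) Hz-to-Bark approximation (an assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def mean_bark(tokens_hz):
    """Mean (F1, F2) in Bark over a list of (F1, F2) tokens given in Hz."""
    n = len(tokens_hz)
    f1 = sum(hz_to_bark(t[0]) for t in tokens_hz) / n
    f2 = sum(hz_to_bark(t[1]) for t in tokens_hz) / n
    return f1, f2


def vowel_space_center(i_tokens_hz, a_tokens_hz):
    """Assumed center: midpoint between a speaker's /i/ and /a/ means."""
    i1, i2 = mean_bark(i_tokens_hz)
    a1, a2 = mean_bark(a_tokens_hz)
    return (i1 + a1) / 2.0, (i2 + a2) / 2.0


def hyperarticulation(token_hz, center_bark):
    """Euclidean distance of one (F1, F2) token from the center, in Bark."""
    return math.hypot(hz_to_bark(token_hz[0]) - center_bark[0],
                      hz_to_bark(token_hz[1]) - center_bark[1])


# Hypothetical midpoint formant values (Hz) for one speaker.
i_tokens = [(310.0, 2700.0), (330.0, 2600.0)]
a_tokens = [(850.0, 1300.0), (900.0, 1250.0)]
center = vowel_space_center(i_tokens, a_tokens)
peripheral = hyperarticulation((880.0, 1280.0), center)
centralized = hyperarticulation((650.0, 1700.0), center)
print(peripheral > centralized)  # a more peripheral token lies farther out
```

Larger distances correspond to more peripheral, that is, more hyperarticulated, vowel tokens.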

2.4 Lexical variables

2.4.1 Lexical age-of-acquisition (AoA)

Lexical age-of-acquisition is estimated as the adult-reported age at which a word was learned (Carroll and White 1973; Gilhooly and Gilhooly 1979). The AoA norms used in the current study were from Kuperman et al. (2012), who collected subjective ratings of over 30,000 English words from 1,960 raters, crowdsourced through Amazon Mechanical Turk. These raters were instructed for each word to provide “the age at which you would have understood that word if somebody had used it in front of you, EVEN IF YOU DID NOT use, read, or write it at the time” (Kuperman et al. 2012: 980, emphasis original). For the current study, these AoA ratings were provided for each word type extracted from the corpus, and AoA was centered.

2.4.2 Phonological neighborhood density (ND)

Phonological neighborhood density is a measure of the number of lexical neighbors for a given word. Neighbors are defined as words that differ from the target word by the addition, deletion, or substitution of a single phoneme (Luce et al. 1990). Nasal coarticulation has been shown to vary systematically in words depending on the number of phonological neighbors: words with many neighbors are produced with a greater degree of vowel nasality than words with fewer phonological neighbors (Scarborough 2013).

Frequency-weighted neighborhood density (ND), defined as the summed log frequencies in tokens per million words (SUBTLEX, Brysbaert and New 2009) of a word and its neighbors, was calculated for each lexical item in our samples (for example, the word snob has eight monosyllabic phonological neighbors in English and its summed log frequency is 14.4, while the word son has ten neighbors but its ND is 31.7). Neighbors were determined using the Hoosier Mental Lexicon (Nusbaum et al. 1984). The summed log frequencies (=ND) were centered.
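As an illustration of this definition, the Python sketch below finds one-phoneme neighbors and sums log frequencies over a toy lexicon. It rests on several assumptions: the log base is not stated in the text, so base 10 is assumed; the lexicon and its per-million frequencies are invented and are not taken from SUBTLEX or the Hoosier Mental Lexicon; and the function names are hypothetical.

```python
import math


def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by exactly one substitution,
    addition, or deletion of a phoneme."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Different length: removing one phoneme from the longer form
    # must yield the shorter form (addition or deletion).
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def frequency_weighted_nd(target, lexicon):
    """Summed log10 frequency of the target word and all of its neighbors.

    lexicon maps phoneme tuples to frequency per million words.
    """
    total = math.log10(lexicon[target])
    for word, freq in lexicon.items():
        if is_neighbor(target, word):
            total += math.log10(freq)
    return total


# Toy lexicon with invented frequencies (per million words).
toy_lexicon = {
    ("B", "AH", "N"): 10.0,        # bun (target)
    ("B", "AH", "D"): 100.0,       # bud: substitution neighbor
    ("F", "AH", "N"): 1000.0,      # fun: substitution neighbor
    ("B", "AH", "N", "T"): 1.0,    # bunt: addition neighbor
    ("K", "AE", "T"): 100.0,       # cat: not a neighbor
}
nd_bun = frequency_weighted_nd(("B", "AH", "N"), toy_lexicon)
print(nd_bun)
```

Because the measure sums log frequencies, a word with fewer but more frequent neighbors can receive a higher ND than a word with more neighbors, as in the snob and son comparison above.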

2.4.3 Frequency

The SUBTLEX frequencies that we used are based on 51 million words of English movie subtitles. The SUBTLEX norms have been shown to predict lexical decision and naming reaction times more accurately than other available frequency measures, specifically compared to Kučera and Francis (1967) and CELEX (Baayen et al. 1996), suggesting that they could also provide a more useful basis for understanding more complex behavioral implications of frequency (Brysbaert and New 2009). Frequency counts per million words for each token were log transformed and centered.

Means, standard deviations, and ranges for Frequency, ND, and AoA (presented as non-centered) from the 109 word types containing exactly one nasal segment extracted from the Brent corpus are provided in Table 1.

Table 1:

Lexical statistics of the words extracted from the Brent corpus.

Statistic    (log) frequency    ND               AoA
Mean (SD)    3.38 (0.78)        39.23 (23.84)    5.02 (1.4)
Range        1.36–5.2           4.47–119.35      2.53–10.1

2.5 Model design

2.5.1 Fixed effects structure

Two separate linear mixed effects models were run, one for each dependent variable (nasal coarticulation and hyperarticulation). Both models had the same fixed effects predictors and structure.

The lexical variables examined in the present study tend to strongly correlate with each other. For example, early-acquired words tend to be highly frequent words, and they also tend to have simple phonological structure, leading them to have a large number of lexical competitors (high ND). However, the studies focusing on AoA mentioned above have been careful to illustrate effects independent from the influence of the other factors, either through partialization of predictors (Carroll and White 1973) or binning of the continuous predictors into crossed categorical factors (Garlock et al. 2001).

We handle the collinearity between these variables through residualization and partialization (Gorman 2010). Multicollinearity, or non-independence of predictors, is a serious problem for regression because it violates the assumption of predictor orthogonality, making it impossible for the model-fitting procedure to correctly attribute variance to one predictor or the other. Residualization is a method for orthogonalizing predictor variables: given two highly collinear variables, one predictor is regressed onto another and the resulting residuals of that regression are used to replace the original predictor variables. In so doing, the goal is to assess the independent contributions to the dependent variable of otherwise related predictors (Kuperman et al. 2008). Partialization is the same technique, but for more than two multicollinear predictors (Gorman 2010).

Residualization allows us to isolate the effects of a single predictor while simultaneously evaluating the same data for effects from multiple originally collinear predictors. It has been shown, for example, that a simultaneous analysis using residualization of multiple predictors produces the same result for a given predictor as if the variable was run by itself in a single-predictor model (Wurm and Fisicaro 2014). Concerns that residualization has been misused and/or misinterpreted are outlined by Wurm and Fisicaro (2014). Most notably, they object to residualization being used to orthogonalize predictors included “for the purposes of statistical control” (p. 38). However, for the purposes of the current study, residualization is both appropriate and well-motivated. Specifically, we aim to simultaneously assess the independent contributions of three lexical properties (age-of-acquisition, frequency, and ND) which are multicollinear. [1] Note that some researchers have dealt with the multicollinearity between multiple lexical predictors through stepwise multiple regression analysis, as opposed to residualization (see, e.g., Brysbaert et al. 2011; Kuperman et al. 2012). As discussed in Wurm and Fisicaro (2014: 46), the result of such a hierarchical analysis is the same as a simultaneous analysis using residualization.

Models were fit using the lme4 package (Bates et al. 2013, version 1.0-4) in R (R Core Team 2013), and we use the languageR package (Baayen 2011, version 1.1.4) to produce probabilities based on Markov Chain Monte Carlo sampling. As noted above, to mitigate the potentially misleading influence of multicollinearity, we first residualized ND on Frequency. (One member of a pair of multicollinear predictors is taken as a baseline (here, Frequency), and the other predictor (here, ND) is regressed linearly on the values of the baseline.) Next, we partialized AoA on both Frequency and the residuals of ND. Order of residualization/partialization procedure was determined from studies attributing the greatest amount of variance in visual lexical decision reaction times to frequency, then to ND and related structural properties, then to AoA (see Brysbaert and Cortese 2010; Brysbaert et al. 2011). Each model consisted of fixed predictors of Frequency (centered), ND (centered, residualized on Frequency), and AoA (centered, partialized on Frequency and ND residuals). The models also included two- and three-way interaction terms between these three main variables, testing the possibility that phonetic variation in IDS contains simultaneous influences of these factors.

Finally, we performed a critical test of this method to verify the accuracy of the results reported using residualization. The residualization/partialization procedure was reversed, i.e., we residualized ND on AoA (rather than AoA on ND), and the same pattern of results was obtained; that is, the predictors that were significant or non-significant in the models reported in the current study had the same status under the reverse residualization procedure. This is important because it means that the patterns in the data are indeed best explained by the independent contributions of the significant main predictor(s), rather than by some factor shared by multiple predictors that this procedure cannot tease apart.

2.5.2 Random effects structure

We fit random intercepts for word and speaker in both models. This random effects structure allows for the joint possibilities of speaker idiosyncrasy with respect to overall degree of coarticulation and hyperarticulation, as well as lexical item idiosyncrasy with respect to overall phonetic patterns.
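For concreteness, the model structure described in Sections 2.5.1 and 2.5.2 can be written out as below. The notation is ours and is meant only as a sketch: y is the phonetic measure (hyperarticulation or A1-P0) for token k of word j produced by speaker i, F, N, and A stand for centered Frequency, residualized ND, and partialized AoA, and the normality assumptions are the standard ones for a linear mixed model rather than something stated in the text.

```latex
y_{ijk} = \beta_0 + \beta_1 F_j + \beta_2 N_j + \beta_3 A_j
        + \beta_4 A_j N_j + \beta_5 A_j F_j + \beta_6 N_j F_j
        + \beta_7 A_j N_j F_j
        + u_i + w_j + \varepsilon_{ijk},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
w_j \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

Here u_i is the random intercept for speaker and w_j the random intercept for word. In lme4's formula notation, a model of this shape would typically be written along the lines of `y ~ Freq * ND * AoA + (1 | Speaker) + (1 | Word)`, where the variable names are placeholders.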

3 Results

Two linear mixed models analyzed hyperarticulation and nasal coarticulation measured from 8,127 tokens (109 unique word types containing a non-high vowel and a nasal consonant) from 16 mothers talking to their prelinguistic infants.

3.1 Hyperarticulation

The estimated coefficients and associated p-values from the model on hyperarticulation in IDS are presented in Table 2.

Table 2:

Fixed effects parameters of the hyperarticulation in IDS model. Bolded terms indicate significance at p < 0.05.

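| | Estimate | Std. error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 0.003 | 0.08 | 0.04 | 0.96 |
| **AoA** | 0.26 | 0.12 | 2.13 | 0.002 |
| ND | −0.08 | 0.12 | −0.61 | 0.32 |
| Freq | −0.15 | 0.13 | −1.12 | 0.06 |
| AoA*ND | 0.19 | 0.25 | 0.74 | 0.23 |
| AoA*Freq | 0.26 | 0.21 | 1.23 | 0.07 |
| ND*Freq | −0.07 | 0.24 | −0.29 | 0.47 |
| AoA*ND*Freq | 0.36 | 0.35 | 1.007 | 0.11 |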

The model of hyperarticulation in IDS includes a significant main effect of AoA: with a positive coefficient, this effect indicates that words learned later have a greater degree of hyperarticulation. Word frequency was a marginally significant predictor of hyperarticulation: higher frequency words tended to be less hyperarticulated than lower frequency words. Neighborhood density was not a significant predictor of hyperarticulation in this IDS data set. There were no significant interactions among the three lexical predictors in the IDS hyperarticulation dataset.

The isolated univariate effects of (non-residualized) AoA and Frequency on hyperarticulation in IDS are illustrated in Figure 2: vowels from words that are reported as later-acquired and from words that are less frequent are more hyperarticulated than vowels from words that are reported to be learned earlier and words that are more frequent.

Figure 2:

Euclidean distance from vowel space center (in Bark) means for each word type, by AoA (a) and Frequency (b), from spontaneous infant-directed speech.

Recall that prior studies that have reported hyperarticulation in IDS as a register have focused on point vowels (/i/, /a/, /u/) (e.g., Kuhl et al. 1997, Kuhl et al. 2008; Cristia and Seidl 2014), and in fact, hyperarticulation may even be limited to these point vowels (McMurray et al. 2013). Although the current study does not address the presence or absence of hyperarticulation in IDS in general, we wanted to ask whether the AoA effects on hyperarticulation in IDS might also be limited to this subset of vowels. Since the current dataset examines only non-high vowels, /a/ is the only point vowel. Therefore, we ran a post-hoc model on a subset of the data excluding /a/ (i.e., on the non-point vowels). This non-point vowel model also revealed a significant effect of AoA (est=0.24, SE=0.14, p=0.008), a non-significant effect of ND (p=0.5), and a significant effect of frequency (est=−0.21, SE=0.17, p=0.03). Thus, it appears that the AoA-conditioned hyperarticulation generalizes across vowels in our data.

3.2 Nasal coarticulation

The estimated coefficients and associated p-values from the model on nasal coarticulation in IDS are presented in Table 3.

Table 3:

Fixed effects parameters of the nasal coarticulation in IDS model. Bolded terms indicate significance at p < 0.05.

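| | Estimate | Std. error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 6.09 | 0.44 | 13.74 | 0.0001 |
| **AoA** | −0.83 | 0.29 | −2.86 | 0.01 |
| ND | 0.03 | 0.01 | 1.79 | 0.09 |
| **Freq** | 1.35 | 0.48 | 2.78 | 0.01 |
| AoA*ND | −0.03 | 0.02 | 1.59 | 0.16 |
| AoA*Freq | 0.93 | 0.43 | 2.17 | 0.06 |
| ND*Freq | 0.01 | 0.02 | 0.52 | 0.67 |
| AoA*ND*Freq | 0.02 | 0.02 | 1.07 | 0.31 |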

The model of nasal coarticulation in IDS includes a significant negative coefficient for AoA: later-acquired words have a lower A1-P0, which means they have greater nasalization. Word frequency was also a significant predictor of degree of nasality: higher-frequency words had less nasality than lower-frequency words. The model on nasality in IDS did not yield a significant main effect of neighborhood density. [2]

The isolated univariate effects of (non-residualized) AoA and Frequency on degree of nasal coarticulation in IDS are illustrated in Figure 3: words that are reported as later acquired and words that are less frequent are produced with a greater degree of vowel nasality (lower A1-P0 dB) than earlier-learned and more-frequent words. [3]

Figure 3:

Acoustic nasality (A1-P0; smaller=more nasal) means for each word type by AoA (a) and Frequency (b), from spontaneous infant-directed speech.

3.3 Interim discussion

The two models on vowel hyperarticulation and vowel nasality yielded systematic lexical AoA effects. In spontaneous speech to infants, mothers produced greater degrees of vowel hyperarticulation and nasal coarticulation in words that were subjectively rated as later-acquired, above and beyond other lexical effects. Meanwhile, ND had no demonstrable effect on these phonetic variables. Recall that prior studies have found evidence of consistent ND effects on hyperarticulation and coarticulation in laboratory speech, elicited as standard citation form speech or as explicitly (adult) listener-directed (e.g., Wright 2003; Munson and Solomon 2004; Scarborough 2013; Scarborough and Zellou 2013).

In order to understand the relationship between the current findings and those from prior studies, two possible explanations are considered: (1) Context matters in conditioning lexical variation; that is, AoA influences phonetic variation in IDS since that is the best metric for lexical difficulty for language learners; meanwhile, ND conditions production patterns in adult-directed speech because that is the best metric for lexical difficulty for interlocutors with fully-developed lexicons. Or, (2) AoA might condition lexical variation in all speech styles, for reasons related, for instance, to representational strength or cumulative frequency that this measure might represent, and prior studies of ADS did not uncover its influence since it had not previously been included as a predictor of speech production. These two possibilities can be addressed empirically via investigation of which lexical factors are significant in a model of adult-directed speech, with the same predictors as in the present IDS study. If the first explanation is correct, we would expect a non-significant effect of AoA and a significant effect of ND on conditioning phonetic variation in adult-directed speech. If the second explanation is correct, we would expect a significant AoA effect in the adult-directed speech, just as in the infant-directed speech. In the next section, we test these two hypotheses on a set of data from a spontaneous adult-directed speech corpus.

4 Adult-directed speech

4.1 Buckeye corpus of adult-directed speech

The spontaneous adult-directed speech data came from the Buckeye corpus (Pitt et al. 2005, Pitt et al. 2007). The entire corpus consists of spontaneous speech directed toward an adult interviewer from 40 speakers from Columbus, Ohio, each lasting approximately one hour (see Pitt et al. 2005 for more information about this corpus). In order to construct a sample of speakers comparable to our sample of IDS speakers, we analyzed a subset of this corpus – all female speakers who were under the age of 30 – resulting in 10 individuals’ recordings.

The sound files were automatically force-aligned to the transcripts at both the word and phoneme level using FAVE-align. As in the IDS study, monosyllabic, monomorphemic content words containing exactly one nasal segment were extracted from these recordings. The ADS data set consists of 88 unique word types containing a nasal segment, represented by 897 tokens (with 77 additional tokens of nasal words with /i/ in order to calculate vowel space centers for each subject using /a/ and /i/ means; these 77 tokens were not independently included in the analysis). Hyperarticulation and degree of nasality (as A1-P0) were measured and calculated for each token using the same methods described in Section 2.3.
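As a rough illustration of the hyperarticulation measure (Euclidean distance from the vowel space center, in Bark), consider the Python sketch below. It rests on assumptions that go beyond what is stated here: it takes formants as already converted to Bark, and it treats the center as the midpoint of a speaker's /a/ and /i/ mean positions in F1 by F2 space. The function names and the numerical values are hypothetical; the actual procedure is the one described in Section 2.3.

```python
from math import dist
from statistics import mean

def mean_point(tokens):
    """Mean (F1, F2) over a list of tokens, each an (F1, F2) pair in Bark."""
    return (mean([f1 for f1, _ in tokens]), mean([f2 for _, f2 in tokens]))

def vowel_space_center(a_tokens, i_tokens):
    """ASSUMPTION: center = midpoint of the speaker's /a/ and /i/ means."""
    a_f1, a_f2 = mean_point(a_tokens)
    i_f1, i_f2 = mean_point(i_tokens)
    return ((a_f1 + i_f1) / 2, (a_f2 + i_f2) / 2)

def hyperarticulation(token, center):
    """Euclidean distance of one vowel token from the speaker's center."""
    return dist(token, center)

# Hypothetical Bark values for one speaker (not data from either corpus).
a_tokens = [(7.0, 10.0), (7.4, 10.4)]
i_tokens = [(3.0, 14.0), (3.2, 14.2)]

center = vowel_space_center(a_tokens, i_tokens)
d = hyperarticulation((8.15, 8.15), center)
```

Under these assumptions, a larger distance corresponds to a more peripheral, and in that sense more hyperarticulated, vowel token.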

Means, standard deviations, and ranges for Frequency, ND, and AoA (all non-centered) for the 88 word types containing exactly one nasal segment, extracted from the subset of 10 young female speakers selected from the Buckeye corpus, are provided in Table 4. Note that the words are strikingly similar to the words from the IDS part of the study with respect to their lexical statistics.

Table 4:

Lexical statistics of the words extracted from the Buckeye corpus.

| | (log) frequency | ND | AoA |
|---|---|---|---|
| Mean (SD) | 3.74 (0.62) | 39.89 (25.7) | 5.36 (1.7) |
| Range | 2.39–4.99 | 3.98–119.35 | 2.53–9.79 |

Two lmer models were run on the hyperarticulation and nasal coarticulation data from the Buckeye ADS speech in the same manner as for the Brent data: fixed effects of centered Frequency, centered ND (residualized by frequency) and centered AoA (partialized by Frequency and residuals of ND), along with two- and three-way interaction terms. [4] Random intercepts of Speaker and Item were included as well.

4.2 Results

Two linear mixed models analyzed hyperarticulation and nasal coarticulation measured from 897 tokens (88 unique word types containing a non-high vowel and a nasal consonant) from 10 young women talking to an adult interlocutor.

The estimated coefficients and associated p-values from the models on hyperarticulation and nasality in ADS are presented in Tables 5 and 6, respectively.

Table 5:

Fixed-effects parameters of the hyperarticulation in ADS model. Bolded terms indicate significance at p < 0.05.

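| | Estimate | Std. error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 0.32 | 0.27 | 1.18 | 0.19 |
| AoA | 0.5 | 0.28 | 1.76 | 0.08 |
| **ND** | 0.8 | 0.37 | 2.18 | 0.01 |
| Freq | 0.08 | 0.06 | 1.29 | 0.12 |
| **AoA*ND** | −1.33 | 0.59 | −2.24 | 0.02 |
| AoA*Freq | 0.12 | 0.08 | 1.64 | 0.11 |
| **ND*Freq** | −0.17 | 0.09 | −1.79 | 0.04 |
| **AoA*ND*Freq** | 0.4 | 0.16 | 2.42 | 0.01 |

Table 6: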

Fixed-effects parameters of the nasal coarticulation in ADS model. Bolded terms indicate significance at p < 0.05.

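| | Estimate | Std. error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 0.12 | 0.18 | 0.68 | 0.52 |
| AoA | 0.2 | 0.23 | 0.84 | 0.39 |
| **ND** | 0.69 | 0.34 | −2.04 | 0.04 |
| Freq | 0.03 | 0.04 | 0.89 | 0.41 |
| AoA*ND | 0.19 | 0.51 | 0.36 | 0.7 |
| AoA*Freq | 0.04 | 0.06 | 0.74 | 0.44 |
| **ND*Freq** | 0.16 | 0.08 | 2.06 | 0.04 |
| AoA*ND*Freq | 0.05 | 0.13 | 0.37 | 0.7 |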

In both models, ND is a significant predictor of phonetic variation: high ND words are more hyperarticulated and have a greater degree of nasality (=lower A1-P0). AoA is not a significant main predictor of either of these phonetic variables in the ADS data.

We note that there are several significant interactions present in the two models of ADS. First, in both the model on hyperarticulation and the model on nasal coarticulation, there is a significant interaction of ND and Frequency. In order to interpret this interaction, we binned the word types extracted for the adult-directed speech data set into categories split by the mean of each sample (i.e., high/low frequency, high/low ND, high/low AoA). Figure 4 illustrates the significant ND by frequency interaction in each model, with means (and standard errors) of nasal coarticulation and hyperarticulation in each of the sub-categories. It appears that the interaction is driven by differentiation of high- and low-frequency words in high-ND words, but not in low-ND words. In other words, the effect of phonetic modification was most extreme in the most difficult adult word types – high-ND, low-frequency words.
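The mean-split binning used to visualize the interaction can be sketched as follows. This is an illustrative Python sketch only: the four word types and their values are invented for the example, and the choice to place values exactly equal to the mean in the "low" bin is our assumption, since the text does not specify how ties were handled.

```python
from collections import defaultdict
from statistics import mean

def bin_by_mean(values):
    """Label each value 'high' or 'low' relative to the sample mean.

    ASSUMPTION: values equal to the mean are labeled 'low'.
    """
    m = mean(values)
    return ["high" if v > m else "low" for v in values]

# Hypothetical word types (not items from the Buckeye subset):
# log frequency, neighborhood density, mean Euclidean distance in Bark.
words = [
    {"freq": 2.5, "nd": 60.0, "dist": 2.4},
    {"freq": 4.5, "nd": 70.0, "dist": 1.8},
    {"freq": 2.8, "nd": 15.0, "dist": 1.9},
    {"freq": 4.2, "nd": 20.0, "dist": 2.0},
]

freq_bin = bin_by_mean([w["freq"] for w in words])
nd_bin = bin_by_mean([w["nd"] for w in words])

# Collect the phonetic measure in each (ND bin, Frequency bin) cell.
cells = defaultdict(list)
for w, nb, fb in zip(words, nd_bin, freq_bin):
    cells[(nb, fb)].append(w["dist"])

cell_means = {cell: mean(vals) for cell, vals in cells.items()}
```

Comparing the two frequency cells within each ND bin is then the descriptive counterpart of the interaction: a frequency difference that shows up in the high-ND cells but not in the low-ND cells.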

Figure 4:

Means and standard errors of phonetic variables (Euclidean distance [bark; a] and acoustic nasality [b]), binned by ND and Frequency, from adult-directed speech.

These patterns were confirmed in a series of post-hoc linear mixed models on subsets of the data. First, for hyperarticulation, two post-hoc lmer regressions were run on the ND-binned data – one for higher-ND words and one for lower-ND words – with Frequency set as a continuous fixed predictor and random intercepts for Speaker and Item. The post-hoc regression for higher-ND words showed a significant correlation with Frequency (est.=−0.19, SE=0.09, t=−2), while the post-hoc regression for lower-ND words did not show a significant frequency effect (t=0.97). For nasal coarticulation, frequency again correlated with acoustic nasality in higher-ND words (est.=1.6, SE=0.94, t=1.8), but not in lower-ND words (t=1.1).

While AoA was not a significant main effect in either of the ADS models, AoA did participate in a significant two-way interaction with ND and a three-way interaction with ND and Frequency in the hyperarticulation model (but not in the nasal coarticulation model). The two-way interaction, illustrated with word types binned by high/low ND and high/low AoA, is shown in Figure 5.

Figure 5:

Means and standard errors of Euclidean distance (bark), binned by ND and AoA, from adult-directed speech.

As can be seen in Figure 5, the significant interaction appears to be driven by the (small) difference between early and late age of acquisition in high-ND words, while there is no AoA-differentiation in hyperarticulation for low-ND words. As with the ND by Frequency interaction, the effect of phonetic adjustment for AoA only emerges in the more difficult high-ND words. To confirm this effect, two post-hoc lmer regressions were run on the ND-binned data – one for higher-ND words and one for lower-ND words – with AoA set as a continuous fixed predictor and random intercepts for Speaker and Item. The post-hoc regression for higher-ND words revealed a significant effect of AoA (est.=0.22, SE=0.06, t=3.5), while the post-hoc regression for lower-ND words did not have a significant AoA effect (est.=0.006, SE=0.35, t=0.12).

5 General discussion

Several main observations stem from the patterns of lexically-conditioned phonetic variation in the two types of listener-directed speech examined in the current study.

First, we find that infant-directed speech contains lexically-conditioned phonetic variation, specifically, as a function of lexical age-of-acquisition. Insofar as IDS may be structured to promote clarity or acquisition, it seems appropriate that IDS would have lexically-conditioned phonetic variation because words are the meaningful units that speakers seek to express and that learners need to be able to perceive and learn. Indeed, recent evidence suggests that words are targets of early language acquisition: for example, 9-month-old infants (in some cases, even as young as 6 months) display comprehension of common words, suggesting that language acquisition consists of identifying lexical, as well as phonemic, structure (Bergelson and Swingley 2012, Bergelson and Swingley 2013). Therefore, the idea that mothers might intuitively structure the speech signal to maximize perceptibility not only of individual phonemes but also on a word-by-word basis, as a function of how difficult the words might be for the learner, is not implausible. (Future replication in different samples of infant-directed speech could confirm that AoA-differentiation is a general property of IDS as a register.)

Secondly, coarticulation, as well as hyperarticulation, is controlled in infant-directed speech. Specifically, in IDS, speakers produced hyperarticulation and increased coarticulation in later-acquired words. With respect to hyperarticulation, we note that the results in the current study are not evidence that IDS vowels are overall hyperarticulated compared to ADS [5] (cf. McMurray et al. 2013); rather, they demonstrate that degree of hyperarticulation (like degree of coarticulation) is lexically conditioned, specifically by lexical age-of-acquisition, in IDS. With respect to coarticulation, our results are additionally novel in that patterns of coarticulation have been little-studied previously in IDS. The only previous study, to our knowledge, looking at coarticulation in speech directed to infants found greater coarticulation in IDS than in ADS (Andruski and Kuhl 1996), paralleling the greater hyperarticulation frequently reported for IDS. Other reported segment-level articulation effects in IDS include more canonical consonant allophones and less variable vowel allophones (Kuhl et al. 1997; Dilley et al. 2014; though cf. McMurray et al. 2013). Although these effects might seem contrary to the increased coarticulation in IDS (Andruski and Kuhl 1996), and in later-acquired words in IDS in particular (in the current study), they could in fact be interpreted as having a similar goal: to provide better information about the segments in a word. More extreme vowel articulation and more canonical consonant allophones straightforwardly provide better information about the identity of these segments; increased (consonant-to-vowel) coarticulation provides temporally-distributed and predictive information about the identity of the upcoming consonant (e.g., Ali et al. 1971; Lahiri and Marslen-Wilson 1991; Beddor et al. 2013; Scarborough and Zellou 2013). Thus, the confluent increase in degree of contextual nasalization (from consonant to vowel) and more extreme vowel articulation in ostensibly harder, later-acquired words in IDS are not incompatible and may even serve a similar purpose.

Thirdly, the lexically-conditioned phonetic variation patterns vary as a function of interlocutor type. Hyperarticulation and coarticulation vary systematically by ND (but not primarily by AoA [6]) across a variety of adult-directed speech contexts, with greater hyperarticulation and greater coarticulation in words from denser neighborhoods. Meanwhile, hyperarticulation and coarticulation vary systematically by AoA (but not ND) in infant-directed speech, with the same properties – greater hyperarticulation and coarticulation – in later-acquired words. This fundamental similarity between ND effects in ADS and AoA effects in IDS suggests that the two factors represent similar cognitive or communicative phenomena, but in a listener-targeted way. In other words, we could interpret both ND and AoA as measures of lexical difficulty, but where difficulty is evaluated differently for adult interlocutors than for infants.

Note that if AoA were reflective of cumulative frequency or affecting speech output via representational strength (as suggested by Kuperman et al. 2012), we would expect to see its effects consistently across interlocutors. Rather, AoA plays a role where real age-of-acquisition (or whether or not a child would know a word) is relevant, i.e., when speaking to a child acquiring language. In other words, AoA is a relevant measure for exactly what it is – an adult’s assessment of lexical difficulty for a child. The lack of ND effects in IDS suggests a parallel argument: if these ND effects were exclusively the result of representation or speaker-internal mechanisms (e.g., Gahl et al. 2012), we would expect to see the effects consistently across interlocutors. In fact, just the empirical finding that different lexical variables condition phonetic variation across different speech styles demonstrates that speakers make lexical adjustments in their speech based on context. It should be acknowledged that, of course, properties other than age of interlocutor (adult vs. child) differed between the IDS and ADS corpora used in the current study (for example, familiarity between speaker and interlocutor, place of recording, interlocutor verbal or non-verbal responses, regional dialect, etc.). We cannot discount that these factors may contribute as well to the differences in the patterns observed in the two samples, though we have no predictions about the specific effects of any of these variables on lexically-conditioned effects.

The current study did reveal one mostly consistent lexical influence across both speech styles: lexical frequency. In both the IDS and ADS data, lower-frequency words were more hyperarticulated and showed greater nasal coarticulation (marginally so in IDS). In other words, lexical frequency appears to have a stable effect across interlocutors. (See also Lahey and Ernestus 2014, who report a stable influence of word frequency on reduction in IDS and ADS.) This suggests that frequency effects may result from a different cognitive mechanism than either ND or AoA effects. Such a distinction between frequency and ND effects has been supported elsewhere as well. For example, when participants are instructed to respond immediately to a word prompt (i.e., lexical access is ‘stressed’), vowel hyperarticulation is influenced by both frequency and ND; yet, when participants are instructed to wait for one second before responding to a word prompt (i.e., lexical access is ‘facilitated’), frequency no longer has an effect on hyperarticulation, though ND still does (Munson 2007). In other words, frequency and ND effects do not appear to result from the same cognitive processes: frequency effects can be attributed to the process of lexical activation of a word from memory, while ND effects can be attributed to real-time speech production processes (Munson 2007). This interpretation fits with the patterns reported in the current study, as well: the influence of frequency on lexical access remains consistent regardless of interlocutor type, but the mechanisms that influence production as a function of word difficulty occur in later stages of production and can be adjusted for different interlocutors.

Finally, taken together, we believe the patterns of findings in the current study support the stance that clarity-based adjustments in speech are sensitive to lexical-level factors and that evaluation of the need for clarity is tuned to the listener (cf. Hazan and Baker 2011). Perhaps the strongest version of such a position with respect to the age-of-acquisition effects found in IDS in the current study would postulate that caregivers’ adjustments in IDS are attempts to aid the child in acquiring language – in this case, producing subjectively harder words for the infant in a manner that might make them more learnable. (In fact, however, we might expect that lexically-specific hyperarticulation in IDS would be focused on lower-AoA words that were closer to the capabilities of these very young children, and thus more likely targets of acquisition, e.g., Sundberg 1998.) But such a didactic position is certainly not entailed by our findings. They may rather be part of a more general communicative strategy whereby speakers modulate speech clarity to compensate for difficulties encountered by listeners (e.g., Lindblom 1990, inter alia), even difficulties at a word-by-word level (e.g., Wright 2003; Scarborough 2013; Scarborough and Zellou 2013). Just as it is axiomatic that speech is variable, in part to ensure clarity, so too are listeners variable, and the type of information that would be useful and informative for some listeners may not help others. The results in the current study suggest that speakers are sensitive to this fact. More specifically, we suggest that speakers modify their speech to compensate for word difficulty and that speakers assess word difficulty on the basis of interlocutor type. The present patterns of findings seem to indicate that lexical difficulty can be evaluated by AoA in speech directed toward infants, while ND is a more relevant measure of lexical difficulty in adult-directed speech.

Acknowledgments

We are grateful to Eric Doty, Dave Embick, Will Styler, Meredith Tamminga, and Santiago Barreda for various discussions, assistance, and feedback. We also thank the editors, two anonymous reviewers, and the audience of LabPhon 14 for constructive comments and suggestions.

References

Ali, Latif, T. Gallagher, J. Goldstein & Raymond Daniloff. 1971. Perception of coarticulated nasality. Journal of the Acoustical Society of America 49. 538–540. doi:10.1121/1.1912384

Andruski, Jean E. & Patricia K. Kuhl. 1996. The acoustic structure of vowels in mothers’ speech to infants and adults. In Proceedings of the Fourth International Conference on Spoken Language, Vol. 3, 1545–1548. Philadelphia, PA: IEEE. doi:10.1109/ICSLP.1996.607913

Baayen, R. Harald. 2011. languageR: Data sets and functions. With Analyzing linguistic data: A practical introduction to statistics. R package version 1. doi:10.1017/CBO9780511801686.002

Baayen, R. Harald, Richard Piepenbrock & L. Gulikers. 1996. The CELEX Lexical Database (Release 2) [CD-ROM]. Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania [distributor].

Bates, Douglas, Martin Maechler, Ben Bolker & Steven Walker. 2013. lme4: Linear mixed-effects models using Eigen and S4. R package version 1.0–4.

Beddor, Patrice S. 2009. A coarticulatory path to sound change. Language 85(4). 785–821. doi:10.1353/lan.0.0165

Beddor, Patrice S., Kevin B. McGowan, Julie E. Boland, Andries W. Coetzee & Anthony Brasher. 2013. The time course of perception of coarticulation. The Journal of the Acoustical Society of America 133(4). 2350–2366. doi:10.1121/1.4794366

Bell, Alan, Jason Brenier, Michelle Gregory, Cynthia Girand & Dan Jurafsky. 2009. Predictability effects on durations on content and function words in conversational English. Journal of Memory and Language 60. 92–111. doi:10.1016/j.jml.2008.06.003

Benders, Titia. 2013. Mommy is only happy! Dutch mothers’ realisation of speech sounds in infant-directed speech expresses emotion, not didactic intent. Infant Behavior and Development 36(4). 847–862. doi:10.1016/j.infbeh.2013.09.001

Bergelson, Elika & Daniel Swingley. 2012. At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences 109(9). 3253–3258. doi:10.1073/pnas.1113380109

Bergelson, Elika & Daniel Swingley. 2013. The acquisition of abstract words by young infants. Cognition 127(3). 391–397. doi:10.1016/j.cognition.2013.02.011

Bortfeld, Heather & James L. Morgan. 2010. Is early word-form processing stress-full? How natural variability supports recognition. Cognitive Psychology 60(4). 241–266. doi:10.1016/j.cogpsych.2010.01.002

Bradlow, Ann R. 2002. Confluent talker- and listener-related forces in clear speech production. In Carlos Gussenhoven & Natasha Warner (eds.), Laboratory phonology 7, 241–273. Berlin & New York: Mouton de Gruyter. doi:10.1515/9783110197105.1.241

Bradlow, Ann R., Nina Kraus & Erin Hayes. 2003. Speaking clearly for children with learning disabilities. Journal of Speech, Language, and Hearing Research 46. 80–97. doi:10.1044/1092-4388(2003/007)

Bradlow, Ann R., Gina M. Torretta & David B. Pisoni. 1996. Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Communication 20(3). 255–272. doi:10.1016/S0167-6393(96)00063-5

Brent, Michael R. & Jeffrey M. Siskind. 2001. The role of exposure to isolated words in early vocabulary development. Cognition 81(2). B33–B44. doi:10.1016/S0010-0277(01)00122-6

Brown, Gordon D. & Frances L. Watson. 1987. First in, first out: Word learning age and spoken word frequency as predictors of word familiarity and word naming latency. Memory & Cognition 15(3). 208–216. doi:10.3758/BF03197718

Bryant, Gregory A. & H. Clark Barrett. 2007. Recognizing intentions in infant-directed speech: Evidence for universals. Psychological Science 18(8). 746–751. doi:10.1111/j.1467-9280.2007.01970.x

Brysbaert, Marc, Matthias Buchmeier, Markus Conrad, Arthur M. Jacobs, Jens Bölte & Andrea Böhl. 2011. The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology 58(5). 412. doi:10.1027/1618-3169/a000123

Brysbaert, Marc & Michael J. Cortese. 2010. Do the effects of subjective frequency and age of acquisition survive better word frequency norms? The Quarterly Journal of Experimental Psychology 64(3). 545–559. doi:10.1080/17470218.2010.503374

Brysbaert, Marc & Boris New. 2009. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41(4). 977–990. doi:10.3758/BRM.41.4.977

Carroll, John B. & Margaret N. White. 1973. Age-of-acquisition norms for 220 picturable nouns. Journal of Verbal Learning and Verbal Behavior 12(5). 563–576. doi:10.1016/S0022-5371(73)80036-2

Chen, Marilyn Y. 1997. Acoustic correlates of English and French nasalized vowels. The Journal of the Acoustical Society of America 102(4). 2360–2370. doi:10.1121/1.419620

Cirrin, Frank M. 1984. Lexical search speed in children and adults. Journal of Experimental Child Psychology 37(1). 158–175. doi:10.1016/0022-0965(84)90064-X

Cristia, Alejandrina. 2013. Input to language: The phonetics and perception of infant-directed speech. Language and Linguistics Compass 7(3). 157–170. doi:10.1111/lnc3.12015

Cristia, Alejandrina & Amanda Seidl. 2014. The hyperarticulation hypothesis of infant-directed speech. Journal of Child Language 41(4). 913–934. doi:10.1017/S0305000912000669

de Boer, Bart. 2005. Infant directed speech and the evolution of language. In Maggie Tallerman (ed.), Evolutionary prerequisites for language, 100–121. Oxford: Oxford University Press.

de Boer, Bart & Patricia K. Kuhl. 2003. Investigating the role of infant-directed speech with a computer model. Acoustics Research Letters Online 4(4). 129–134. doi:10.1121/1.1613311

Dilley, Laura C., Amanda L. Millett, J. Devin McAuley & Tonya Bergeson. 2014. Phonetic variation in consonants in infant-directed and adult-directed speech: The case of regressive place assimilation in word-final alveolar stops. Journal of Child Language 41(1). 155–175. doi:10.1017/S0305000912000670

Ferguson, Charles A. 1964. Baby talk in six languages. American Anthropologist 103–114. doi:10.1525/aa.1964.66.suppl_3.02a00060

Fernald, Anne. 1984. The perceptual and affective salience of mother’s speech to infants. In Lynne Feagans, Catherine Garvey & Roberta Golinkoff (eds.), The origins of growth and communication. Norwood, NJ: Ablex.

Fernald, Anne. 1985. Four-month-old infants prefer to listen to motherese. Infant Behavior and Development 8(2). 181–195. doi:10.1016/S0163-6383(85)80005-9

Fernald, Anne. 1991. Prosody in speech to children: Prelinguistic and linguistic functions. Annals of Child Development 8. 43–80.Search in Google Scholar

Fernald, Anne. 1992. Meaningful melodies in mother’s speech to infants. In H. Papousek, U. Jurgens & M. Papousek (eds.), Nonverbal vocal behavior, 262–280. Cambridge: Cambridge University Press.Search in Google Scholar

Fernald, Anne. 1993. Approval and disapproval: Infant responsiveness to vocal affect in familiar and unfamiliar languages. Child Development 64(3). 657–674.10.2307/1131209Search in Google Scholar

Fernald, Anne & Patricia Kuhl. 1987. Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development 10(3). 279–293. doi:10.1016/0163-6383(87)90017-8

Fernald, Anne, Traute Taeschner, Judy Dunn, Mechthild Papousek, Bénédicte de Boysson-Bardies & Ikuko Fukui. 1989. A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language 16(3). 477–501. doi:10.1017/S0305000900010679

Fidelholz, James. 1975. Word frequency and vowel reduction in English. In CLS-75, 200–213. Chicago: University of Chicago Press.

Fowler, Carol & Jonathan Housum. 1987. Talkers’ signaling of “new” and “old” words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language 26. 489–504. doi:10.1016/0749-596X(87)90136-7

Gahl, Susanne, Yao Yao & Keith Johnson. 2012. Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech. Journal of Memory and Language 66. 789–806. doi:10.1016/j.jml.2011.11.006

Garlock, Victoria M., Amanda C. Walley & Jamie L. Metsala. 2001. Age-of-acquisition, word frequency, and neighborhood density effects on spoken word recognition by children and adults. Journal of Memory and Language 45(3). 468–492. doi:10.1006/jmla.2000.2784

Gilhooly, Ken J. & Mary L. Gilhooly. 1979. Age-of-acquisition effects in lexical and episodic memory tasks. Memory & Cognition 7(3). 214–223. doi:10.3758/BF03197541

Gilhooly, Ken J. & Robert H. Logie. 1981. Word age-of-acquisition, reading latencies and auditory recognition. Current Psychological Research 1(3–4). 251–262. doi:10.1007/BF03186735

Gilhooly, Ken J. & F. L. Watson. 1981. Word age-of-acquisition effects: A review. Current Psychological Reviews 1(3). 269–286. doi:10.1007/BF02684489

Goldinger, Stephen D., Paul A. Luce & David B. Pisoni. 1989. Priming lexical neighbors of spoken words: Effects of competition and inhibition. Journal of Memory and Language 28. 501–518. doi:10.1016/0749-596X(89)90009-0

Gorman, Kyle. 2010. The consequences of multicollinearity among socioeconomic predictors of negative concord in Philadelphia. University of Pennsylvania Working Papers in Linguistics 16(2). 66–75.

Hazan, Valerie & Rachel Baker. 2011. Acoustic-phonetic characteristics of speech produced with communicative intent to counter adverse listening conditions. The Journal of the Acoustical Society of America 130(4). 2139–2152. doi:10.1121/1.3623753

Howes, Davis H. 1957. On the relation between the intelligibility and frequency of occurrence of English words. Journal of the Acoustical Society of America 29. 296–305. doi:10.1121/1.1908862

Igarashi, Yosuke, Ken’ya Nishikawa, Kuniyoshi Tanaka & Reiko Mazuka. 2013. Phonological theory informs the analysis of intonational exaggeration in Japanese infant-directed speech. The Journal of the Acoustical Society of America 134(2). 1283–1294. doi:10.1121/1.4812755

Izura, Cristina, Miguel Pérez, Elizabeth Agallou, Victoria C. Wright, Javier Marín, Hans Stadthagen-González & Andrew W. Ellis. 2011. Age/order of acquisition effects and the cumulative learning of foreign words: A word training study. Journal of Memory and Language 64(1). 32–58. doi:10.1016/j.jml.2010.09.002

Jurafsky, Daniel, Alan Bell, Michelle Gregory & William D. Raymond. 2000. Probabilistic relations between words: Evidence from reduction in lexical production. In Joan Bybee & Peter Hopper (eds.), Frequency and the emergence of linguistic structure, 229–254. Amsterdam: John Benjamins. doi:10.1075/tsl.45.13jur

Kirchhoff, Katrin & Steven Schimmel. 2005. Statistical properties of infant-directed versus adult-directed speech: Insights from speech recognition. The Journal of the Acoustical Society of America 117(4). 2238–2246. doi:10.1121/1.1869172

Krause, Jean & Louis D. Braida. 2003. Acoustic properties of naturally produced clear speech at normal speaking rates. Journal of the Acoustical Society of America 115. 362–378. doi:10.1121/1.1635842

Kučera, Henry & W. Nelson Francis. 1967. Computational analysis of present-day American English. Providence, RI: Brown University Press.

Kuhl, Patricia K., Jean E. Andruski, Inna A. Chistovich, Ludmilla A. Chistovich, Elena V. Kozhevnikova, Viktoria L. Ryskina, Elvira I. Stolyarova, Ulla Sundberg & Francisco Lacerda. 1997. Cross-language analysis of phonetic units in language addressed to infants. Science 277(5326). 684–686. doi:10.1126/science.277.5326.684

Kuhl, Patricia K., Barbara Conboy, Sharon Coffey-Corina, Denise Padden, M. Rivera-Gaxiola & T. Nelson. 2008. Phonetic learning as a pathway to language: New data and native language magnet theory expanded (NLM-e). Philosophical Transactions of the Royal Society B: Biological Sciences 363(1493). 979–1000. doi:10.1098/rstb.2007.2154

Kuperman, Victor, Raymond Bertram & R. Harald Baayen. 2008. Morphological dynamics in compound processing. Language and Cognitive Processes 23(7–8). 1089–1132. doi:10.1080/01690960802193688

Kuperman, Victor, Hans Stadthagen-Gonzalez & Marc Brysbaert. 2012. Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods 44(4). 978–990. doi:10.3758/s13428-012-0210-4

Lahey, Mybeth & Mirjam Ernestus. 2014. Pronunciation variation in infant-directed speech: Phonetic reduction of two highly frequent words. Language Learning and Development 10(4). 308–327. doi:10.1080/15475441.2013.860813

Lahiri, Aditi & William Marslen-Wilson. 1991. The mental representation of lexical form: A phonological approach to the recognition lexicon. Cognition 38(3). 245–294. doi:10.1016/0010-0277(91)90008-R

Lane, Harlan & Bernard Tranel. 1971. The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research 14. 677–709. doi:10.1044/jshr.1404.677

Liberman, Alvin M., F. S. Cooper, Donald P. Shankweiler & Michael Studdert-Kennedy. 1967. Perception of the speech code. Psychological Review 74(6). 431. doi:10.1037/h0020279

Lieberman, P. 1963. Some effects of semantic and grammatical context and isolation. Journal of Speech Hearing Disorders 22. 87–90.

Lindblom, Björn. 1990. Explaining phonetic variation: A sketch of the H&H theory. In William J. Hardcastle & Alain Marchal (eds.), Speech production and speech modelling, 403–439. Amsterdam: Springer Netherlands. doi:10.1007/978-94-009-2037-8_16

Lombard, Étienne. 1911. Le signe de l’élévation de la voix. Annales des Maladies de L’Oreille et du Larynx XXXVII 2. 101–109.

Luce, Paul A. 1986. Neighborhoods of words in the mental lexicon. Research on Speech Perception. Technical Report No. 6.

Luce, Paul A. & David B. Pisoni. 1998. Recognizing spoken words: The neighborhood activation model. Ear and Hearing 19(1). 1–36. doi:10.1097/00003446-199802000-00001

Luce, Paul A., David B. Pisoni & Steven Goldinger. 1990. Similarity neighborhoods of spoken words. In Gerry Altmann (ed.), Cognitive models of speech processing, 122–147. Cambridge, MA: MIT Press.

Marslen-Wilson, William & Pienie Zwitserlood. 1989. Accessing spoken words: The importance of word onsets. Journal of Experimental Psychology 15(3). 576–585. doi:10.1037/0096-1523.15.3.576

Martin, Andrew, Thomas Schatz, Maarten Versteegh, Kouki Miyazawa, Reiko Mazuka, Emmanuel Dupoux & Alejandrina Cristia. 2015. Mothers speak less clearly to infants than to adults: A comprehensive test of the hyperarticulation hypothesis. Psychological Science 26(3). 341–347. doi:10.1177/0956797614562453

Martin, Andrew, Akira Utsugi & Reiko Mazuka. 2014. The multidimensional nature of hyperspeech: Evidence from Japanese vowel devoicing. Cognition 132(2). 216–228. doi:10.1016/j.cognition.2014.04.003

Mathies, Melanie, Pascal Perrier, Joseph S. Perkell & Majid Zandipour. 2001. Variation in anticipatory coarticulation with changes in clarity and rate. Journal of Speech and Hearing Research 44. 340–353. doi:10.1044/1092-4388(2001/028)

McMurray, Bob, Kristine A. Kovack-Lesh, Dresden Goodwin & William McEchron. 2013. Infant directed speech and the development of speech perception: Enhancing development or an unintended consequence? Cognition 129(2). 362–378. doi:10.1016/j.cognition.2013.07.015

Moon, Seung-Jae & Björn Lindblom. 1989. Formant undershoot in clear and citation-form speech: A second progress report. Speech Transmission Laboratory, QPSR 1. 121–123.

Moon, Seung-Jae & Björn Lindblom. 1994. Interaction between duration, context and speaking style in English stressed vowels. Journal of the Acoustical Society of America 96(1). 40–55. doi:10.1121/1.410492

Munson, Benjamin & Nancy Pearl Solomon. 2004. The influence of phonological neighborhood density on vowel articulation. Journal of Speech, Language, and Hearing Research 47. 1048–1058. doi:10.1044/1092-4388(2004/078)

Munson, Benjamin. 2007. Lexical access, lexical representation, and vowel production. Laboratory Phonology 9. 201–228.

Norris, Dennis, James M. McQueen & Anne Cutler. 2000. Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences 23(3). 299–325. doi:10.1017/S0140525X00003241

Nusbaum, Howard C., David B. Pisoni & Christopher K. Davis. 1984. Sizing up the Hoosier mental lexicon: Measuring the familiarity of 20,000 words. Research on Speech Perception Progress Report 10. 357–376.

Picheny, Michael, Nathaniel Durlach & Louis Braida. 1986. Speaking clearly for the hard-of-hearing: 2. Acoustic characteristics of clear and conversational speech. Journal of Speech and Hearing Research 29. 434–446. doi:10.1044/jshr.2904.434

Pitt, Mark A., Laura Dilley, Keith Johnson, Scott Kiesling, William Raymond, Elizabeth Hume & Eric Fosler-Lussier. 2007. Buckeye corpus of conversational speech (2nd release). Columbus, OH: Department of Psychology, Ohio State University.

Pitt, Mark A., Keith Johnson, Elizabeth Hume, Scott Kiesling & William Raymond. 2005. The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication 45(1). 89–95. doi:10.1016/j.specom.2004.09.001

Rattanasone, Nan Xu, Denis Burnham & Ronan Gabriel Reilly. 2013. Tone and vowel enhancement in Cantonese infant-directed speech at 3, 6, 9, and 12 months of age. Journal of Phonetics 41(5). 332–343. doi:10.1016/j.wocn.2013.06.001

Raymond, William D., Robin Dautricourt & Elizabeth Hume. 2006. Word-internal /t, d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change 18(01). 55–97. doi:10.1017/S0954394506060042

R Core Team. 2013. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/ (accessed 1 January 2013).

Rosenfelder, Ingrid, Joe Fruehwald, Keelan Evanini & Jiahong Yuan. 2011. FAVE (forced alignment and vowel extraction) program suite. http://fave.ling.upenn.edu (accessed 1 January 2013).

Scarborough, Rebecca. 2010. Lexical and contextual predictability: Confluent effects on the production of vowels. In Cécile Fougeron, Barbara Kuhnert, Mariapaola D’Imperio & Nathalie Vallée (eds.), Papers in laboratory phonology X, 557–586. Berlin: Mouton de Gruyter.

Scarborough, Rebecca. 2013. Neighborhood-conditioned patterns in phonetic detail: Relating coarticulation and hyperarticulation. Journal of Phonetics 41(6). 491–508. doi:10.1016/j.wocn.2013.09.004

Scarborough, Rebecca & Georgia Zellou. 2013. Clarity in communication: “Clear” speech authenticity and lexical neighborhood density effects in speech production and perception. The Journal of the Acoustical Society of America 134(5). 3793–3807. doi:10.1121/1.4824120

Singh, Leher, Sarah Nestor, Chandni Parikh & Ashley Yull. 2009. Influences of infant-directed speech on early word recognition. Infancy 14(6). 654–666. doi:10.1080/15250000903263973

Smith, Nicholas A. & Laurel J. Trainor. 2008. Infant-directed speech is modulated by infant feedback. Infancy 13(4). 410–420. doi:10.1080/15250000802188719

Soderstrom, Melanie. 2007. Beyond babytalk: Re-evaluating the nature and content of speech input to preverbal infants. Developmental Review 27(4). 501–532. doi:10.1016/j.dr.2007.06.002

Song, Jae Yung, Katherine Demuth & James Morgan. 2010. Effects of the acoustic properties of infant-directed speech on infant word recognition. The Journal of the Acoustical Society of America 128(1). 389–400. doi:10.1121/1.3419786

Storkel, Holly L., Daniel Bontempo, Andrew J. Aschenbrenner, Junko Maekawa & Su-Yeon Lee. 2013. The effect of incremental changes in phonotactic probability and neighborhood density on word learning by preschool children. Journal of Speech, Language, and Hearing Research 56(5). 1689–1700. doi:10.1044/1092-4388(2013/12-0245)

Storkel, Holly L. & Jill R. Hoover. 2010. An online calculator to compute phonotactic probability and neighborhood density on the basis of child corpora of spoken American English. Behavior Research Methods 42(2). 497–506. doi:10.3758/BRM.42.2.497

Storkel, Holly L. & Jill R. Hoover. 2011. The influence of part-word phonotactic probability/neighborhood density on word learning by preschool children varying in expressive vocabulary. Journal of Child Language 38(03). 628–643. doi:10.1017/S0305000910000176

Sundberg, Ulla. 1998. Mother tongue-phonetic aspects of infant-directed speech. Stockholm: PERILUS.

Thiessen, Erik D., Emily A. Hill & Jenny R. Saffran. 2005. Infant‐directed speech facilitates word segmentation. Infancy 7(1). 53–71. doi:10.1207/s15327078in0701_5

Trainor, Laurel J. & Renée N. Desjardins. 2002. Pitch characteristics of infant-directed speech affect infants’ ability to discriminate vowels. Psychonomic Bulletin & Review 9(2). 335–340. doi:10.3758/BF03196290

Vitevitch, Michael. 1997. The neighborhood characteristics of malapropisms. Language and Speech 40. 211–228. doi:10.1177/002383099704000301

Vitevitch, Michael. 2002. The influence of phonological similarity neighborhoods on speech production. Journal of Experimental Psychology: Learning, Memory, and Cognition 28. 735–747. doi:10.1037/0278-7393.28.4.735

Vitevitch, Michael & Paul Luce. 1998. When words compete: Levels of processing in perception of spoken words. Psychological Science 9. 325–329. doi:10.1111/1467-9280.00064

Vosoughi, Soroush & Deb K. Roy. 2012. A longitudinal study of prosodic exaggeration in child‐directed speech. In Proceedings of the Speech Prosody 6th International Conference. 194–197.

Walley, Amanda C. & Jamie L. Metsala. 1992. Young children’s age-of-acquisition estimates for spoken words. Memory & Cognition 20(2). 171–182. doi:10.3758/BF03197166

Werker, Janet F., Judith E. Pegg & Peter J. McLeod. 1994. A cross-language investigation of infant preference for infant-directed communication. Infant Behavior and Development 17(3). 323–333. doi:10.1016/0163-6383(94)90012-4

Wright, Richard. 2003. Factors of lexical competition in vowel articulation. In John Local, Richard Ogden & Rosalind Temple (eds.), Papers in laboratory phonology VI, 26–50. Cambridge: Cambridge University Press.

Wurm, Lee H. & Sebastiano A. Fisicaro. 2014. What residualizing predictors in regression analyses does (and what it does not do). Journal of Memory and Language 72. 37–48. doi:10.1016/j.jml.2013.12.003

Yuan, Jiahong & Mark Liberman. 2008. Speaker identification on the SCOTUS corpus. Journal of the Acoustical Society of America 123(5). 3878. doi:10.1121/1.2935783

Zipf, George K. 1935. The psychobiology of language. New York: Houghton-Mifflin.

Published Online: 2015-10-14
Published in Print: 2015-10-1

©2015 by De Gruyter Mouton

Downloaded on 30.11.2023 from https://www.degruyter.com/document/doi/10.1515/lp-2015-0010/html