Speech signals are highly variable – not just between speakers, who may differ in their physical characteristics as well as their linguistic backgrounds, but also between communicative contexts (who is speaking to whom and under what conditions), between phonetic contexts, and even between individual words. This variability is viewed alternately as ‘noise’ that must be factored out by listeners (Liberman et al. 1967) or (at least for some of it) as potentially useful information that constitutes part of an adaptive communicative system (Lindblom 1990). In this study, we focus on a specialized context, namely speech directed to infants (IDS), which has various phonetic characteristics that differ from those of speech directed to adults (ADS, which we take to be the default). Within this context we investigate word-by-word variability in particular. We consider that the systematic phonetic properties characterizing this type of variability are conditioned by a general goal of enhancing the perceptibility of the speech signal.
1.1 Phonetic variation in infant-directed speech
Speech directed toward infants is characterized by differences in pronunciation relative to speech to adults. Overall, mothers talking to their infants tend to produce slower rates of speech (Fernald 1992), higher and broader pitch ranges (Fernald 1984; Smith and Trainor 2008), hyperarticulated vowels (Kuhl et al. 1997, Kuhl et al. 2008; Rattanasone et al. 2013), and more canonical allophonic consonant variants (Dilley et al. 2014), compared to speech directed toward adults. Such infant-directed effects have been found in a number of languages (Ferguson 1964; Fernald et al. 1989), though the details of the effects are not always identical (e.g., Benders 2013; Igarashi et al. 2013).
The speech adjustments observed in infant-directed speech are often assumed to be made for the ‘benefit’ of a specific kind of interlocutor – an infant language learner – though the exact nature of this benefit is not entirely agreed upon. IDS modifications, particularly prosodic effects such as higher F0 and expanded pitch range (Fernald 1984), seem to serve to regulate the attention of the infant (Fernald 1991, Fernald 1993; Trainor and Desjardins 2002), as well as to communicate speaker affect (Bryant and Barrett 2007). Indeed, it has been demonstrated that speech exhibiting these infant-directed speech properties is preferred by infants over adult-directed speech (Fernald 1985; Fernald and Kuhl 1987). (See, e.g., Cristia 2013; Soderstrom 2007 for reviews of IDS properties.)
Some researchers have suggested that IDS modifications occur to promote language acquisition (Werker et al. 1994), for example by highlighting phonetic parameters that carry phonemic distinctions in a language (Kuhl et al. 1997, Kuhl et al. 2008) or by facilitating word recognition and word segmentation by infants (Thiessen et al. 2005; Singh et al. 2009; Song et al. 2010). In any case, the stronger attention that infants pay to IDS could increase the opportunity for the features of IDS to influence language learning, even beyond what might be expected from the relative frequency with which infants hear this type of speech (de Boer 2005). And it has been demonstrated computationally, in some studies at least, that IDS is more learnable than ADS, as far as the identification of vowel qualities is concerned (de Boer and Kuhl 2003; de Boer 2005).
Other researchers have suggested that the phonetic enhancement features of IDS in particular are not goal-directed at all, but are rather a by-product of the prosody and rate features of IDS (e.g., McMurray et al. 2013; Martin et al. 2014). For instance, because IDS is typically slower with more stressed monosyllabic words, it may simply result in less formant undershoot and greater prosodically-conditioned hyperarticulation (McMurray et al. 2013). So, while some features of IDS may be produced for the benefit of the infant, others may not be a specifically encoded part of the signal at all. In fact, other recent research even finds evidence for non-hyperarticulatory patterns in IDS. For example, VOT has been found to increase for both voiced and voiceless stops in infant-directed speech, such that the voicing contrast is not actually enhanced (McMurray et al. 2013). Furthermore, although the vowels /i/, /a/, and /u/ may be more peripheral in IDS (Kuhl et al. 1997), other vowels are not (McMurray et al. 2013; Cristia and Seidl 2014); and even the vowels reported to be hyperarticulated show broader distributions and greater overlap in IDS (Kirchhoff and Schimmel 2005; Soderstrom 2007; Cristia and Seidl 2014; Martin et al. 2014). This variability results in a tendency for contrasts in IDS to be less clear and to yield worse performance for automatic speech recognition when algorithms are trained on IDS input (Kirchhoff and Schimmel 2005; Martin et al. 2015).
However, whether infant-directed speech effects promote language acquisition (intentionally or not) or are simply coincidental, they are part of a listener-specific speech style. Speech to all kinds of listeners (e.g., listeners in a noisy environment, hearing-impaired adults, non-native listeners) includes specialized features, many of which overlap with the features of IDS. Such speech is known to enhance clarity or intelligibility, without any didactic intent. (See a brief review of such ‘clear speech’ effects below.) It would be somewhat surprising, in fact, if speech to infants did not attend to clarity in some manner as well.
1.2 Clear speech
As just noted, in laboratory speech more generally and in speech directed toward (particular groups of) adult interlocutors, there is also systematic variation in acoustic features of pronunciation, which is generally assumed to occur for the benefit of a listener. Communicative context leads speakers to adjust properties of their speech. For example, speakers talk louder and slower in noisy environments (Lombard 1911; Lane and Tranel 1971). Speech directed toward listeners with a hearing impairment is louder and slower (Picheny et al. 1986; Bradlow et al. 2003). In fact, explicitly instructing talkers to speak ‘clearly’, relative to ‘conversational’ or ‘citation’ speech, produces a range of acoustic-phonetic adjustments, including increased intensity (Picheny et al. 1986), longer segment durations and slower speech rate (e.g., Picheny et al. 1986; Bradlow et al. 2003), larger F0 range (Bradlow et al. 2003), and enhancement of higher frequency (>1000 Hz) components in the long-term spectrum (Krause and Braida 2003). Prior studies have also shown more segmentally-focused spectral effects in clear speech in the form of vowel hyperarticulation or vowel space expansion (Picheny et al. 1986; Moon and Lindblom 1989, Moon and Lindblom 1994; Bradlow et al. 2003; Krause and Braida 2003; Scarborough and Zellou 2013), as well as maintained or increased consonant-to-vowel coarticulation (Matthies et al. 2001; Bradlow 2002; Scarborough and Zellou 2013). Indeed, these clear speech modifications yield speech that is more intelligible than ‘conversational’ speech (Picheny et al. 1986; Bradlow et al. 1996).
Context also conditions speech adjustments at the level of individual words. Less predictable words are produced more clearly than highly predictable words (Lieberman 1963; Scarborough 2010). Similarly, the first mention of a word in a narrative is produced more clearly than the second mention of that word (Fowler and Housum 1987). Thus, factors that influence the predictability of a word within a communicative context condition speech clarity at the word level. In other words, clarity is increased where listeners might be less able to rely on inferences from context to provide top-down cues to aid them in lexical perception.
1.3 Lexical factors and clarity
Other word-level factors, in particular lexical frequency and phonological neighborhood density, have been shown to have systematic effects on phonetic realization in adult-directed speech as well. With respect to lexical frequency, it has been shown repeatedly that high-frequency words are reduced relative to low-frequency words (e.g., Zipf 1935): they are produced with temporal reduction (Fidelholz 1975; Jurafsky et al. 2000; Bell et al. 2009), more contracted vowel spaces (Munson and Solomon 2004), and greater final segment deletion (Raymond et al. 2006). Phonological neighborhood density is a measure of the number of phonologically similar words (‘neighbors’) that exist for a given word (Luce et al. 1990). Words with more neighbors, i.e., those from dense phonological neighborhoods, exhibit more hyperarticulated vowel spaces (Wright 2003; Munson and Solomon 2004), as well as a greater degree of nasal and vowel-to-vowel coarticulation (Scarborough 2013), than words from sparser neighborhoods.
These word-level properties have systematic influences on lexical perception as well. Specifically, high-frequency and low-ND words are perceived more quickly and accurately than low-frequency and high-ND words across a range of tasks (Howes 1957; Luce 1986; Goldinger et al. 1989; Vitevitch 1997, Vitevitch 2002; Vitevitch and Luce 1998; Luce and Pisoni 1998), suggesting a processing advantage for more frequently occurring words and words with more unique phonological structures. Such effects can be understood in terms of the process of spoken word recognition, which involves picking out a target word from among its competitors in the lexicon, all of whose lexical representations are simultaneously activated by an acoustic-phonetic input (Luce 1986; Marslen-Wilson and Zwitserlood 1989; Luce and Pisoni 1998; Norris et al. 2000). More frequent words have stronger representations, due to previous repeated activation, and are therefore more likely to be correctly picked than their less frequent competitors. And words from sparser neighborhoods are simply subject to less competition, and are therefore more likely to be correctly picked.
An explicit connection between lexically conditioned production patterns and perceptual difficulty in the low-frequency and high-ND words has been made by some researchers (though cf. Gahl et al. 2012). For instance, the hyperarticulation in high-ND words has been interpreted as serving to mitigate lexical difficulty, since the hyperarticulated vowels would render the high-ND words more distinguishable from their competitors (Wright 2003; Scarborough 2013; Scarborough and Zellou 2013). The increase in coarticulation that co-occurs with increased hyperarticulation in high ND words suggests that the coarticulation too may serve to improve the perceptibility of these words (Scarborough 2013). Since coarticulation is the overlapping production of discrete sounds, it is a source of robust, if redundant, information about the phonemes present in a word (Beddor 2009). For example, the nasalized vowel [ʌ̃] in ‘bun’ provides information about the upcoming nasal consonant, even before that segment occurs. Such information can improve word perception by delivering cues to two phonemes at once and thus allowing early prediction of the nasal (Ali et al. 1971; Lahiri and Marslen-Wilson 1991).
Indeed, recent empirical evidence indicates that both hyperarticulation and increased nasal coarticulation facilitate lexical perception. Speech exhibiting hyperarticulated vowels has been shown to be more intelligible than speech with less hyperarticulated vowels (Bradlow et al. 1996). Furthermore, words produced with hyperarticulation and the greatest increased nasal coarticulation, as found in listener-directed clear speech, are perceived more quickly than words produced with hyperarticulation but decreased nasal coarticulation (Scarborough and Zellou 2013). Also, earlier onset of anticipatory nasality in a vowel preceding a nasal consonant speeds lexical identification (Beddor et al. 2013). These findings suggest that the modifications made by speakers in clear speech, including increasing degree of nasal coarticulation and/or increasing vowel hyperarticulation, enhance the intelligibility of the speech signal (intentionally or not).
1.4 Lexical factors in infant-directed speech
If lexical factors like neighborhood density and word frequency are factors that condition accommodative adjustments in lab speech or speech directed to adults, we might predict that they should have similar effects in infant-directed speech, which is fairly explicitly accommodative. However, an infant lexicon is quite different from an adult lexicon – both in size and in organization. Not only do infants know fewer words, but they also have less experience with each word, and they know less about relations between words. Thus, neither lexical frequency nor neighborhood density, as calculated for adults, is a good representation of lexical difficulty for infants. If, then, frequency and neighborhood density condition effects in speech production because they represent a lexical difficulty for the listener that speakers try to compensate for, they would seem not to be relevant factors in IDS after all. Rather, a more age-appropriate lexical measure should be a better basis for lexical adjustments in infant-directed speech.
One such alternate approach might be to consider frequency and neighborhood density calculated over child speech. In fact, such lexical calculations have been made in work by Storkel and colleagues (e.g., Storkel and Hoover 2010, Storkel and Hoover 2011; Storkel et al. 2013), based on corpora of speech from kindergarteners and first-graders. However, even such norms are not fully appropriate for the current study, which investigates infant interlocutors (from whom speech corpora are obviously not available). Furthermore, although child-derived lexical statistics might be very useful in explaining child productions, word-learning in children, or lexical perception by children, they are possibly not relevant in the domain of speech produced by adults, even if produced to infants, since it is not clear how adults would have access to statistics derived from a real child lexicon in any case.
Lexical age-of-acquisition (AoA), or the adult-reported age at which a word was learned (Carroll and White 1973; Gilhooly and Logie 1981), is a better candidate for an appropriate measure of lexical difficulty for infants. First, such ratings make explicit reference to the developing lexicon, and insofar as children acquire easier words earlier, they refer directly to lexical difficulty for children. Further, the fact that AoA is based on subjective ratings from adults (Kuperman et al. 2012) means that AoA represents an adult’s assessment of how hard a given word might be for a child – precisely the sort of information that is accessible to an adult and that could influence the adult’s production to a child word-by-word.
Even though AoA measures are subjective ratings by adults, numerous studies report evidence that these measures do indeed reflect lexical difficulty for young children, both in producing and perceiving words. For example, elementary school-age children are faster at producing words with a lower AoA, both in picture-naming (Gilhooly and Gilhooly 1979; Gilhooly and Watson 1981) and word repetition (Garlock et al. 2001). Likewise, earlier AoA accurately predicts faster spoken word recognition (auditory gating task: Garlock et al. 2001; and auditory lexical decision: Cirrin 1984) and more accurate mispronunciation detection (Walley and Metsala 1992) in elementary school-age children. (Note that similar perceptual effects of AoA from word naming and lexical decision have been demonstrated for adults, as well; Brown and Watson 1987; Izura et al. 2011; Kuperman et al. 2012; inter alia). These effects have been explained as effects of lifetime cumulative frequency or a result of order of acquisition on representational activation (Kuperman et al. 2012). But since AoA measures are inherently subjective, they might also reflect frequency, ND, or any number of other linguistic, cognitive, or social factors (e.g., word length, semantic content, concept complexity, observational experience, etc.) that contribute to an adult’s evaluation of the difficulty of a given word in acquisition.
1.5 Current study
In this study, then, we investigate lexically conditioned variation in infant-directed speech, comparing it to lexically conditioned variation in adult-directed speech. The goal of the language-learning infant is not just to acquire the phonological inventory of the ambient language but also its lexicon, just as the goal of linguistic communication is to express not just sounds, but meaningful words (and phrases). Thus, we predict that infant-directed speech, like adult-directed speech, should contain lexically conditioned phonetic variation.
To that end, we investigate phonetic variation as a function of lexical factors in spontaneous infant-directed speech. To address the questions of (1) whether the lexically conditioned patterns of production in infant-directed speech, if they occur, are similar to those found in adult-directed speech, and (2) whether the factors that condition any such modifications are special to infant lexicons (or to assumed or deduced infant lexicons), we compare the patterns in infant-directed speech to those in spontaneous adult-directed speech. We consider these patterns in light of their hypothesized goal of enhancing the perceptibility of the speech signal for a given interlocutor. Thus, we predict specifically that lexical age-of-acquisition will condition phonetic variation in infant-directed speech, above and beyond any contribution of ND or lexical frequency, which condition variation in adult-directed speech.
2.1 Infant-directed speech
The infant-directed speech data for this study come from the Brent corpus (Brent and Siskind 2001). This corpus is a collection of spontaneous speech recordings of 16 English-speaking mothers talking to their preverbal infants, ranging in age from 9 to 15 months. The speakers were recruited and recorded in and around Baltimore, Maryland. Recording was done in the speakers’ homes, without an experimenter present, over more than a dozen sessions of 1.5–2 hours each. The middle 75 minutes of each session were extracted and transcribed by Brent and colleagues. (See Brent and Siskind 2001 for more detailed information about the data-collection methods for this corpus.) We automatically force-aligned the sound files to the transcripts at both the word and phoneme level using the FAVE-align component of the Forced Alignment and Vowel Extraction (FAVE) suite (Rosenfelder et al. 2011), based on the Penn Forced Aligner (Yuan and Liberman 2008).
2.2 Target words
The current study focuses on monosyllabic, monomorphemic content words containing exactly one nasal segment (either a pre-vocalic nasal segment (NV), e.g., nap, or a post-vocalic nasal segment (VN), e.g., hand) extracted from the recordings of these 16 mothers talking to their infants. As is discussed in Section 2.3.1, high vowels elude accurate acoustic nasality measurement, so words containing high vowels were excluded. The full data set consists of 109 unique word types containing a nasal segment, represented by 8,127 tokens. The average number of tokens per mother was 507 tokens, with a range of 179–1,135.
2.3 Acoustic measurements
2.3.1 Acoustic nasality (A1-P0 dB)
Lowering of the velum acoustically couples the nasal passages with the oral cavity, introducing nasal resonances in addition to the oral ones during vowel production. These nasal formants fall in relatively predictable and stable frequency ranges, with the lowest nasal formant around 250 Hz and the second nasal formant around 900 Hz (Chen 1997). As nasality increases, the relative amplitude (in dB) of these nasal formant peaks tends to increase, while the amplitude of the oral formant peaks, especially F1, tends to decrease. Thus the difference in amplitude between one of the nasal formants and F1 gives us a relative measure of nasalization: A1-P0 (where A1 is the amplitude of the F1 harmonic peak and P0 is the amplitude of the lowest nasal peak). The low F1 of high vowels can overlap with and thus obscure the lower nasal peak (P0) (Chen 1997); therefore, only words with non-high vowels were targeted in the current study.
The spectral characteristics of oral and nasalized vowels are illustrated in Figure 1, which compares vowels from the English words bad and hand, as spoken by one mother in this corpus. In bad (top spectrum), the amplitude of the first formant peak (A1) is greater than the peak corresponding to the nasal formant frequency (P0). Meanwhile, in a nasalized vowel from the word hand (bottom spectrum), the first formant peak has decreased in amplitude while the nasal formant peak has increased in amplitude. Since as nasality increases, A1 decreases and P0 increases, smaller A1-P0 values indicate greater vowel nasality.
For each measurement, the boundary between the nasal and a vowel segment, which was placed automatically during forced alignment, was verified and hand-corrected as necessary. The accurate boundary was taken to be the point at which there was an abrupt reduction in amplitude of the higher formant frequencies in the spectrogram, accompanied by an abrupt change in amplitude in the waveform and simplification of waveform cycles. These criteria were used for both VN and NV sequences (but with the waveform cues in reverse order for the latter). A1-P0 measurements were made at the midpoint of each vowel via a Praat script which automatically identified A1 and P0. The frequencies of P0 and F1 were verified to ensure that they were appropriate for a given speaker and a given vowel quality.
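The A1-P0 computation described above can be sketched as follows. This is an illustrative Python sketch, not the Praat script used in the study: it assumes harmonic peak frequencies and amplitudes have already been extracted from a vowel-midpoint spectrum, and the 150–350 Hz search band for P0 is our own illustrative choice, centered on the ~250 Hz nasal peak reported by Chen (1997).

```python
import numpy as np

def a1_p0(harmonic_freqs, harmonic_amps_db, f1_hz, p0_band=(150.0, 350.0)):
    """Relative nasality measure A1-P0 (dB): the amplitude of the
    harmonic peak closest to F1 (A1) minus the amplitude of the
    strongest harmonic in the low-frequency nasal-peak band (P0).
    Smaller values indicate greater vowel nasality.
    (Illustrative sketch; the search band is an assumption.)"""
    freqs = np.asarray(harmonic_freqs, dtype=float)
    amps = np.asarray(harmonic_amps_db, dtype=float)
    a1 = amps[np.argmin(np.abs(freqs - f1_hz))]            # harmonic under F1
    in_band = (freqs >= p0_band[0]) & (freqs <= p0_band[1])
    p0 = amps[in_band].max()                               # nasal peak P0
    return a1 - p0

# Toy harmonic spectra for a 200 Hz voice with F1 near 750 Hz
freqs = [200, 400, 600, 800, 1000]
oral = [55.0, 50.0, 58.0, 62.0, 40.0]    # prominent F1 peak
nasal = [63.0, 50.0, 52.0, 54.0, 40.0]   # boosted P0, damped A1
print(a1_p0(freqs, oral, f1_hz=750))     # larger A1-P0: more oral
print(a1_p0(freqs, nasal, f1_hz=750))    # smaller A1-P0: more nasal
```

As in the spectra of Figure 1, the nasalized toy vowel yields a smaller A1-P0 value than the oral one.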
2.3.2 Hyperarticulation
Hyperarticulation is measured as acoustic distance in F1-F2 space from vowel space center (e.g., Bradlow et al. 1996; Wright 2003). Vowel space center was calculated for each speaker by taking the F1 and F2 means of the high front vowel /i/ and the low back vowel /a/. For this purpose (because the test word set does not include high vowels), words containing /i/ were extracted from the Brent corpus in the same manner as the test words (i.e., monosyllabic content words with a vowel-adjacent nasal consonant); 931 tokens representing /i/ and 309 tokens representing /a/ (taken from the test words) were used to calculate vowel space centers.
F1 and F2 measurements were taken automatically via script in Praat at the midpoint of each target vowel, based on default Burg formant tracking analyses (5 formants in 5,500 Hz; 25 ms window) and verified by visual examination of wide band spectrograms. The Euclidean distance from vowel space center was calculated for each midpoint measurement, in bark-transformed F1-F2 space.
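As a minimal sketch, the distance measure can be expressed as follows. The paper does not specify which bark conversion was used; the Traunmüller (1990) formula below is one common choice, and the function names and example formant values are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Hz-to-bark conversion (Traunmüller 1990) -- one common choice;
    the specific formula used in the study is not stated."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def distance_from_center(f1_hz, f2_hz, center_f1_hz, center_f2_hz):
    """Euclidean distance of a vowel token from the speaker's vowel
    space center in bark-transformed F1-F2 space; greater distance
    means a more peripheral (hyperarticulated) token."""
    d1 = hz_to_bark(f1_hz) - hz_to_bark(center_f1_hz)
    d2 = hz_to_bark(f2_hz) - hz_to_bark(center_f2_hz)
    return math.hypot(d1, d2)

# Hypothetical midpoint measurement of a low front vowel, against a
# center computed from the speaker's /i/ and /a/ F1-F2 means
print(distance_from_center(850, 1750, center_f1_hz=500, center_f2_hz=1500))
```

The bark transform compresses higher frequencies, so equal distances in bark space approximate equal perceptual distances better than raw Hz would.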
2.4 Lexical variables
2.4.1 Lexical age-of-acquisition (AoA)
Lexical age-of-acquisition is estimated as the adult-reported age at which a word was learned (Carroll and White 1973; Gilhooly and Gilhooly 1979). The AoA norms used in the current study were from Kuperman et al. (2012), who collected subjective ratings of over 30,000 English words from 1,960 raters, crowdsourced through Amazon Mechanical Turk. These raters were instructed for each word to provide “the age at which you would have understood that word if somebody had used it in front of you, EVEN IF YOU DID NOT use, read, or write it at the time” (Kuperman et al. 2012: 980, emphasis original). For the current study, these AoA ratings were provided for each word type extracted from the corpus, and AoA was centered.
2.4.2 Phonological neighborhood density (ND)
Phonological neighborhood density is a measure of the number of lexical neighbors for a given word. Neighbors are defined as words that differ from the target word by the addition, deletion, or substitution of a single phoneme (Luce et al. 1990). Nasal coarticulation has been shown to vary systematically in words depending on the number of phonological neighbors: words with many neighbors are produced with a greater degree of vowel nasality than words with fewer phonological neighbors (Scarborough 2013).
Frequency-weighted neighborhood density (ND), defined as the summed log frequencies in tokens per million words (SUBTLEX, Brysbaert and New 2009) of a word and its neighbors, was calculated for each lexical item in our samples (for example, the word snob has eight monosyllabic phonological neighbors in English and its summed log frequency is 14.4, while the word son has ten neighbors but its ND is 31.7). Neighbors were determined using the Hoosier Mental Lexicon (Nusbaum et al. 1984). The summed log frequencies (=ND) were centered.
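To make the neighbor definition and the frequency weighting concrete, here is an illustrative Python sketch over a toy five-word lexicon. The phoneme transcriptions and frequency values are invented for the example; the study itself used the Hoosier Mental Lexicon for neighbor determination and SUBTLEX frequency counts.

```python
from math import log10

def is_neighbor(a, b):
    """True if phoneme sequences a and b differ by exactly one phoneme
    substitution, addition, or deletion (Luce et al. 1990)."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    # addition/deletion: removing one phoneme from the longer form
    # must yield the shorter form
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))

def weighted_nd(word, lexicon):
    """Frequency-weighted ND: summed log frequency (per million words)
    of the target word and all of its neighbors."""
    return log10(lexicon[word]) + sum(
        log10(freq) for w, freq in lexicon.items() if is_neighbor(word, w))

# Toy lexicon: ARPAbet-style phoneme tuples -> invented frequencies
lexicon = {
    ("S", "AH", "N"): 500.0,      # 'sun'
    ("S", "UW", "N"): 120.0,      # 'soon' (substitution neighbor)
    ("S", "AH", "NG"): 80.0,      # 'sung' (substitution neighbor)
    ("S", "P", "AH", "N"): 15.0,  # 'spun' (addition neighbor)
    ("K", "AE", "T"): 300.0,      # 'cat' (not a neighbor)
}
print(weighted_nd(("S", "AH", "N"), lexicon))
```

A word from a dense neighborhood of frequent words thus receives a high ND value even if the word itself is rare.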
The SUBTLEX frequencies that we used are based on 51 million words of English movie subtitles. The SUBTLEX norms have been shown to predict lexical decision and naming reaction times more accurately than other available frequency measures, specifically compared to Kučera and Francis (1967) and CELEX (Baayen et al. 1996), suggesting that they could also provide a more useful basis for understanding more complex behavioral implications of frequency (Brysbaert and New 2009). Frequency counts per million words for each token were log transformed and centered.
Means, standard deviations, and ranges for Frequency, ND, and AoA (presented as non-centered) from the 109 word types containing exactly one nasal segment extracted from the Brent corpus are provided in Table 1.
2.5 Model design
2.5.1 Fixed effects structure
Two separate linear mixed effects models were run, one for each dependent variable (nasal coarticulation and hyperarticulation). Both models consisted of the same fixed effects predictors and structure.
The lexical variables examined in the present study tend to strongly correlate with each other. For example, early-acquired words tend to be highly frequent words, and they also tend to have simple phonological structure, leading them to have a large number of lexical competitors (high ND). But the studies focusing on AoA mentioned above have been careful to illustrate effects independent from the influence of the other factors, either through partialization of predictors (Carroll and White 1973) or binning of the continuous predictors into crossed categorical factors (Garlock et al. 2001).
We handle the collinearity between these variables through residualization and partialization (Gorman 2010). Multicollinearity, or non-independence of predictors, is a serious problem for regression because it violates the assumption of predictor orthogonality, making it impossible for the model-fitting procedure to correctly attribute variance to one predictor or the other. Residualization is a method for orthogonalizing predictor variables: given two highly collinear variables, one predictor is regressed onto another and the resulting residuals of that regression are used to replace the original predictor variables. In so doing, the goal is to assess the independent contributions to the dependent variable of otherwise related predictors (Kuperman et al. 2008). Partialization is the same technique, but for more than two multicollinear predictors (Gorman 2010).
Residualization allows us to isolate the effects of a single predictor while simultaneously evaluating the same data for effects from multiple originally collinear predictors. It has been shown, for example, that a simultaneous analysis using residualization of multiple predictors produces the same result for a given predictor as if the variable was run by itself in a single-predictor model (Wurm and Fisicaro 2014). Concerns that residualization has been misused and/or misinterpreted are outlined by Wurm and Fisicaro (2014). Most notably, they object to residualization being used to orthogonalize predictors included “for the purposes of statistical control” (p. 38). However, for the purposes of the current study, residualization is both appropriate and well-motivated. Specifically, we aim to simultaneously assess the independent contributions of three lexical properties (age-of-acquisition, frequency, and ND) which are multicollinear.[1]

[1] Note that some researchers have dealt with the multicollinearity between multiple lexical predictors through stepwise multiple regression analysis, as opposed to residualization (see, e.g., Brysbaert et al. 2011; Kuperman et al. 2012). As discussed in Wurm and Fisicaro (2014: 46), the result of such a hierarchical analysis is the same as a simultaneous analysis using residualization.
Models were fit using the lme4 package (Bates et al. 2013, version 1.0-4) in R (R Core Team 2013), and we used the languageR package (Baayen 2011, version 1.1.4) to produce probabilities based on Markov Chain Monte Carlo sampling. As noted above, to mitigate the potentially misleading influence of multicollinearity, we first residualized ND on Frequency: one member of a pair of multicollinear predictors is taken as a baseline (here, Frequency), and the other predictor (here, ND) is regressed linearly on the values of the baseline. Next, we partialized AoA on both Frequency and the residuals of ND. The order of the residualization/partialization procedure was determined from studies attributing the greatest amount of variance in visual lexical decision reaction times to frequency, then to ND and related structural properties, then to AoA (see Brysbaert and Cortese 2010; Brysbaert et al. 2011). Each model consisted of fixed predictors of Frequency (centered), ND (centered, residualized on Frequency), and AoA (centered, partialized on Frequency and ND residuals). The models also included two- and three-way interaction terms between these three main variables, testing the possibility that phonetic variation in IDS contains simultaneous influences of these factors.
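The residualization/partialization steps can be sketched as follows. The study fit its models in R with lme4; the numpy version below, run on deliberately collinear synthetic data, is only meant to illustrate how regressing one predictor on another yields orthogonalized residuals that a subsequent model can use.

```python
import numpy as np

def residualize(y, predictors):
    """Regress y on the given predictors (with intercept) and return
    the residuals: the part of y orthogonal to those predictors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] +
                        [np.asarray(p, dtype=float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Synthetic, deliberately collinear lexical predictors (illustrative only)
rng = np.random.default_rng(0)
n = 500
freq = rng.normal(size=n)                           # centered log frequency
nd = 0.6 * freq + rng.normal(size=n)                # ND, collinear with freq
aoa = -0.5 * freq - 0.3 * nd + rng.normal(size=n)   # AoA, collinear with both

nd_res = residualize(nd, [freq])            # step 1: ND residualized on freq
aoa_res = residualize(aoa, [freq, nd_res])  # step 2: AoA partialized on both

# The orthogonalized predictors are (numerically) uncorrelated with the
# baselines, so variance can be attributed to each one unambiguously
print(np.corrcoef(nd_res, freq)[0, 1])   # ~0
print(np.corrcoef(aoa_res, freq)[0, 1])  # ~0
```

Under this scheme, the fixed effects entered into the mixed models are Frequency, the ND residuals, and the partialized AoA values, together with their interactions.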
Finally, we performed a critical test of this method to verify the accuracy of the results reported using residualization. The residualization/partialization procedure was reversed, i.e., we residualized ND on AoA (rather than AoA on ND), and the same patterns of results were obtained; that is, the same predictors that were computed as significant or not significant in the models reported in the current study were also found with the reverse residualization procedure. This is important because it means that the patterns in the data are indeed best explained by the independent contributions of the significant main predictor(s), rather than by some factor shared by multiple predictors simultaneously that this procedure would be unable to tease apart.
2.5.2 Random effects structure
We fit random intercepts for word and speaker in both models. This random effects structure allows for the joint possibilities of speaker idiosyncrasy with respect to overall degree of coarticulation and hyperarticulation, as well as lexical item idiosyncrasy with respect to overall phonetic patterns.
Two linear mixed models analyzed hyperarticulation and nasal coarticulation measured from 8,127 tokens (109 unique word types containing a non-high vowel and a nasal consonant) from 16 mothers talking to their prelinguistic infants.
3.1 Hyperarticulation
The estimated coefficients and associated p-values from the model on hyperarticulation in IDS are presented in Table 2.
The model of hyperarticulation in IDS includes a significant main effect of AoA: with a positive coefficient, this effect indicates that words learned later have a greater degree of hyperarticulation. Word frequency was a marginally significant predictor of hyperarticulation: higher frequency words tended to be less hyperarticulated than lower frequency words. Neighborhood density was not a significant predictor of hyperarticulation in this IDS data set. There were no significant interactions among the three lexical predictors in the IDS hyperarticulation dataset.
The isolated univariate effects of (non-residualized) AoA and Frequency on hyperarticulation in IDS are illustrated in Figure 2: vowels from words that are reported as later-acquired and from words that are less frequent are more hyperarticulated than vowels from words that are reported to be learned earlier and words that are more frequent.
Recall that prior studies that have reported hyperarticulation in IDS as a register have focused on point vowels (/i/, /a/, /u/) (e.g., Kuhl et al. 1997, Kuhl et al. 2008; Cristia and Seidl 2014), and in fact, hyperarticulation may even be limited to these point vowels (McMurray et al. 2013). Although the current study does not address the presence or absence of hyperarticulation in IDS in general, we wanted to ask whether the AoA effects on hyperarticulation in IDS might also be limited to this subset of vowels. Since the current dataset examines only non-high vowels, /a/ is the only point vowel. Therefore, we ran a post-hoc model on a subset of the data excluding /a/ (i.e., on the non-point vowels). This non-point vowel model also revealed a significant effect of AoA (est=0.24, SE=0.14, p=0.008), a non-significant effect of ND (p=0.5), as well as a significant effect of frequency (est=−0.21, SE=0.17, p=0.03). Thus, it appears that the AoA-conditioned hyperarticulation generalizes across vowels in our data.
3.2 Nasal coarticulation
The estimated coefficients and associated p-values from the model on nasal coarticulation in IDS are presented in Table 3.
The model of nasal coarticulation in IDS includes a significant negative coefficient for AOA: later-acquired words have a lower A1-P0, which means they have greater nasalization. Word frequency was also a significant predictor of degree of nasality: higher-frequency words had less nasality than lower-frequency words. The model on nasality in IDS did not yield a significant main effect of neighborhood density. 2
The isolated univariate effects of (non-residualized) AoA and Frequency on degree of nasal coarticulation in IDS are illustrated in Figure 3: words that are reported as later acquired and words that are less frequent are produced with a greater degree of vowel nasality (lower A1-P0 dB) than earlier-learned and more-frequent words. 3
3.3 Interim discussion
The two models on vowel hyperarticulation and vowel nasality yielded systematic lexical AoA effects. In spontaneous speech to infants, mothers produced greater degrees of vowel hyperarticulation and nasal coarticulation in words that were subjectively rated as later-acquired, above and beyond other lexical effects. Meanwhile, ND had no demonstrable effect on these phonetic variables. Recall that prior studies have found evidence of consistent ND effects on hyperarticulation and coarticulation in laboratory speech, elicited as standard citation form speech or as explicitly (adult) listener-directed (e.g., Wright 2003; Munson and Solomon 2004; Scarborough 2013; Scarborough and Zellou 2013).
In order to understand the relationship between the current findings and those from prior studies, two possible explanations are considered: (1) Context matters in conditioning lexical variation; that is, AoA influences phonetic variation in IDS since that is the best metric for lexical difficulty for language learners; meanwhile, ND conditions production patterns in adult-directed speech because that is the best metric for lexical difficulty for interlocutors with fully-developed lexicons. Or, (2) AoA might condition lexical variation in all speech styles, for reasons related, for instance, to representational strength or cumulative frequency that this measure might represent, and prior studies of ADS did not uncover its influence since it had not previously been included as a predictor of speech production. These two possibilities can be addressed empirically via investigation of which lexical factors are significant in a model of adult-directed speech, with the same predictors as in the present IDS study. If the first explanation is correct, we would expect a non-significant effect of AoA and a significant effect of ND on conditioning phonetic variation in adult-directed speech. If the second explanation is correct, we would expect a significant AoA effect in the adult-directed speech, just as in the infant-directed speech. In the next section, we test these two hypotheses on a set of data from a spontaneous adult-directed speech corpus.
4 Adult-directed speech
4.1 Buckeye corpus of adult-directed speech
The spontaneous adult-directed speech data came from the Buckeye corpus (Pitt et al. 2005, Pitt et al. 2007). The entire corpus consists of spontaneous speech directed toward an adult interviewer from 40 speakers from Columbus, Ohio, with each recording lasting approximately one hour (see Pitt et al. 2005 for more information about this corpus). In order to construct a sample of speakers comparable to our sample of IDS speakers, we analyzed a subset of this corpus – all female speakers who were under the age of 30 – resulting in 10 individuals’ recordings.
The sound files were automatically forced-aligned to the transcripts at both the word and phoneme level using FAVE-align. As in the IDS study, monosyllabic, monomorphemic content words containing exactly one nasal segment were extracted from these recordings. The ADS data set consists of 88 unique word types containing a nasal segment, represented by 897 tokens (with 77 additional tokens of nasal words with /i/ in order to calculate vowel space centers for each subject using /a/ and /i/ means; these 77 tokens were not independently included in the analysis). Hyperarticulation and degree of nasality (as A1-P0) were measured and calculated for each token using the same methods described in Section 2.3.
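One common operationalization of vowel hyperarticulation (used, e.g., in work on ND-conditioned clear speech) measures each token's Euclidean distance in F1×F2 space from the speaker's vowel-space center, computed here as the midpoint of the /a/ and /i/ means. The sketch below is our assumption about that computation on toy formant values; it is not the formula from Section 2.3, which is not reproduced here.

```python
import numpy as np

def vowel_space_center(a_formants, i_formants):
    """Speaker's vowel-space center: midpoint of the mean /a/ and mean /i/
    positions in F1 x F2 space (an assumed operationalization)."""
    a_mean = np.mean(a_formants, axis=0)
    i_mean = np.mean(i_formants, axis=0)
    return (a_mean + i_mean) / 2

def hyperarticulation(token_f1f2, center):
    """Distance of a vowel token from the center; larger = more peripheral,
    i.e., more hyperarticulated under this operationalization."""
    return float(np.linalg.norm(np.asarray(token_f1f2) - center))

# Toy measurements in Hz: rows are (F1, F2) values per token.
a_tokens = np.array([[750.0, 1200.0], [770.0, 1150.0]])
i_tokens = np.array([[300.0, 2300.0], [320.0, 2250.0]])
center = vowel_space_center(a_tokens, i_tokens)  # midpoint of /a/ and /i/ means
print(hyperarticulation([700.0, 1100.0], center))
```

Computing the center per speaker is what motivates retaining the 77 extra /i/ tokens even though they do not enter the statistical analysis.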
Means, standard deviations, and ranges for Frequency, ND, and AoA (all non-centered) from the 88 word types containing exactly one nasal segment extracted from the subset of 10 young female speakers selected from the Buckeye corpus are provided in Table 4. Note that the words are strikingly similar to the words from the IDS part of the study with respect to their lexical statistics.
Two lmer models were run on the hyperarticulation and nasal coarticulation data from the Buckeye ADS speech in the same manner as for the Brent data: fixed effects of centered Frequency, centered ND (residualized by frequency) and centered AoA (partialized by Frequency and residuals of ND), along with two- and three-way interaction terms. 4 Random intercepts of Speaker and Item were included as well.
Two linear mixed models analyzed hyperarticulation and nasal coarticulation measured from 897 tokens (88 unique word types containing a non-high vowel and a nasal consonant) from 10 young women talking to an adult interlocutor.
In both models, ND is a significant predictor of phonetic variation: high-ND words are more hyperarticulated and have a greater degree of nasality (i.e., lower A1-P0). AoA is not a significant main predictor of either of these phonetic variables in the ADS data.
We note that there are several significant interactions present in the two models of ADS. First, in both the model on hyperarticulation and the model on nasal coarticulation, there is a significant interaction of ND and Frequency. In order to interpret this interaction, we binned the word types extracted for the adult-directed speech data set into categories split by the mean of each sample (i.e., high/low frequency, high/low ND, high/low AoA). Figure 4 illustrates the significant ND by frequency interaction in each model, with means (and standard errors) of nasal coarticulation and hyperarticulation in each of the sub-categories. It appears that the interaction is driven by differentiation of high- and low-frequency words in high-ND words, but not in low-ND words. In other words, the effect of phonetic modification was most extreme in the most difficult adult word types – high-ND, low-frequency words.
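The mean-split binning used to interpret the interaction can be sketched as follows; the column names and simulated values are illustrative stand-ins, not the study's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
words = pd.DataFrame({
    "frequency": rng.normal(size=88),
    "nd": rng.normal(size=88),
    "a1p0": rng.normal(size=88),   # stand-in for measured nasality (A1-P0)
})

# Mean splits: each lexical variable becomes a high/low category
# relative to the sample mean, as in the interaction plots.
for col in ("frequency", "nd"):
    words[col + "_bin"] = np.where(words[col] > words[col].mean(), "high", "low")

# Cell means and standard errors for the ND x Frequency interaction.
cells = words.groupby(["nd_bin", "frequency_bin"])["a1p0"].agg(["mean", "sem"])
print(cells)
```

Plotting the four cell means (with standard errors) makes it visible whether high- and low-frequency words separate only within the high-ND bin, as reported above.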
These patterns were confirmed in a series of post-hoc linear mixed models on subsets of the data. First, for hyperarticulation, two post-hoc lmer regressions were run on the ND-binned data – one for higher-ND words and one for lower-ND words – with Frequency set as a continuous fixed predictor and random intercepts for Speaker and Item. The post-hoc regression for higher-ND words showed a significant correlation with Frequency (est.=−0.19, SE=0.09, t=−2), while the post-hoc regression for lower-ND words did not show a significant frequency effect (t=0.97). For nasal coarticulation, frequency again correlated with acoustic nasality in higher-ND words (est.=1.6, SE=0.94, t=1.8), but not in lower-ND words (t=1.1).
While AoA was not a significant main effect in either of the ADS models, AoA did participate in a significant two-way interaction with ND and a three-way interaction with ND and Frequency in the hyperarticulation model (but not in the nasal coarticulation model). The two-way interaction, illustrated with word types binned by high/low ND and high/low AoA, is shown in Figure 5.
As can be seen in Figure 5, the significant interaction appears to be driven by the (small) difference between early and late age of acquisition in high-ND words, while there is no AoA-differentiation in hyperarticulation for low-ND words. As with the ND by Frequency interaction, the effect of phonetic adjustment for AoA only emerges in the more difficult high-ND words. To confirm this effect, two post-hoc lmer regressions were run on the ND-binned data – one for higher-ND words and one for lower-ND words – with AoA set as a continuous fixed predictor and random intercepts for Speaker and Item. The post-hoc regression for higher-ND words revealed a significant effect of AoA (est.=0.22, SE=0.06, t=3.5), while the post-hoc regression for lower-ND words did not have a significant AoA effect (est.=0.006, SE=0.35, t=0.12).
5 General discussion
Several main observations stem from the patterns of lexically-conditioned phonetic variation in the two types of listener-directed speech examined in the current study.
First, we find that the infant-directed speech contains lexically-conditioned phonetic variation, specifically, as a function of lexical age-of-acquisition. Insofar as IDS may be structured to promote clarity or acquisition, it seems appropriate that IDS would have lexically-conditioned phonetic variation because words are the meaningful units that speakers seek to express and that learners need to be able to perceive and learn. Indeed, recent evidence suggests that words are targets of early language acquisition: for example, 9-month-old infants (in some cases, even as young as 6 months) display comprehension of common words, suggesting that language acquisition consists of identifying lexical, as well as phonemic, structure (Bergelson and Swingley 2012, Bergelson and Swingley 2013). Therefore, the idea that mothers might intuitively structure the speech signal to maximize perceptibility not only of individual phonemes but also on a word-by-word basis, as a function of how difficult they might be for the learner, is not implausible. (Future replication in different samples of infant-directed speech could confirm that AoA-differentiation is a general property of IDS as a register).
Secondly, coarticulation, as well as hyperarticulation, is controlled in infant-directed speech. Specifically, in IDS, speakers produced hyperarticulation and increased coarticulation in later-acquired words. With respect to hyperarticulation, we note that the results in the current study are not evidence that IDS vowels are overall hyperarticulated compared to ADS 5 (cf. McMurray et al. 2013); rather, they demonstrate that degree of hyperarticulation (like degree of coarticulation) is lexically conditioned, specifically by lexical age-of-acquisition, in IDS. With respect to coarticulation, our results are additionally novel in that patterns of coarticulation have been little-studied previously in IDS. The only previous study, to our knowledge, looking at coarticulation in speech directed to infants found greater coarticulation in IDS than in ADS (Andruski and Kuhl 1996), paralleling the greater hyperarticulation frequently reported for IDS. Other reported segment-level articulation effects in IDS include more canonical consonant allophones and less variable vowel allophones (Kuhl et al. 1997; Dilley et al. 2014; though cf. McMurray et al. 2013). Although these effects might seem contrary to the increased coarticulation in IDS (Andruski and Kuhl 1996), and in later-acquired words in IDS in particular (in the current study), they could in fact be interpreted as having a similar goal: to provide better information about the segments in a word. More extreme vowel articulation and more canonical consonant allophones straightforwardly provide better information about the identity of these segments; increased (consonant-to-vowel) coarticulation provides temporally-distributed and predictive information about the identity of the upcoming consonant (e.g., Ali et al. 1971; Lahiri and Marslen-Wilson 1991; Beddor et al. 2013; Scarborough and Zellou 2013).
Thus, the confluent increase in degree of contextual nasalization (from consonant to vowel) and more extreme vowel articulation in ostensibly harder, later-acquired words in IDS are not incompatible and may even serve a similar purpose.
Thirdly, the lexically-conditioned phonetic variation patterns vary as a function of interlocutor type. Hyperarticulation and coarticulation vary systematically by ND (but not primarily by AoA 6) across a variety of adult-directed speech contexts, with greater hyperarticulation and greater coarticulation in words from denser neighborhoods. Meanwhile, hyperarticulation and coarticulation vary systematically by AoA (but not ND) in infant-directed speech, with the same properties – greater hyperarticulation and coarticulation – in later-acquired words. This fundamental similarity between ND effects in ADS and AoA effects in IDS suggests that the two factors represent similar cognitive or communicative phenomena, but in a listener-targeted way. In other words, we could interpret both ND and AoA as measures of lexical difficulty, but where difficulty is evaluated differently for adult interlocutors than for infants.
Note that if AoA were reflective of cumulative frequency or affecting speech output via representational strength (as suggested by Kuperman et al. 2012), we would expect to see its effects consistently across interlocutors. Rather, AoA plays a role where real age-of-acquisition (or whether or not a child would know a word) is relevant, i.e., when speaking to a child acquiring language. In other words, AoA is a relevant measure for exactly what it is – an adult’s assessment of lexical difficulty for a child. The lack of ND effects in IDS suggests a parallel argument: if these ND effects were exclusively the result of representation or speaker-internal mechanisms (e.g., Gahl et al. 2012), we would expect to see the effects consistently across interlocutors. Indeed, the very fact that different lexical variables condition phonetic variation across different speech styles demonstrates that speakers make lexical adjustments in their speech based on context. It should be acknowledged that, of course, properties other than age of interlocutor (adult vs. child) differed between the IDS and ADS corpora used in the current study (for example, familiarity between speaker and interlocutor, place of recording, interlocutor verbal or non-verbal responses, regional dialect, etc.). We cannot discount that these factors may contribute as well to the differences in the patterns observed in the two samples, though we have no predictions about the specific effects of any of these variables on lexically-conditioned effects.
The current study did reveal one mostly consistent lexical influence across both speech styles: lexical frequency. In both the IDS and ADS data, lower-frequency words were more hyperarticulated and showed greater nasal coarticulation (marginally so in IDS). In other words, lexical frequency appears to have a stable effect across interlocutors. (See also Lahey and Ernestus 2014, who report a stable influence of word frequency on reduction in IDS and ADS.) This suggests that frequency effects may result from a different cognitive mechanism than either ND or AoA effects. Such a distinction between frequency and ND effects has been supported elsewhere as well. For example, when participants are instructed to respond immediately to a word prompt (i.e., lexical access is ‘stressed’), vowel hyperarticulation is influenced by both frequency and ND; yet, when participants are instructed to wait for one second before responding to a word prompt (i.e., lexical access is ‘facilitated’), frequency no longer has an effect on hyperarticulation, though ND still does (Munson 2007). In other words, frequency and ND effects do not appear to result from the same cognitive processes: frequency effects can be attributed to the process of lexical activation of a word from memory, while ND effects can be attributed to real-time speech production processes (Munson 2007). This interpretation fits with the patterns reported in the current study, as well: the influence of frequency on lexical access remains consistent regardless of interlocutor type, but the mechanisms that influence production as a function of word difficulty occur in later stages of production and can be adjusted for different interlocutors.
Finally, taken together, we believe the patterns of findings in the current study support the stance that clarity-based adjustments in speech are sensitive to lexical-level factors and that evaluation of the need for clarity is tuned to the listener (cf. Hazan and Baker 2011). Perhaps the strongest version of such a position with respect to the age-of-acquisition effects found in IDS in the current study would postulate that caregivers’ adjustments in IDS are attempts to aid the child in acquiring language – in this case, producing subjectively harder words for the infant in a manner that might make them more learnable. (In fact, however, we might expect that lexically-specific hyperarticulation in IDS would be focused on lower-AoA words that were closer to the capabilities of these very young children, and thus more likely targets of acquisition, e.g., Sundberg 1998.) But such a didactic position is certainly not entailed by our findings. They may rather be part of a more general communicative strategy whereby speakers modulate speech clarity to compensate for difficulties encountered by listeners (e.g., Lindblom 1990, inter alia), even difficulties at a word-by-word level (e.g., Wright 2003; Scarborough 2013; Scarborough and Zellou 2013). Just as it is axiomatic that speech is variable, in part to ensure clarity, so too are listeners variable, and the type of information that would be useful and informative for some listeners may not help others. The results in the current study suggest that speakers are sensitive to this fact. More specifically, we suggest that speakers modify their speech to compensate for word difficulty and that speakers assess word difficulty on the basis of interlocutor type. The present patterns of findings seem to indicate that lexical difficulty can be evaluated by AoA in speech directed toward infants, while ND is a more relevant measure of lexical difficulty in adult-directed speech.
We are grateful to Eric Doty, Dave Embick, Will Styler, Meredith Tamminga, and Santiago Barreda for various discussions, assistance, and feedback. We also thank the editors, two anonymous reviewers, and the audience of LabPhon 14 for constructive comments and suggestions.
Ali, Latif, T. Gallagher, J. Goldstein & Raymond Daniloff. 1971. Perception of coarticulated nasality. Journal of the Acoustical Society of America 49. 538–540.
Andruski, Jean E. & Patricia K. Kuhl. 1996. The acoustic structure of vowels in mothers’ speech to infants and adults. In Proceedings of the Fourth International Conference on Spoken Language Processing, Vol. 3, 1545–1548. Philadelphia, PA: IEEE.
Beddor, Patrice S. 2009. A coarticulatory path to sound change. Language 85(4). 785–821.
Beddor, Patrice S., Kevin B. McGowan, Julie E. Boland, Andries W. Coetzee & Anthony Brasher. 2013. The time course of perception of coarticulation. The Journal of the Acoustical Society of America 133(4). 2350–2366.
Bell, Alan, Jason Brenier, Michelle Gregory, Cynthia Girand & Dan Jurafsky. 2009. Predictability effects on durations on content and function words in conversational English. Journal of Memory and Language 60. 92–111.
Benders, Titia. 2013. Mommy is only happy! Dutch mothers’ realisation of speech sounds in infant-directed speech expresses emotion, not didactic intent. Infant Behavior and Development 36(4). 847–862.
Bergelson, Elika & Daniel Swingley. 2012. At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences 109(9). 3253–3258.
Bergelson, Elika & Daniel Swingley. 2013. The acquisition of abstract words by young infants. Cognition 127(3). 391–397.
Bortfeld, Heather & James L. Morgan. 2010. Is early word-form processing stress-full? How natural variability supports recognition. Cognitive Psychology 60(4). 241–266.
Bradlow, Ann R. 2002. Confluent talker- and listener-related forces in clear speech production. In Carlos Gussenhoven & Natasha Warner (eds.), Laboratory phonology 7, 241–273. Berlin & New York: Mouton de Gruyter.
Bradlow, Ann R., Nina Kraus & Erin Hayes. 2003. Speaking clearly for children with learning disabilities. Journal of Speech, Language, and Hearing Research 46. 80–97.
Bradlow, Ann R., Gina M. Torretta & David B. Pisoni. 1996. Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Communication 20(3). 255–272.
Brown, Gordon D. & Frances L. Watson. 1987. First in, first out: Word learning age and spoken word frequency as predictors of word familiarity and word naming latency. Memory & Cognition 15(3). 208–216.
Bryant, Gregory A. & H. Clark Barrett. 2007. Recognizing intentions in infant-directed speech: Evidence for universals. Psychological Science 18(8). 746–751.
Brysbaert, Marc, Matthias Buchmeier, Markus Conrad, Arthur M. Jacobs, Jens Bölte & Andrea Böhl. 2011. The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology 58(5). 412.
Brysbaert, Marc & Michael J. Cortese. 2010. Do the effects of subjective frequency and age of acquisition survive better word frequency norms? The Quarterly Journal of Experimental Psychology 64(3). 545–559.
Brysbaert, Marc & Boris New. 2009. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41(4). 977–990.
Carroll, John B. & Margaret N. White. 1973. Age-of-acquisition norms for 220 picturable nouns. Journal of Verbal Learning and Verbal Behavior 12(5). 563–576.
Chen, Marilyn Y. 1997. Acoustic correlates of English and French nasalized vowels. The Journal of the Acoustical Society of America 102(4). 2360–2370.
Cirrin, Frank M. 1984. Lexical search speed in children and adults. Journal of Experimental Child Psychology 37(1). 158–175.
Cristia, Alejandrina. 2013. Input to language: The phonetics and perception of infant‐directed speech. Language and Linguistics Compass 7(3). 157–170.
Cristia, Alejandrina & Amanda Seidl. 2014. The hyperarticulation hypothesis of infant-directed speech. Journal of Child Language 41(4). 913–934.
de Boer, Bart. 2005. Infant directed speech and the evolution of language. In Maggie Tallerman (ed.), Evolutionary prerequisites for language, 100–121. Oxford: Oxford University Press.
de Boer, Bart & Patricia K. Kuhl. 2003. Investigating the role of infant-directed speech with a computer model. Acoustics Research Letters Online 4(4). 129–134.
Dilley, Laura C., Amanda L. Millett, J. Devin McAuley & Tonya Bergeson. 2014. Phonetic variation in consonants in infant-directed and adult-directed speech: The case of regressive place assimilation in word-final alveolar stops. Journal of Child Language 41(1). 155–175.
Fernald, Anne. 1984. The perceptual and affective salience of mother’s speech to infants. In Lynne Feagans, Catherine Garvey & Roberta Golinkoff (eds.), The origins of growth and communication. Norwood, NJ: Ablex.
Fernald, Anne. 1985. Four-month-old infants prefer to listen to motherese. Infant Behavior and Development 8(2). 181–195.
Fernald, Anne. 1991. Prosody in speech to children: Prelinguistic and linguistic functions. Annals of Child Development 8. 43–80.
Fernald, Anne. 1992. Meaningful melodies in mother’s speech to infants. In H. Papousek, U. Jurgens & M. Papousek (eds.), Nonverbal vocal behavior, 262–280. Cambridge: Cambridge University Press.
Fernald, Anne. 1993. Approval and disapproval: Infant responsiveness to vocal affect in familiar and unfamiliar languages. Child Development 64(3). 657–674.
Fernald, Anne & Patricia Kuhl. 1987. Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development 10(3). 279–293.
Fernald, Anne, Traute Taeschner, Judy Dunn, Mechthild Papousek, Bénédicte de Boysson-Bardies & Ikuko Fukui. 1989. A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language 16(3). 477–501.
Fowler, Carol & Jonathan Housum. 1987. Talkers’ signaling of “new” and “old” words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language 26. 489–504.
Gahl, Susanne, Yao Yao & Keith Johnson. 2012. Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech. Journal of Memory and Language 66. 789–806.
Garlock, Victoria M., Amanda C. Walley & Jamie L. Metsala. 2001. Age-of-acquisition, word frequency, and neighborhood density effects on spoken word recognition by children and adults. Journal of Memory and Language 45(3). 468–492.
Gilhooly, Ken J. & Mary L. Gilhooly. 1979. Age-of-acquisition effects in lexical and episodic memory tasks. Memory & Cognition 7(3). 214–223.
Gilhooly, Ken J. & Robert H. Logie. 1981. Word age-of-acquisition, reading latencies and auditory recognition. Current Psychological Research 1(3–4). 251–262.
Gilhooly, Ken J. & F. L. Watson. 1981. Word age-of-acquisition effects: A review. Current Psychological Reviews 1(3). 269–286.
Goldinger, Stephen D., Paul A. Luce & David B. Pisoni. 1989. Priming lexical neighbors of spoken words: Effects of competition and inhibition. Journal of Memory and Language 28. 501–518.
Gorman, Kyle. 2010. The consequences of multicollinearity among socioeconomic predictors of negative concord in Philadelphia. University of Pennsylvania Working Papers in Linguistics 16(2). 66–75.
Hazan, Valerie & Rachel Baker. 2011. Acoustic-phonetic characteristics of speech produced with communicative intent to counter adverse listening conditions. The Journal of the Acoustical Society of America 130(4). 2139–2152.
Howes, Davis H. 1957. On the relation between the intelligibility and frequency of occurrence of English words. Journal of the Acoustical Society of America 29. 296–305.
Igarashi, Yosuke, Ken’ya Nishikawa, Kuniyoshi Tanaka & Reiko Mazuka. 2013. Phonological theory informs the analysis of intonational exaggeration in Japanese infant-directed speech. The Journal of the Acoustical Society of America 134(2). 1283–1294.
Izura, Cristina, Miguel Pérez, Elizabeth Agallou, Victoria C. Wright, Javier Marín, Hans Stadthagen-González & Andrew W. Ellis. 2011. Age/order of acquisition effects and the cumulative learning of foreign words: A word training study. Journal of Memory and Language 64(1). 32–58.
Jurafsky, Daniel, Alan Bell, Michelle Gregory & William D. Raymond. 2000. Probabilistic relations between words: Evidence from reduction in lexical production. In Joan Bybee & Peter Hopper (eds.), Frequency and the emergence of linguistic structure, 229–254. Amsterdam: John Benjamins.
Kirchhoff, Katrin & Steven Schimmel. 2005. Statistical properties of infant-directed versus adult-directed speech: Insights from speech recognition. The Journal of the Acoustical Society of America 117(4). 2238–2246.
Krause, Jean & Louis D. Braida. 2003. Acoustic properties of naturally produced clear speech at normal speaking rates. Journal of the Acoustical Society of America 115. 362–378.
Kučera, Henry & W. Nelson Francis. 1967. Computational analysis of present-day American English. Providence, RI: Brown University Press.
Kuhl, Patricia K., Jean E. Andruski, Inna A. Chistovich, Ludmilla A. Chistovich, Elena V. Kozhevnikova, Viktoria L. Ryskina, Elvira I. Stolyarova, Ulla Sundberg & Francisco Lacerda. 1997. Cross-language analysis of phonetic units in language addressed to infants. Science 277(5326). 684–686.
Kuhl, Patricia K., Barbara Conboy, Sharon Coffey-Corina, Denise Padden, M. Rivera-Gaxiola & T. Nelson. 2008. Phonetic learning as a pathway to language: New data and native language magnet theory expanded (NLM-e). Philosophical Transactions of the Royal Society B: Biological Sciences 363(1493). 979–1000.
Kuperman, Victor, Raymond Bertram & R. Harald Baayen. 2008. Morphological dynamics in compound processing. Language and Cognitive Processes 23(7–8). 1089–1132.
Kuperman, Victor, Hans Stadthagen-Gonzalez & Marc Brysbaert. 2012. Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods 44(4). 978–990.
Lahey, Mybeth & Mirjam Ernestus. 2014. Pronunciation variation in infant-directed speech: Phonetic reduction of two highly frequent words. Language Learning and Development 10(4). 308–327.
Lahiri, Aditi & William Marslen-Wilson. 1991. The mental representation of lexical form: A phonological approach to the recognition lexicon. Cognition 38(3). 245–294.
Lane, Harlan & Bernard Tranel. 1971. The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research 14. 677–709.
Liberman, Alvin M., F. S. Cooper, Donald P. Shankweiler & Michael Studdert-Kennedy. 1967. Perception of the speech code. Psychological Review 74(6). 431–461.
Lieberman, P. 1963. Some effects of semantic and grammatical context and isolation. Journal of Speech and Hearing Disorders 22. 87–90.
Lindblom, Björn. 1990. Explaining phonetic variation: A sketch of the H&H theory. In William J. Hardcastle & Alain Marchal (eds.), Speech production and speech modelling, 403–439. Amsterdam: Springer Netherlands.
Lombard, Étienne. 1911. Le signe de l’élévation de la voix. Annales des Maladies de L’Oreille et du Larynx XXXVII 2. 101–109.
Luce, Paul A. & David B. Pisoni. 1998. Recognizing spoken words: The neighborhood activation model. Ear and Hearing 19(1). 1–36.
Luce, Paul A., David B. Pisoni & Steven Goldinger. 1990. Similarity neighborhoods of spoken words. In Gerry Altmann (ed.), Cognitive models of speech processing, 122–147. Cambridge, MA: MIT Press.
Marslen-Wilson, William & Pienie Zwitserlood. 1989. Accessing spoken words: The importance of word onsets. Journal of Experimental Psychology 15(3). 576–585.
Martin, Andrew, Thomas Schatz, Maarten Versteegh, Kouki Miyazawa, Reiko Mazuka, Emmanuel Dupoux & Alejandrina Cristia. 2015. Mothers speak less clearly to infants than to adults: A comprehensive test of the hyperarticulation hypothesis. Psychological Science 26(3). 341–347.
Martin, Andrew, Akira Utsugi & Reiko Mazuka. 2014. The multidimensional nature of hyperspeech: Evidence from Japanese vowel devoicing. Cognition 132(2). 216–228.
Mathies, Melanie, Pascal Perrier, Joseph S. Perkell & Majid Zandipour. 2001. Variation in anticipatory coarticulation with changes in clarity and rate. Journal of Speech and Hearing Research 44. 340–353.
McMurray, Bob, Kristine A. Kovack-Lesh, Dresden Goodwin & William McEchron. 2013. Infant directed speech and the development of speech perception: Enhancing development or an unintended consequence? Cognition 129(2). 362–378.
Moon, Seung-Jae & Björn Lindblom. 1989. Formant undershoot in clear and citation-form speech: A second progress report. Speech Transmission Laboratory, QPSR 1. 121–123.
Moon, Seung-Jae & Björn Lindblom. 1994. Interaction between duration, context and speaking style in English stressed vowels. Journal of the Acoustical Society of America 96(1). 40–55.
Munson, Benjamin & Nancy Pearl Solomon. 2004. The influence of phonological neighborhood density on vowel articulation. Journal of Speech, Language, and Hearing Research 47. 1048–1058.
Munson, Benjamin. 2007. Lexical access, lexical representation, and vowel production. Laboratory Phonology 9. 201–228.
Norris, Dennis, James M. McQueen & Anne Cutler. 2000. Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences 23(3). 299–325.
Nusbaum, Howard C., David B. Pisoni & Christopher K. Davis. 1984. Sizing up the Hoosier mental lexicon: Measuring the familiarity of 20,000 words. Research on Speech Perception Progress Report 10. 357–376.
Picheny, Michael, Nathaniel Durlach & Louis Braida. 1986. Speaking clearly for the hard-of-hearing: 2. Acoustic characteristics of clear and conversational speech. Journal of Speech and Hearing Research 29. 434–446.
Pitt, Mark A., Laura Dilley, Keith Johnson, Scott Kiesling, William Raymond, Elizabeth Hume & Eric Fosler-Lussier. 2007. Buckeye corpus of conversational speech (2nd release). Columbus, OH: Department of Psychology, Ohio State University.
Pitt, Mark A., Keith Johnson, Elizabeth Hume, Scott Kiesling & William Raymond. 2005. The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication 45(1). 89–95. Google Scholar
Rattanasone, Nan Xu, Denis Burnham & Ronan Gabriel Reilly. 2013. Tone and vowel enhancement in Cantonese infant-directed speech at 3, 6, 9, and 12 months of age. Journal of Phonetics 41(5). 332–343. Google Scholar
Raymond, William D., Robin Dautricourt & Elizabeth Hume. 2006. Word-internal /t, d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change 18(01). 55–97. Google Scholar
Scarborough, Rebecca. 2010. Lexical and contextual predictability: Confluent effects on the production of vowels. In Cécile Fougeron, Barbara Kuhnert, Mariapaola D’Imperio & Nathalie Vallée (eds.), Papers in laboratory phonology X, 557–586. Berlin: Mouton de Gruyter. Google Scholar
Scarborough, Rebecca. 2013. Neighborhood-conditioned patterns in phonetic detail: Relating coarticulation and hyperarticulation. Journal of Phonetics 41(6). 491–508. Google Scholar
Scarborough, Rebecca & Georgia Zellou. 2013. Clarity in communication: “Clear” speech authenticity and lexical neighborhood density effects in speech production and perception. The Journal of the Acoustical Society of America 134(5). 3793–3807. Google Scholar
Singh, Leher, Sarah Nestor, Chandni Parikh & Ashley Yull. 2009. Influences of infant-directed speech on early word recognition. Infancy 14(6). 654–666. Google Scholar
Smith, Nicholas A. & Laurel J. Trainor. 2008. Infant-directed speech is modulated by infant feedback. Infancy 13(4). 410–420. Google Scholar
Soderstrom, Melanie. 2007. Beyond babytalk: Re-evaluating the nature and content of speech input to preverbal infants. Developmental Review 27(4). 501–532. Google Scholar
Song, Jae Yung, Katherine Demuth & James Morgan. 2010. Effects of the acoustic properties of infant-directed speech on infant word recognition. The Journal of the Acoustical Society of America 128(1). 389–400. Google Scholar
Storkel, Holly L., Daniel Bontempo, Andrew J. Aschenbrenner, Junko Maekawa & Su-Yeon Lee. 2013. The effect of incremental changes in phonotactic probability and neighborhood density on word learning by preschool children. Journal of Speech, Language, and Hearing Research 56(5). 1689–1700. Google Scholar
Storkel, Holly L. & Jill R. Hoover. 2010. An online calculator to compute phonotactic probability and neighborhood density on the basis of child corpora of spoken American English. Behavior Research Methods 42(2). 497–506. Google Scholar
Storkel, Holly L. & Jill R. Hoover. 2011. The influence of part-word phonotactic probability/neighborhood density on word learning by preschool children varying in expressive vocabulary. Journal of Child Language 38(03). 628–643. Google Scholar
Sundberg, Ulla. 1998. Mother tongue-phonetic aspects of infant-directed speech. Stockholm: PERILUS. Google Scholar
Thiessen, Erik D., Emily A. Hill & Jenny R. Saffran. 2005. Infant‐directed speech facilitates word segmentation. Infancy 7(1). 53–71. Google Scholar
Trainor, Laurel J. & Renée N. Desjardins. 2002. Pitch characteristics of infant-directed speech affect infants’ ability to discriminate vowels. Psychonomic Bulletin & Review 9(2). 335–340. Google Scholar
Vitevitch, Michael. 1997. The neighborhood characteristics of malapropisms. Language and Speech 40. 211–228. Google Scholar
Vitevitch, Michael. 2002. The influence of phonological similarity neighborhoods on speech production. Journal of Experimental Psychology: Learning, Memory, and Cognition 28. 735–747. Google Scholar
Vitevitch, Michael & Paul Luce. 1998. When words compete: Levels of processing in perception of spoken words. Psychological Science 9. 325–329. Google Scholar
Walley, Amanda C. & Jamie L. Metsala. 1992. Young children’s age-of-acquisition estimates for spoken words. Memory & Cognition 20(2). 171–182. Google Scholar
Werker, Janet F., Judith E. Pegg & Peter J. McLeod. 1994. A cross-language investigation of infant preference for infant-directed communication. Infant Behavior and Development 17(3). 323–333. Google Scholar
Wright, Richard. 2003. Factors of lexical competition in vowel articulation. In John Local, Richard Ogden & Rosalind Temple (eds.), Papers in laboratory phonology VI, 26–50. Cambridge: Cambridge University Press. Google Scholar
Wurm, Lee H. & Sebastiano A. Fisicaro. 2014. What residualizing predictors in regression analyses does (and what it does not do). Journal of Memory and Language 72. 37–48. Google Scholar
Yuan, Jiahong & Mark Liberman. 2008. Speaker identification on the SCOTUS corpus. Journal of the Acoustical Society of America 123(5). 3878. Google Scholar
Zipf, George K. 1935. The psychobiology of language. New York: Houghton-Mifflin. Google Scholar
Simple linear regressions on the variables for the words in the extracted data set confirmed collinearity between Frequency and ND (r = 0.46, p < 0.001), between ND and AoA (r = −0.05, p < 0.001), and between Frequency and AoA (r = −0.08, p < 0.001).
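The pairwise collinearity checks reported here amount to Pearson correlations between the lexical predictors. A minimal sketch of such a check, using invented per-word values rather than the study's data (all numbers below are hypothetical):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two per-word predictor vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical per-word predictor values (not the study's data):
freq = [5.1, 4.3, 6.0, 3.2, 4.8, 5.5]  # log word frequency
nd   = [12, 9, 15, 6, 11, 14]          # neighborhood density
aoa  = [2.1, 3.4, 1.8, 4.0, 2.6, 2.0]  # age of acquisition (years)

print(pearson_r(freq, nd))  # positive: frequent words sit in denser neighborhoods
print(pearson_r(nd, aoa))   # negative: dense-neighborhood words acquired earlier
```

With predictors this intercorrelated, raw coefficients from a single regression are hard to interpret, which motivates the residualization discussed in the next note.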
To avoid any possible non-linear effects created by interactions between factors and the factors they are residualized on, we reran both the hyperarticulation and nasality analyses without interaction terms. Main effects for AoA and ND, the predictors of interest, were very similar. In the hyperarticulation analysis, AoA remained significant (est. = 0.15, t = 2.3, p < 0.01), while ND (est. = −0.03, t = −0.37, p = 0.7) and Freq (est. = −0.02, t = −0.17, p = 0.6) were not. In the nasality analysis, AoA was significant (est. = −0.54, t = −1.8, p < 0.05), as was Freq (est. = 1, t = 2.1, p < 0.05); ND was not significant (est. = 0.01, t = 1.5, p = 0.052).
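The residualization referred to here removes from one predictor the variance it shares with another, by regressing the first on the second and carrying the residuals into the model. A minimal sketch on simulated data (the variable names and values are illustrative assumptions, not the study's materials):

```python
import numpy as np

def residualize(y, x):
    """Regress y on x (with intercept) and return the residuals:
    the part of y not linearly predictable from x."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Simulated predictors with built-in collinearity (illustration only):
rng = np.random.default_rng(0)
freq = rng.normal(size=200)             # stand-in for log frequency
nd = 0.5 * freq + rng.normal(size=200)  # ND partly predictable from frequency

nd_resid = residualize(nd, freq)        # ND with frequency partialed out
# The residualized predictor is numerically orthogonal to frequency:
print(abs(np.corrcoef(nd_resid, freq)[0, 1]))
```

Because the residualized predictor is orthogonal to the predictor it was regressed on, its main effect is interpretable even when the raw variables are collinear; interactions involving residualized terms are less straightforward, which is why the reruns above drop them.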
We considered the potential contributions of two additional factors, as suggested by an anonymous reviewer. First, because prosodic modifications (Vosoughi and Roy 2012) and vowel production (Rattanasone et al. 2013) in child-directed speech have been shown to change with child age, we added a fixed effect of child age (in months) at each recording session to the nasal coarticulation model; it was not significant as a main effect (p = 0.14) or in an interaction with AoA, and its inclusion did not change the reported pattern of results. Second, because repeated mention of a given word is known to influence hyperarticulation in both default adult speech and IDS (Fowler and Housum 1987; Bortfeld and Morgan 2010), we investigated the possibility that our production results were influenced by different patterns of repetition in IDS vs. ADS. When a factor coding first vs. subsequent mention (within a given recording session for each mother) was included in the nasal coarticulation model, it too was non-significant (p = 0.8), and its inclusion did not change the reported pattern of results.
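Both robustness checks described in this note take the same form: add a control predictor and confirm that the estimate of interest is unchanged. A simplified sketch using ordinary least squares on simulated data (the study itself used mixed-effects models; every variable and value here is hypothetical):

```python
import numpy as np

def ols_coefs(y, *predictors):
    """OLS coefficients: intercept first, then one slope per predictor."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated data: the outcome depends on AoA but not on child age.
rng = np.random.default_rng(1)
n = 300
aoa = rng.normal(size=n)                 # age of acquisition (standardized)
child_age = rng.uniform(12, 24, size=n)  # session age in months (hypothetical)
nasality = -0.5 * aoa + rng.normal(size=n)

b_base = ols_coefs(nasality, aoa)             # model without the added control
b_full = ols_coefs(nasality, aoa, child_age)  # model with child age added

# The AoA estimate is essentially unchanged by the added fixed effect:
print(round(float(b_base[1]), 2), round(float(b_full[1]), 2))
```

If the added control were doing real work, the AoA slope would shift between the two fits; stability of the estimate, as reported in the note, is what licenses the conclusion that the pattern of results is unaffected.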
As in the words from the IDS corpus, simple linear regressions on the variables of the extracted words from the ADS data set indicated collinearity between frequency and ND (r = 0.25, p < 0.001), ND and AoA (r = −0.17, p < 0.001), and frequency and AoA (r = −0.26, p < 0.001).
Overall, there was, in fact, greater measured hyperarticulation in the IDS dataset than in the ADS dataset (means: IDS=11.2, ADS=2.2), though direct comparison between these two datasets is hampered by differences between speakers and elicitation conditions. However, it is interesting to note that variability in hyperarticulation across samples was comparable (SDs: IDS=1.3, ADS=1.1). Variability was comparable across samples for nasal coarticulation as well (SDs for nasal coarticulation: IDS=11.6, ADS=11).
We did see some influence of AoA on hyperarticulation in ADS via interactions involving ND. This accords with literature reporting that both AoA and ND affect word naming and recognition in adults, as in children: later-acquired words and words from denser neighborhoods yield longer reaction latencies (Goldinger et al. 1989; Garlock et al. 2001; Izura et al. 2011; Kuperman et al. 2012). However, the role AoA plays in the current study is quite limited: it is relevant only for hyperarticulation, and only among high-ND words.