Published by De Gruyter Mouton, March 18, 2015

The rise and fall of the L-shaped morphome: diachronic and experimental studies

Andrew Nevins, Cilene Rodrigues and Kevin Tang
From the journal Probus

Abstract

It has been suggested that the Romance first person singular indicative constitutes a natural class with the present subjunctive paradigm for the purposes of stem selection (Maiden 2005), thus forming a kind of ‘diagonal syncretism’, as the latter shares no morphosyntactic features with the former. The existence of such patterns has been taken to be an argument for autonomous morphology and the existence of unnatural ‘morphomes’, in the sense of Aronoff (1994). Our experimental investigations with native speakers of Portuguese, Italian, and Spanish reveal that this pattern is underlearned, and that speakers do not generalize it to novel forms, instead preferring the 2nd person singular indicative to the 1st person as the base for the derivation of the subjunctive paradigm (and the 2nd person indicative as opposed to the 2nd person subjunctive as the base for the derivation of the 1st person indicative as well). The results implicate a role for naturalness biases in morphological structure, and an awareness that the first person singular is an unreliable and idiosyncratic base for productive inflectional identity. We then study the underlearning of the L-morphome in terms of historical change in the salience of these patterns. We demonstrate, by means of diachronic corpus studies spanning five centuries, a change in the ratio of first conjugation verbs to second and third conjugation verbs, and a resulting decrease in the relative type frequency of the classes in which morphomic verbs reside. If indeed learners need increased evidence in order to incorporate and actively take up unnatural patterns, this lexical support has dwindled over time. Even though many of the morphomic verbs have maintained a very high token frequency (allowing them to survive as memorized forms), their productivity has diminished over time, and hence they go unlearned as a generalizable pattern. When the distribution of irregular alternations is overshadowed in the lexicon, a morphologically unnatural pattern may cease to maintain its status as part of the grammar.

Appendix 1: construction of best mixed-models

This section describes the procedures used to construct the best mixed models for each of the languages analyzed in the experiments, before proceeding to overall analyses of the best predictors of variance and the ratio of Natural responses to L-shaped responses in each, as detailed in Sections 4.1.2, 5.1.2, and 6.1.2, respectively.

Beginning with European Portuguese, we followed the modelling strategy as documented in Barr et al. (2013): we began with a saturated model, with fully crossed and fully specified random effects. This kind of model has an interaction term for all the predictors as fixed effects, with random intercepts and slopes:

(16)

Saturated model for Portuguese results:

Response ~ Group * Frame * Conjugation * Place + (1 + Group * Frame * Conjugation * Place|Participant) + (1 + Group * Frame * Conjugation * Place|item)
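To give a sense of the size of the saturated model in (16), the sketch below expands the `*` operator into the main effects and interactions it stands for. It is written in Python purely for illustration (the models themselves were fitted with lme4 in R), and the function name `expand_star` is hypothetical.

```python
from itertools import combinations


def expand_star(predictors):
    """Expand `A * B * ...` into main effects and all interactions.

    Interactions are written with ':' as in the model formulas.
    """
    terms = []
    for order in range(1, len(predictors) + 1):
        for combo in combinations(predictors, order):
            terms.append(":".join(combo))
    return terms


# Predictors of the Portuguese model in (16).
portuguese_predictors = ["Group", "Frame", "Conjugation", "Place"]
fixed_terms = expand_star(portuguese_predictors)

print(len(fixed_terms))  # 15, i.e. 2**4 - 1 terms
for term in fixed_terms:
    print(term)
```

With four predictors the expansion yields 15 fixed-effect terms, and each of the two random-effects blocks in (16) asks for an intercept plus the same 15 slopes, which helps explain why such a model may fail to converge.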

Due to non-convergence, we simplified the model step by step until it partially converged. We followed two principles in choosing which term to exclude at each step: (1) hierarchically, from the most complex terms (the largest interactions) to the least complex (single terms), and (2) by-item slopes before by-participant slopes. The latter principle is justified by the fact that our data were collected in controlled experiments, in which item variation tends to be smaller than participant variation. A model was deemed to have converged if its relative gradient was below 0.002, as recommended by Ben Bolker, one of the developers of lme4 (R-sig-ME mailing list n.d.). Inspecting the partially converged model, we then excluded the slope associated with the smallest variance. This process was repeated until the model converged; the resulting converged model is shown below.
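The two ordering principles can be illustrated with a short Python sketch that ranks candidate random slopes for removal. It rests on an assumption not stated in the text, namely that complexity takes priority and the item-before-participant rule breaks ties within a level of complexity. The names `slope_terms` and `removal_order` are hypothetical, and the variance-based choice among slopes in a partially converged model is not modelled here.

```python
from itertools import combinations


def slope_terms(predictors):
    """All main effects and interactions, as tuples of predictor names."""
    return [
        combo
        for order in range(1, len(predictors) + 1)
        for combo in combinations(predictors, order)
    ]


def removal_order(predictors, groupings=("item", "Participant")):
    """Rank (slope, grouping factor) pairs in the order they would be dropped.

    Assumed priority: (1) larger interactions before smaller ones,
    (2) within the same size, by-item slopes before by-participant slopes.
    """
    candidates = [
        (term, grouping)
        for term in slope_terms(predictors)
        for grouping in groupings
    ]
    return sorted(
        candidates,
        key=lambda c: (-len(c[0]), groupings.index(c[1])),
    )


order = removal_order(["Group", "Frame", "Conjugation", "Place"])
for term, grouping in order[:4]:
    print(":".join(term), "|", grouping)
```

Under this assumed ordering, the four-way by-item slope is the first candidate for removal, followed by the four-way by-participant slope, and only then the three-way slopes.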

(17)

Converged model for Portuguese results:

Response ~ Group * Frame * Conjugation * Place + (1 + Group + Frame + Conjugation + Place|Participant) + (1 + Frame + Place|item)

Next, we followed a data-driven approach to determine the random effect structure of our model, using the backward best-path algorithm, in which each step removes the predictor whose removal leads to the best next model. Model comparison was performed using anova (test = χ², α = 0.1). The α-level was set at the liberal threshold of 0.1 so as to be as conservative as possible about discarding any potential predictor. If several subset models yielded p-values exceeding the α-level in their nested comparisons with the superset model, the subset model with the strongest evidence (the highest p-value) was selected. Both random intercepts were kept by default, as including them is common practice. The resulting model has the maximal effect structure supported by the data, following the terminology of Jaeger (2010):
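The selection loop described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' script: `backward_best_path` and `toy_p_value` are hypothetical names, and the p-values in the toy table are invented for the example. In a real analysis each p-value would come from a nested comparison of two fitted models.

```python
def backward_best_path(terms, p_value, alpha=0.1, protected=()):
    """Backward best-path elimination over a set of model terms.

    `p_value(superset, subset)` must return the p-value of the nested
    comparison between the two term sets. At each step, every subset
    obtained by dropping one unprotected term is compared with the
    current model; among those with p > alpha, the one with the
    highest p-value becomes the new current model.
    """
    current = frozenset(terms)
    while True:
        candidates = []
        for term in sorted(current):
            if term in protected:
                continue
            subset = current - {term}
            p = p_value(current, subset)
            if p > alpha:
                candidates.append((p, term, subset))
        if not candidates:
            return current
        _, _, current = max(candidates, key=lambda c: (c[0], c[1]))


# Invented p-values, keyed by the slope that is removed (illustration only).
TOY_P = {
    "Group|Participant": 0.01,
    "Frame|Participant": 0.01,
    "Conjugation|Participant": 0.01,
    "Place|Participant": 0.45,
    "Frame|item": 0.80,
    "Place|item": 0.30,
}


def toy_p_value(superset, subset):
    """Stand-in for a model comparison: look up the removed term."""
    (removed,) = superset - subset
    return TOY_P[removed]


selected = backward_best_path(TOY_P.keys(), toy_p_value, alpha=0.1)
print(sorted(selected))
```

With these invented values the loop drops the two by-item slopes and the by-participant slope for Place, and retains the by-participant slopes for Group, Frame, and Conjugation. Random intercepts are not listed among the terms, which corresponds to keeping them by default.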

(18)

Model with the maximal effect structure supported by the data for Portuguese results:

Response ~ Group * Frame * Conjugation * Place + (1 + Group + Frame + Conjugation|Participant) + (1|item)

We then performed a series of nested model comparisons using anova (test = χ², α = 0.1). The addition or removal of a term was justified by whether it made a significant improvement to the model. We adhered to the principle of marginality, which does not allow a model to contain an interaction without its respective main effects and all lower-order terms. The model selection algorithm was again the best-path algorithm; the direction of comparison was first forward (inclusion) and then backward (exclusion), and this pattern was repeated until no terms could be further included or excluded. When excluding terms, we proceeded from the most complex (the largest interaction terms) to the least complex (single terms), with the reverse being true when including terms. The comparison process alternated between random effects and fixed effects. The resultant model has the maximal effect structure justified by model comparison, following the terminology of Jaeger (2010):
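The principle of marginality can be made concrete with a small Python check over lists of fixed-effect terms, using the predictor names of this appendix. The helpers `respects_marginality` and `removable_terms` are hypothetical names introduced for this sketch.

```python
from itertools import combinations


def parse(term):
    """Turn 'Group:Frame' into the set of predictors it involves."""
    return frozenset(term.split(":"))


def respects_marginality(terms):
    """True if every interaction is accompanied by all its lower-order terms."""
    present = {parse(t) for t in terms}
    for term in present:
        for order in range(1, len(term)):
            for sub in combinations(sorted(term), order):
                if frozenset(sub) not in present:
                    return False
    return True


def removable_terms(terms):
    """Terms that may be dropped without violating marginality.

    A term is removable only if no higher-order term in the model
    contains it.
    """
    parsed = {t: parse(t) for t in terms}
    return [
        t for t, preds in parsed.items()
        if not any(preds < other for other in parsed.values())
    ]


model_terms = ["Group", "Frame", "Conjugation", "Place", "Group:Frame"]
print(respects_marginality(model_terms))            # True
print(respects_marginality(["Group", "Group:Frame"]))  # False: Frame is missing
print(removable_terms(model_terms))
```

In the example term list, Group and Frame cannot be excluded while Group:Frame remains in the model, so backward steps may only target Conjugation, Place, or the interaction itself.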

(19)

Model with the maximal effect structure justified by model comparison for Portuguese results:

Response ~ Group + Frame + Conjugation + Place + Group:Frame + (1 + Group + Frame|Participant) + (1|item)

This model thereby included main effect terms for Group, Frame, Conjugation, and Place; an interaction term for Group × Frame; and, as random effect terms, a random intercept for Item and a random intercept for Participant with random slopes for Group and Frame.

For Italian, the same model selection procedure was used as for Portuguese, reported above. We began with a saturated model as shown below.

(20)

Saturated model for Italian:

Response ~ Group * Frame * Alternation + (1 + Group * Frame * Alternation|Participant) + (1 + Group * Frame * Alternation|item)

The converged model is as shown below.

(21)

Converged model for Italian:

Response ~ Group * Frame * Alternation + (1 + Group * Frame * Alternation - Group:Frame:Alternation|Participant) + (1 + Group + Frame + Alternation|item)

The model with the maximal effect structure supported by the data is as shown below.

(22)

Model with the maximal effect structure supported by the data for Italian:

Response ~ Group * Frame * Alternation + (1|Participant) + (1|item)

We then performed a series of nested model comparisons using anova (test = χ², α = 0.1) as before. The resultant model with the maximal effect structure justified by model comparison is shown below.
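The nested comparisons used throughout this appendix are, on our reading, likelihood-ratio tests referred to a χ² distribution. The Python sketch below shows the arithmetic of a single comparison under that assumption. The function names are hypothetical, the log-likelihood values are invented, and the χ² tail probability is computed with closed-form series for integer degrees of freedom rather than taken from a statistics library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of half-integer powers.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (k + 0.5)
    return total


def lrt_p_value(loglik_full, loglik_reduced, df_diff):
    """Likelihood-ratio statistic and p-value for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df_diff)


# Invented log-likelihoods for a superset model and a subset model
# that differ by two parameters (illustration only).
stat, p = lrt_p_value(-1000.0, -1003.5, df_diff=2)
alpha = 0.1
print(round(stat, 3), round(p, 4))
print("term justified" if p < alpha else "term not justified")
```

Under this scheme a term is retained or added when dropping it yields p below the α-level of 0.1, and it becomes a candidate for exclusion otherwise.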

(23)

Model with the maximal effect structure justified by model comparison for Italian results:

Response ~ Group * Frame * Alternation + (1|Participant) + (1|item)

In summary, this model included the three-way interaction term for Group × Frame × Alternation together with all of its lower-order terms, as well as random intercepts for Item and for Participant.

For Spanish, the same model selection procedure was used as with the other two languages. We began with a saturated model as shown below.

(24)

Saturated model for Spanish:

Response ~ Group * Frame * Person * Place + (1 + Group * Frame * Person * Place|Participant) + (1 + Group * Frame * Person * Place|item)

The converged model is as shown below.

(25)

Converged model for Spanish:

Response ~ Group * Frame * Person * Place + (1 + Group + Frame + Person + Place|Participant) + (1 + Group + Frame|item)

The model with the maximal effect structure supported by the data is as shown below.

(26)

Model with the maximal effect structure supported by the data for Spanish:

Response ~ Group * Frame * Person * Place + (1 + Group + Frame + Place|Participant) + (1|item)

We then performed a series of nested model comparisons using anova (test = χ², α = 0.1) as before. The resultant model with the maximal effect structure justified by model comparison is shown below.

(27)

Model with the maximal effect structure justified by model comparison for Spanish:

Response ~ Group + Frame + Person + Place + Group:Frame + Group:Person + Group:Place + Frame:Place + Place:Person + Group:Frame:Place + Group:Place:Person + (1 + Group + Frame + Place|Participant) + (1|item)

In sum, the model for Spanish included main effect terms for Group, Frame, Person, and Place, a number of interaction effects, and random effect terms such as a random intercept term for Item, and a random intercept for Participant with random slopes for Group, Frame, and Place.

Acknowledgement

Many thanks to our interlocutors along the course of this research: Asaf Bachrach, Michael Becker, Ricardo Bermúdez-Otero, Ana Castro, Maria Garraffa, Thomas Graf, Kyle Gorman, Martin Maiden, Gertjan Postma, Erica Rodrigues, Leticia Sicuro Corrêa, Donca Steriade, Leo Wetzels, and Marcos Zampieri.

References

Albright, Adam. 2003. A quantitative study of Spanish paradigm gaps. In Proceedings of WCCFL 25, 1–14.

Anderson, Stephen. 1981. Why phonology isn’t ‘natural’. Linguistic Inquiry 12. 493–539.

Aronoff, Mark. 1994. Morphology by itself. Cambridge, MA: MIT Press.

Baayen, R. Harald. 2001. Word frequency distributions, vol. 18. Cambridge, MA: MIT Press.

Bachrach, Asaf & Andrew Nevins. 2008. Introduction: Approaching inflectional identity. In Asaf Bachrach and Andrew Nevins (eds.), Inflectional identity, 1–28. Oxford: Oxford University Press.

Barr, Dale J., Roger Levy, Christoph Scheepers & Harry J. Tily. 2013. Random effects structure for confirmatory hypothesis testing: keep it maximal. Journal of Memory and Language 68. 255–278.

Bartoń, Kamil. 2014. MuMIn: Multi-model inference. R package version 1.10.0. http://CRAN.R-project.org/package=MuMIn. Accessed 16 February 2015.

Bates, Douglas, Martin Maechler, Ben Bolker & Steven Walker. 2014. lme4: Linear mixed-effects models using Eigen and S4. R package version 1.0-6. http://CRAN.R-project.org/package=lme4. Accessed 16 February 2015.

Becker, Michael, Nihan Ketrez & Andrew Nevins. 2011. The surfeit of the stimulus: Analytic biases filter lexical statistics in Turkish laryngeal alternations. Language 87(1). 84–125.

Becker, Michael & Jonathan Levine. 2010. Experigen – An online experiment platform. https://github.com/tlozoot/experigen. Accessed 16 February 2015.

Berent, Iris, Tracy Lennertz, Paul Smolensky & Vered Vaknin-Nusbaum. 2012. Listeners’ knowledge of phonological universals: Evidence from nasal clusters. Phonology 26. 1550–1562.

Blevins, Juliette. 2004. Evolutionary phonology. Cambridge: Cambridge University Press.

Blevins, James P. 2010. The morphome as a unit of predictive value. Paper presented at the 14th International Morphology Meeting, Budapest.

Bobaljik, Jonathan. 2004. Universals in comparative morphology. Cambridge, MA: MIT Press.

Bourne, Lyle E. 1970. Knowing and using concepts. Psychological Review 77. 546–556.

Cristia, Alejandrina, Jeff Mielke, Robert Daland & Sharon Peperkamp. 2013. Constrained generalization of implicitly learned sound patterns. Journal of Laboratory Phonology 4. 259–285.

Davies, Mark & Michael Ferreira. 2006. Corpus do português: 45 million words, 1300s–1900s. http://www.corpusdoportugues.org. Accessed 16 February 2015.

Dressler, Wolfgang, Willi Mayerthaler, Oswald Panagl & Wolfgang Wurzel. 1987. Leitmotifs in natural morphology. Amsterdam: John Benjamins.

Feldman, Jacob. 2000. Minimization of Boolean complexity in human concept learning. Nature 407. 630–633.

Finley, Sara. 2012. Typological asymmetries in round vowel harmony: Support from artificial grammar learning. Language and Cognitive Processes 27. 1550–1562.

Graf, Thomas. 2012. An algebraic perspective on the person case constraint. In UCLA Working Papers in Linguistics 17, 85–90.

Hale, Mark & Charles Reiss. 2008. The phonological enterprise. Oxford: Oxford University Press.

Halle, Morris & Alec Marantz. 1993. Distributed morphology and the pieces of inflection. In Kenneth Hale and Samuel Jay Keyser (eds.), The view from building 20, 111–176. Cambridge, MA: MIT Press.

Hayes, Bruce, Donca Steriade & Robert Kirchner. 2003. Phonetically-based phonology. Cambridge: Cambridge University Press.

Heinz, Jeffrey. 2010. Learning long-distance phonotactics. Linguistic Inquiry 41. 623–661.

Jaeger, T. Florian. 2010. Random effect: Should I stay or should I go? http://hlplab.wordpress.com/2009/05/14/random-effect-structure/. Accessed 16 February 2015.

Johnson, Paul C. D. 2014. Extension of Nakagawa & Schielzeth’s R²GLMM to random slopes models. Methods in Ecology and Evolution 5. 944–946. http://dx.doi.org/10.1111/2041-210X.12225.

Kenstowicz, Michael. 1997. Base-identity and uniform exponence: Alternatives to cyclicity. In Jacques Durand & Bernard Laks (eds.), Current trends in phonology: Models and methods, 363–394. University of Salford.

Killick, Rebecca & Idris A. Eckley. 2011. Changepoint: An R package for changepoint analysis. Lancaster: Lancaster University.

Lin, Yuri, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, Will Brockman & Slav Petrov. 2012. Syntactic annotations for the Google books Ngram corpus. In Proceedings of the ACL 2012 System Demonstrations, 169–174. Association for Computational Linguistics.

Maiden, Martin. 2005. Morphological autonomy and diachrony. Yearbook of Morphology 2004, 137–175.

Maiden, Martin & Paul O’Neill. 2010. On morphomic defectiveness: evidence from the Romance languages of the Iberian Peninsula. In Matthew Baerman, Greville G. Corbett, and Dunstan Brown (eds.), Defective paradigms: Missing forms and what they tell us, 103–124. Oxford: Oxford University Press.

Matthews, P. H. 1991. Morphology. Cambridge: Cambridge University Press.

McCarthy, John. 2005. Optimal paradigms. In Laura Downing, Tracy Alan Hall, and Renate Raffelsiefen (eds.), Paradigms in phonological theory, 170–210. Oxford: Oxford University Press.

Moreton, Elliott. 2008. Analytic bias and phonological typology. Phonology 25. 83–127.

Nakagawa, Shinichi & Holger Schielzeth. 2013. A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4. 133–142.

Nevins, Andrew. 2007. The representation of third person and its consequences for person-case effects. Natural Language and Linguistic Theory 25. 273–313.

Nevins, Andrew, Gean Damulakis & Maria Luisa Freitas. 2014. Phonological regularities among defective verbs. Cadernos de Estudos Lingüísticos 56(1). 11–21.

Nevins, Andrew & Cilene Rodrigues. 2012. Naturalness biases, ‘morphomes’, and the Romance first person singular. Paper presented at Univ. Coimbra, Newcastle University, University of York, and Univ. Paris VII. Handout available online: http://ling.auf.net/lingbuzz/001469. Accessed 16 February 2015.

Onelli, Corinna, Domenico Proietti, Corrado Seidenari & Fabio Tamburini. 2006. The DiaCORIS project: A diachronic corpus of written Italian. In Proceedings of LREC-2006, The Fifth International Conference on Language Resources and Evaluation, 1212–1215.

Page, E. S. 1954. Continuous inspection schemes. Biometrika 41. 100–115.

Paiva, Maria da Conceição & Maria Eugênia Lamoglia Duarte. 2013. Mudança lingüística: observações no tempo real. In M. C. Mollica and M. L. Braga (eds.), Introdução à sociolinguística: o tratamento da variação, 4th ed., 179–190. São Paulo: Contexto.

Pertsova, Katya. 2010. Learning biases in the acquisition of Russian genitive plural allomorphy. Ms., Univ. North Carolina.

Pertsova, Katya. 2011. Grounding systematic syncretism in learning. Linguistic Inquiry 42. 225–266.

Prasada, Sandeep & Steven Pinker. 1993. Generalization of regular and irregular morphological patterns. Language and Cognitive Processes 8. 1–56.

R Core Team. 2013. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/. Accessed 16 February 2015.

R-sig-ME mailing list. n.d. https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q2/021993.html. Accessed 16 February 2015.

Sánchez-Martínez, Felipe, Isabel Martínez-Sempere, Xavier Ivars-Ribes & Rafael C. Carrasco. 2013. An open diachronic corpus of historical Spanish. Language Resources and Evaluation 47. 1327–1342. doi:10.1007/s10579-013-9239-y.

Steriade, Donca. 2008. A pseudo-cyclic effect in Romanian morphophonology. In Asaf Bachrach and Andrew Nevins (eds.), Inflectional identity, 313–359. Oxford: Oxford University Press.

Stump, Gregory. 2001. Inflectional morphology: A theory of paradigm structure. Cambridge: Cambridge University Press.

Tang, Kevin & Andrew Nevins. 2013. Quantifying the diachronic productivity of irregular verbal patterns in Romance. UCL Working Papers in Linguistics 25. 289–308.

Tucker, Emily. 2000. Multiple allomorphs in the formation of the Italian agentive. Master’s thesis, UCLA.

Wilson, Colin. 2006. Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science 30. 945–982.

Yang, C. 2005. On productivity. Linguistic Variation Yearbook 5. 265–302.

Zampieri, Marcos & Martin Becker. 2013. Colonia: Corpus of historical Portuguese. In Marcos Zampieri and Sascha Diwersy (eds.), Special volume on non-standard data sources in corpus-based research. Volume 5 of ZSM Studien, 77–84. Köln: Shaker.

Zhang, Jie, Yuwen Lai & Craig Turnbull-Sailor. 2006. Wug-testing the “Tone Circle” in Taiwanese. In Proceedings of WCCFL 25, 453–461.

Zimmer, Karl. 1969. Psychological correlates of some Turkish morpheme structure conditions. Language 45(2). 309–321.

Note

The authors’ names are listed in alphabetical order. Earlier descriptions of the experimental research were presented in handout form as Nevins and Rodrigues (2012), which the present work supersedes.

Published Online: 2015-3-18
Published in Print: 2015-5-1

©2015 by De Gruyter Mouton