Less is more: why all paradigms are defective, and why that is a good thing

Laura A. Janda 1 and Francis M. Tyers 2
  • 1 HSL, UiT The Arctic University of Norway, Tromsø, Norway
  • 2 School of Linguistics, National Research University Higher School of Economics, Moscow, Russia
Laura A. Janda (corresponding author)
  • Laura A. Janda (born 1957, Ph.D., UCLA, 1984) is Professor of Russian Linguistics at UiT the Arctic University of Norway. Her special areas of interest are the complex factors associated with the grammatical categories of case and aspect and how these can be investigated using corpus data and experiments.
Francis M. Tyers
  • Francis M. Tyers (born 1983, Ph.D., Universitat d’Alacant, 2013) is Assistant Professor of Linguistics at Higher School of Economics in Moscow. He is passionate about language technology for lesser-resourced languages and has co-organised workshops on machine translation in a number of countries including Russia and Finland.

Abstract

Only a fraction of lexemes are encountered in all their paradigm forms in any corpus, or even in the lifetime of any speaker. This raises the question of how native speakers confidently produce and comprehend word forms that they have never encountered. We present the results of an experiment using a recurrent neural network computational learning model. In particular, we compare the model’s production of unencountered forms under two types of training data for Russian nouns, verbs, and adjectives: full paradigms vs. single word forms. In the long run, the model performs better when exposed to the more naturalistic training on single word forms, even though the full-paradigm training data is much larger, since it includes every form of every word. We discuss why “defective” paradigms may be better for human learners as well.
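
The contrast between the two training regimes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the article's setup: the toy three-cell paradigms, the function names, and the random choice of the single attested form per lexeme (the abstract does not say how the single forms were selected).

```python
import random

# Hypothetical toy paradigms (lemma -> {grammatical tag: word form}).
# These are illustrative Russian nouns in transliteration, not the article's data.
PARADIGMS = {
    "kniga": {"Nom.Sg": "kniga", "Acc.Sg": "knigu", "Gen.Sg": "knigi"},
    "lampa": {"Nom.Sg": "lampa", "Acc.Sg": "lampu", "Gen.Sg": "lampy"},
    "karta": {"Nom.Sg": "karta", "Acc.Sg": "kartu", "Gen.Sg": "karty"},
}


def full_paradigm_data(paradigms):
    """Training regime 1: every (lemma, tag, form) triple of every lexeme."""
    return [
        (lemma, tag, form)
        for lemma, cells in paradigms.items()
        for tag, form in cells.items()
    ]


def single_form_data(paradigms, seed=0):
    """Training regime 2: exactly one attested form per lexeme.

    Assumption: the attested cell is picked at random; a corpus-based
    choice (e.g. by frequency) could be substituted here.
    """
    rng = random.Random(seed)
    data = []
    for lemma, cells in paradigms.items():
        tag = rng.choice(sorted(cells))
        data.append((lemma, tag, cells[tag]))
    return data


def unseen_cells(paradigms, training):
    """Paradigm cells that never occur in the given training data."""
    seen = {(lemma, tag) for lemma, tag, _ in training}
    return [
        (lemma, tag, form)
        for lemma, cells in paradigms.items()
        for tag, form in cells.items()
        if (lemma, tag) not in seen
    ]


if __name__ == "__main__":
    full = full_paradigm_data(PARADIGMS)
    single = single_form_data(PARADIGMS)
    print(len(full), "training items with full paradigms")
    print(len(single), "training items with single word forms")
    print(len(unseen_cells(PARADIGMS, single)), "cells left unencountered")
```

In this toy setting the single-form data is a third the size of the full-paradigm data, and the cells it leaves out are the "unencountered forms" a learner would have to produce by generalizing across lexemes.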



Corpus Linguistics and Linguistic Theory publishes high-quality, corpus-based research focusing on theoretically-relevant issues in all core areas of linguistic research (phonology, morphology, syntax, semantics, pragmatics) and other recognized topic areas. The journal features articles from a corpus-based approach that develop new methods, evaluate theoretical claims and offer analyses of linguistic phenomena within a theoretical framework.
