
Corpus Linguistics and Linguistic Theory

Founded by Gries, Stefan Th. / Stefanowitsch, Anatol

Ed. by Wulff, Stefanie

2 Issues per year


IMPACT FACTOR 2017: 1.200
5-year IMPACT FACTOR: 1.386

CiteScore 2017: 0.80

SCImago Journal Rank (SJR) 2017: 0.288
Source Normalized Impact per Paper (SNIP) 2017: 0.930

ISSN (Online): 1613-7035

Less is more: why all paradigms are defective, and why that is a good thing

Laura A. Janda / Francis M. Tyers
Published Online: 2018-06-21 | DOI: https://doi.org/10.1515/cllt-2018-0031

Abstract

Only a fraction of lexemes are encountered in all of their paradigm forms in any corpus, or even in the lifetime of any speaker. This raises the question of how native speakers confidently produce and comprehend word forms that they have never encountered. We present the results of an experiment using a recurrent neural network as a computational learning model. In particular, we compare the model’s production of unencountered forms under two types of training data for Russian nouns, verbs, and adjectives: full paradigms vs. single word forms. In the long run, the model performs better when exposed to the more naturalistic training on single word forms, even though the full-paradigm training data is much larger, since it includes the complete paradigm for every word. We discuss why “defective” paradigms may be better for human learners as well.

Keywords: morphology; paradigm; Russian; corpus; computational experiment
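
The contrast between the two training conditions named in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two transliterated Russian nouns, the tag labels, the set of "attested" cells, and the function names are hypothetical and are not taken from the study's data or code. The abstract does not say how held-out forms were selected, so the sketch only builds the two kinds of training set and compares their sizes.

```python
from typing import Dict, List, Set, Tuple

# One training item: (lemma, morphological tag, inflected word form).
Item = Tuple[str, str, str]

# Hypothetical mini-paradigms (transliterated); real paradigms are far larger.
PARADIGMS: Dict[str, Dict[str, str]] = {
    "kniga": {"Nom.Sg": "kniga", "Acc.Sg": "knigu", "Gen.Sg": "knigi"},
    "lampa": {"Nom.Sg": "lampa", "Acc.Sg": "lampu", "Gen.Sg": "lampy"},
}

# Hypothetical corpus attestations: the only (lemma, tag) cells a learner meets.
ATTESTED: Set[Tuple[str, str]] = {
    ("kniga", "Nom.Sg"),
    ("kniga", "Acc.Sg"),
    ("lampa", "Gen.Sg"),
}


def full_paradigm_training(paradigms: Dict[str, Dict[str, str]]) -> List[Item]:
    """Every cell of every paradigm, whether or not it occurs in a corpus."""
    return sorted(
        (lemma, tag, form)
        for lemma, cells in paradigms.items()
        for tag, form in cells.items()
    )


def single_form_training(
    paradigms: Dict[str, Dict[str, str]], attested: Set[Tuple[str, str]]
) -> List[Item]:
    """Only the word forms that are actually attested."""
    return sorted((lemma, tag, paradigms[lemma][tag]) for lemma, tag in attested)


full = full_paradigm_training(PARADIGMS)
single = single_form_training(PARADIGMS, ATTESTED)
print(len(full), len(single))  # prints "6 3"
```

Even in this tiny example the single-form set is a strict subset of the full-paradigm set, which mirrors the abstract's point that the full-paradigm training data is much larger.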


About the article

Laura A. Janda

Laura A. Janda (born 1957, Ph.D., UCLA, 1984) is Professor of Russian Linguistics at UiT The Arctic University of Norway. Her special areas of interest are the complex factors associated with the grammatical categories of case and aspect and how these can be investigated using corpus data and experiments.

Francis M. Tyers

Francis M. Tyers (born 1983, Ph.D., Universitat d’Alacant, 2013) is Assistant Professor of Linguistics at the Higher School of Economics in Moscow. He is passionate about language technology for lesser-resourced languages and has co-organised workshops on machine translation in a number of countries, including Russia and Finland.


Published Online: 2018-06-21


Citation Information: Corpus Linguistics and Linguistic Theory, ISSN (Online) 1613-7035, ISSN (Print) 1613-7027, DOI: https://doi.org/10.1515/cllt-2018-0031.


© 2018 Walter de Gruyter GmbH, Berlin/Boston.
