Published by De Gruyter Mouton, April 21, 2021

Optimization of morpheme length: a cross-linguistic assessment of Zipf’s and Menzerath’s laws

Matthew Stave, Ludger Paschen, François Pellegrino and Frank Seifart
From the journal Linguistics Vanguard

Abstract

Zipf’s Law of Abbreviation and Menzerath’s Law both make predictions about the length of linguistic units, based on corpus frequency and the length of the carrier unit. Each contributes to the efficiency of languages: for Zipf, units are more likely to be reduced when they are highly predictable, due to their frequency; for Menzerath, units are more likely to be reduced when there are more sub-units to contribute to the structural information of the carrier unit. However, it remains unclear how the two laws work together in determining unit length at a given level of linguistic structure. We examine this question regarding the length of morphemes in spoken corpora of nine typologically diverse languages drawn from the DoReCo corpus, showing that Zipf’s Law is a stronger predictor, but that the two laws interact with one another. We also explore how this is affected by specific typological characteristics, such as morphological complexity.
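As a rough illustration of the two predictions described above, the following minimal sketch computes a rank correlation for each law on a toy corpus. It is not the authors' method: the morpheme forms are invented, length is measured in characters purely for convenience, and the helper names (`ranks`, `pearson`, `spearman`) are hypothetical. Under these assumptions, Zipf's Law of Abbreviation predicts a negative correlation between a morpheme's frequency and its length, and Menzerath's Law predicts a negative correlation between the number of morphemes in a word and their mean length.

```python
from collections import Counter

# Hypothetical toy corpus: each word is a list of invented morphemes.
corpus = [
    ["ka", "tala", "n"],
    ["ka", "miro"],
    ["tala"],
    ["ka", "su", "n"],
    ["n", "ka", "lopati"],
    ["miro", "n"],
    ["sulamento"],
    ["ka", "n"],
]


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Zipf's Law of Abbreviation: token frequency vs. length, per morpheme type.
freq = Counter(m for word in corpus for m in word)
types = sorted(freq)
zipf_rho = spearman([freq[m] for m in types], [len(m) for m in types])

# Menzerath's Law: morphemes per word vs. mean morpheme length in that word.
morphemes_per_word = [len(word) for word in corpus]
mean_morpheme_len = [sum(len(m) for m in word) / len(word) for word in corpus]
menzerath_rho = spearman(morphemes_per_word, mean_morpheme_len)

print(f"Zipf (frequency vs. length): rho = {zipf_rho:.2f}")
print(f"Menzerath (word size vs. mean morpheme length): rho = {menzerath_rho:.2f}")
```

Both coefficients come out negative on this toy data, which was constructed to show the expected direction of each effect. Two separate correlations do not capture the interaction between the laws that the abstract reports; that would require a joint model with both predictors.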


Corresponding author: Matthew Stave, Univ Lyon and Centre National de la Recherche Scientifique (UMR 5596), Dynamique du Langage, Lyon, France

Received: 2019-11-01
Accepted: 2020-10-23
Published Online: 2021-04-21

© 2021 Walter de Gruyter GmbH, Berlin/Boston