Abstract
This paper addresses linguistic homoplasy (parallel or backward development): how it can be detected, which kinds can be distinguished, and which of them are the most deleterious for the reconstruction of language phylogeny. It is proposed that language phylogeny reconstruction should consist of two main stages. First, a strict consensus tree is built from high-quality input data using the main phylogenetic methods (such as Neighbor-joining, Bayesian MCMC, and Maximum parsimony), and ancestral character states are reconstructed on it, which reveals a certain number of homoplastic characters. Second, the detected instances of homoplasy are eliminated from the input matrix and the consensus tree is compiled again. After this homoplastic optimization, individual “problem clades” are expected to be better resolved, and the homoplasy-optimized phylogeny should in general be more robust than the initial tree. The proposed procedure is tested on the 110-item Swadesh wordlists of the Lezgian and Tsezic groups, and the results generally support these theoretical expectations. The MLN (minimal lateral network) method, currently implemented in the LingPy software, is a helpful tool for the detection of linguistic homoplasy.
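To make the two-stage idea concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes a fixed rooted binary tree and a character matrix of cognate-class states, counts the minimal number of state changes per character with Fitch parsimony, and flags a character as homoplastic when it needs more changes than its number of observed states minus one. The taxon names (`A` to `D`), the character names (`c1` to `c3`), and both function names are hypothetical, and the rebuilding of the consensus tree from the cleaned matrix is left out.

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Return (candidate ancestral states, minimal number of changes)."""
    if isinstance(tree, str):
        # Leaf: the observed state, no changes needed.
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can share a state: no extra change at this node.
        return common, left_steps + right_steps
    # Otherwise one change is required on a branch below this node.
    return left_set | right_set, left_steps + right_steps + 1


def homoplastic_characters(
    tree: Tree, matrix: Dict[str, Dict[str, int]]
) -> List[str]:
    """Flag characters that need more changes than (observed states - 1)."""
    flagged = []
    for name, states in matrix.items():
        _, steps = fitch_steps(tree, states)
        n_states = len(set(states.values()))
        if steps > n_states - 1:
            flagged.append(name)
    return flagged


# Hypothetical four-taxon tree and three binary characters (illustration only).
tree: Tree = (("A", "B"), ("C", "D"))
matrix = {
    "c1": {"A": 1, "B": 1, "C": 0, "D": 0},  # fits the tree with one change
    "c2": {"A": 1, "B": 0, "C": 1, "D": 0},  # needs two changes on this tree
    "c3": {"A": 1, "B": 1, "C": 1, "D": 1},  # constant, no change
}

# Stage 1: detect homoplastic characters on the initial tree.
flagged = homoplastic_characters(tree, matrix)

# Stage 2 (input only): drop them before the tree is built again.
cleaned = {name: col for name, col in matrix.items() if name not in flagged}
```

In this toy input only `c2` conflicts with the tree, so it is the one character removed before the second round. A real analysis would first have to decide whether such a conflict reflects borrowing, parallel innovation, or an error in the tree itself.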
Acknowledgements
I express my sincere thanks to Michael Cysouw (Marburg), Johann-Mattis List (Paris), George Starostin, Yakov Testelets, and Mikhail Zhivlov (all Moscow) for the discussion and valuable comments. I remain responsible for all possible errors of fact or interpretation.
Supplemental Material
The online version of this article offers supplementary material (https://doi.org/10.1515/flih-2017-0008).
© 2017 Walter de Gruyter GmbH, Berlin/Boston