Yearbook of the Poznan Linguistic Meeting

Open Access

Challenges of annotation and analysis in computer-assisted language comparison: A case study on Burmish languages

Nathan W. Hill / Johann-Mattis List
Published Online: 2017-09-13 | DOI: https://doi.org/10.1515/yplm-2017-0003


Computational methods are increasingly used in comparative linguistics. Their growing deployment draws attention both to the areas in which these methods remain inadequate and to the areas in which classical approaches to language comparison are opaque and inconsistent. In this paper, we illustrate specific challenges that both computational and classical approaches encounter when studying South-East Asian languages. Using data from the Burmish language family, we point to the challenges that result from missing annotation standards and insufficient methods of analysis, and we illustrate how to tackle these problems within a computer-assisted framework in which computational approaches pre-analyse the data while linguists attend to the detailed analyses.

Keywords: historical linguistics; linguistic reconstruction; Burmish languages; annotation; analysis; computer-assisted language comparison


About the article

Published in Print: 2017-09-26

Citation Information: Yearbook of the Poznan Linguistic Meeting, Volume 3, Issue 1, Pages 47–76, ISSN (Online) 2449-7525, DOI: https://doi.org/10.1515/yplm-2017-0003.

© 2017. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
