Published by De Gruyter, April 1, 2006

Using Corpora in the Calculation of Language Relationships

Anke Lüdeling

Abstract

This paper explores the relationship between traditional genetic trees of languages and (unrooted) similarity trees. Genetic trees are typically based on lexical and phonological data and ignore information from other levels of analysis. Similarity trees can be computed directly from corpus data. Since a corpus can be seen as a set of corresponding strings (text, annotation tags), string comparison methods from bioinformatics can be used to compute 'distance scores' between languages. Similarity trees based on phonological and syntactic features are presented for five historical varieties of German, and the scope and limits of such surface-based methods in the calculation of language relationships are discussed.
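
As a rough illustration of how a 'distance score' between two annotated corpora might be obtained, the following minimal sketch computes a length-normalized edit (Levenshtein) distance between tag sequences and collects the pairwise scores. Edit distance is used here only as a stand-in for the string comparison methods the abstract mentions; the abstract does not say which method the paper uses. The function names, the variety labels, and the toy tag sequences are all hypothetical and are not data from the paper.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or tag lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer length, giving a score in [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def distance_matrix(corpora):
    """Pairwise distance scores for a dict mapping variety name to sequence."""
    names = sorted(corpora)
    return {(p, q): normalized_distance(corpora[p], corpora[q])
            for p, q in combinations(names, 2)}


# Hypothetical toy input: one short part-of-speech tag sequence per variety.
toy_corpora = {
    "variety_A": ["DET", "NOUN", "VERB"],
    "variety_B": ["DET", "NOUN", "VERB"],
    "variety_C": ["NOUN", "VERB", "DET"],
}
scores = distance_matrix(toy_corpora)
```

A matrix of this kind could then be passed to a tree-building procedure such as neighbor joining, which produces an unrooted tree from pairwise distances. Whether the paper proceeds this way is an assumption.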

Published Online: 2006-4-1
Published in Print: 2006-4-1

© 2014 by Walter de Gruyter Berlin/Boston