Abstract
This paper explores the relationship between traditional genetic trees of languages and (unrooted) similarity trees. Genetic trees are typically based on lexical and phonological data and ignore information from other levels of analysis. Similarity trees, by contrast, can be computed directly from corpus data. Since a corpus can be seen as a set of corresponding strings (text, annotation tags), string comparison methods from bioinformatics can be used to compute 'distance scores' between languages. Similarity trees based on phonological and syntactic features are presented for five historical varieties of German, and the scope and limits of such surface-based methods in the calculation of language relationships are discussed.
© 2014 by Walter de Gruyter Berlin/Boston
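
To illustrate what a 'distance score' between corresponding strings might look like, the following is a minimal sketch and not the method used in the paper. It assumes plain Levenshtein edit distance, normalised by the length of the longer sequence, as the string comparison measure. The variety names and tag sequences are invented placeholders, not data from the study.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or tag lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def distance_matrix(samples):
    """Pairwise distance scores for a dict mapping variety name to sequence."""
    names = sorted(samples)
    matrix = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        d = normalized_distance(samples[p], samples[q])
        matrix[(p, q)] = matrix[(q, p)] = d
    return matrix


# Hypothetical annotation-tag sequences for the "same" sentence in three
# varieties; the names and tags are assumptions made for illustration only.
samples = {
    "variety_a": ["DET", "NOUN", "VERB", "ADV"],
    "variety_b": ["DET", "NOUN", "ADV", "VERB"],
    "variety_c": ["NOUN", "VERB", "DET", "NOUN"],
}

if __name__ == "__main__":
    for pair, score in sorted(distance_matrix(samples).items()):
        print(pair, round(score, 2))
```

A symmetric matrix of this kind is the usual input to a tree-building procedure such as neighbour joining, which yields an unrooted tree. Which measure and which tree-building procedure the paper actually uses is not stated in the abstract.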