Scientific report: "Measuring Language Divergence by Intra-Lexical Comparison"

T. Mark Ellison
Informatics, University of Edinburgh
mark@markellison.net

Simon Kirby
Language Evolution and Computation Research Unit, Philosophy, Psychology and Language Sciences, University of Edinburgh
simon@

Abstract

This paper presents a method for building genetic language taxonomies based on a new approach to comparing lexical forms. Instead of comparing forms cross-linguistically, a matrix of language-internal similarities between forms is calculated. These matrices are then compared to give distances between languages. We argue that this coheres better with current thinking in linguistics and psycholinguistics. An implementation of this approach, called PHILOLOGICON, is described, along with its application to Dyen et al.'s (1992) ninety-five wordlists from Indo-European languages.

1 Introduction

Recently there has been burgeoning interest in the computational construction of genetic language taxonomies (Dyen et al., 1992; Nerbonne and Heeringa, 1997; Kondrak, 2002; Ringe et al., 2002; Benedetto et al., 2002; McMahon and McMahon, 2003; Gray and Atkinson, 2003; Nakleh et al., 2005). One common approach to building language taxonomies is to ascribe language-language distances and then use a generic algorithm to construct a tree which explains these distances as well as possible. Two questions arise with this approach.
The first asks what aspects of languages are important in measuring inter-language distance. The second asks how to measure distance given these aspects. A more traditional approach to building language taxonomies (Dyen et al., 1992) answers these questions in terms of cognates. A word in language A is said to be cognate with a word in language B if the forms share a common ancestor in the parent language of A and B. In the cognate-counting method, inter-language distance depends on the lexical forms of the languages. The distance between two languages is a function of the number or fraction of these .
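
The cognate-counting idea described above can be illustrated with a minimal sketch. It assumes that each language's wordlist has already been annotated with a cognate-class label per meaning, and it takes the distance to be one minus the fraction of shared meanings whose forms fall in the same class. That particular function, the name cognate_distance, and the three toy languages are assumptions made for illustration only; they are not taken from the paper or from Dyen et al.'s data.

```python
from itertools import combinations


def cognate_distance(classes_a, classes_b):
    """Return 1 minus the fraction of shared meanings judged cognate.

    Each argument maps a meaning to a cognate-class label. Two forms
    count as cognate when their labels for the same meaning are equal.
    (This choice of distance function is an assumption of the sketch.)
    """
    shared = [m for m in classes_a if m in classes_b]
    if not shared:
        raise ValueError("the two wordlists have no meanings in common")
    cognate = sum(1 for m in shared if classes_a[m] == classes_b[m])
    return 1.0 - cognate / len(shared)


# Hypothetical cognate-class labels for three invented languages.
wordlists = {
    "lang_a": {"hand": 1, "father": 1, "dog": 1, "water": 1},
    "lang_b": {"hand": 1, "father": 1, "dog": 2, "water": 1},
    "lang_c": {"hand": 2, "father": 1, "dog": 3, "water": 2},
}

# Pairwise distances, the input a generic tree-building algorithm would take.
distances = {
    (x, y): cognate_distance(wordlists[x], wordlists[y])
    for x, y in combinations(sorted(wordlists), 2)
}

for pair, d in distances.items():
    print(pair, d)
```

In this toy data lang_a and lang_b share three of four cognate classes, so they come out closer to each other than either is to lang_c. A tree-building step would then group them first.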