Finding Cognate Groups using Phylogenies
David Hall and Dan Klein
Computer Science Division
University of California, Berkeley
dlwh klein @

Abstract

A central problem in historical linguistics is the identification of historically related cognate words. We present a generative phylogenetic model for automatically inducing cognate group structure from unaligned word lists. Our model represents the process of transformation and transmission from ancestor word to daughter word, as well as the alignment between the word lists of the observed languages. We also present a novel method for simplifying complex weighted automata created during inference to counteract the otherwise exponential growth of message sizes. On the task of identifying cognates in a dataset of Romance words, our model significantly outperforms a baseline approach, increasing accuracy by as much as 80%. Finally, we demonstrate that our automatically induced groups can be used to successfully reconstruct ancestral words.

1 Introduction

A crowning achievement of historical linguistics is the comparative method (Ohala, 1993), wherein linguists use word similarity to elucidate the hidden phonological and morphological processes that govern historical descent.
The comparative method requires reasoning about three important hidden variables: the overall phylogenetic guide tree among languages, the evolutionary parameters of the ambient changes at each branch, and the cognate group structure that specifies which words share common ancestors. All three of these variables interact and inform each other, so historical linguists often consider them jointly. However, linguists are currently required to make qualitative judgments regarding the relative likelihood of certain sound changes, cognate groups, and so on. Several recent statistical methods have been introduced to provide increased quantitative backing to the comparative method (Oakes, 2000; Bouchard-Côté et al., 2007; Bouchard-Côté et al., 2009; among others).
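To make the third hidden variable, cognate group structure, concrete, here is a minimal sketch that partitions toy Romance word lists using nothing but string similarity. It is a hypothetical illustration only. It is neither the phylogenetic model described here nor necessarily the baseline that model is compared against. The function names `levenshtein`, `normalized_distance`, and `group_cognates`, the 0.6 `threshold`, the rule that a group holds at most one word per language, and the word lists themselves are all assumptions made for this example.

```python
# Hypothetical illustration: greedy cognate grouping by normalized edit distance.
# Not the paper's phylogenetic model; names and threshold are assumptions.


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer word's length."""
    return levenshtein(a, b) / max(len(a), len(b), 1)


def group_cognates(word_lists: dict[str, list[str]],
                   threshold: float = 0.6) -> list[dict[str, str]]:
    """Assign each word to the closest group that has no word from its language yet.

    Closeness is single-link: the minimum normalized distance to any group member.
    If no eligible group is within `threshold`, the word starts a new group.
    Each group maps language -> word.
    """
    groups: list[dict[str, str]] = []
    for lang, words in word_lists.items():
        for word in words:
            candidates = [
                (min(normalized_distance(word, other) for other in group.values()), k)
                for k, group in enumerate(groups)
                if lang not in group  # at most one word per language per group
            ]
            candidates = [c for c in candidates if c[0] <= threshold]
            if candidates:
                _, k = min(candidates)  # closest group; ties go to the earlier one
                groups[k][lang] = word
            else:
                groups.append({lang: word})
    return groups


# Toy, hand-picked word lists (assumption, not data from the paper).
word_lists = {
    "Italian": ["notte", "otto", "cane"],
    "Spanish": ["noche", "ocho", "perro"],
    "Portuguese": ["noite", "oito"],
}

for group in group_cognates(word_lists):
    print(group)
```

On these lists the sketch groups notte/noche/noite ("night") and otto/ocho/oito ("eight"), and leaves cane and perro ("dog") in separate groups, since they share a meaning but not an ancestor. Surface similarity of this kind will miss cognates whose forms have drifted far apart through regular sound change, which is one motivation for modelling word transformation along a phylogeny instead.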