Scientific report: "Phylogenetic Grammar Induction"

Taylor Berg-Kirkpatrick and Dan Klein
Computer Science Division
University of California, Berkeley

Abstract

We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articulated phylogenies and from increasing numbers of languages. Across eight languages, the multilingual approach gives error reductions over the standard monolingual DMV, reaching as high as 39%.

1 Introduction

Learning multiple languages together should be easier than learning them separately. For example, in the domain of syntactic parsing, a range of recent work has exploited the mutual constraint between two languages' parses of the same bitext (Kuhn, 2004; Burkett and Klein, 2008; Kuzman et al., 2009; Smith and Eisner, 2009; Snyder et al., 2009a). Moreover, Snyder et al. (2009b), in the context of unsupervised part-of-speech induction, and Bouchard-Côté et al. (2007), in the context of phonology, show that extending beyond two languages can provide increasing benefit. However, multitexts are only available for limited languages and domains. In this work, we consider unsupervised grammar induction without bitexts or multitexts.
Without translation examples, multilingual constraints cannot be exploited at the sentence or token level. Rather, we capture multilingual constraints at a parameter level, using a phylogeny-structured prior to tie together the various individual languages' learning problems. Our joint, hierarchical prior couples model parameters for different languages in a way that respects knowledge about how the languages evolved. Aspects of this work are closely related to Cohen and Smith (2009).
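One way to picture a phylogeny-structured prior is as a drift penalty along the edges of the language tree: each language's (or ancestor's) parameter vector is encouraged to stay close to its parent's, so related languages share statistical strength without any aligned text. The sketch below is a minimal illustration under assumed simplifications (a small hypothetical tree, two-dimensional parameter vectors, and a simple Gaussian drift penalty); it is not the paper's exact model.

```python
# Hypothetical phylogeny: internal nodes are unobserved ancestors.
# Tree shape, node names, and parameter values are illustrative only.
EDGES = [
    ("proto", "germanic"), ("proto", "romance"),
    ("germanic", "english"), ("germanic", "dutch"),
    ("romance", "spanish"), ("romance", "portuguese"),
]

def log_prior(params, sigma=1.0):
    """Gaussian-drift log-prior (up to a constant): each child's parameter
    vector is penalized for drifting from its parent's, summed over edges."""
    total = 0.0
    for parent, child in EDGES:
        for p, c in zip(params[parent], params[child]):
            diff = c - p
            total += -diff * diff / (2.0 * sigma * sigma)
    return total

params = {
    "proto": [0.0, 0.0], "germanic": [0.1, -0.2], "romance": [-0.1, 0.1],
    "english": [0.2, -0.2], "dutch": [0.1, -0.3],
    "spanish": [-0.1, 0.2], "portuguese": [-0.2, 0.1],
}
print(log_prior(params))
```

In a joint learner, this prior term would be added to the sum of per-language data likelihoods, so optimizing the combined objective pulls each language's parameters toward its relatives' while still fitting its own monolingual data. A larger `sigma` loosens the coupling; in the limit the languages are learned independently.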
