Scientific report: "An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words"

Dekai Wu
HKUST, Department of Computer Science
University of Science & Technology
Clear Water Bay, Hong Kong
dekai@

Abstract

We describe a grammarless method for simultaneously bracketing both halves of a parallel text and giving word alignments, assuming only a translation lexicon for the language pair. We introduce inversion-invariant transduction grammars, which serve as generative models for parallel bilingual sentences with weak order constraints. Focusing on transduction grammars for bracketing, we formulate a normal form and a stochastic version amenable to a maximum-likelihood bracketing algorithm. Several extensions and experiments are discussed.

1 Introduction

Parallel corpora have been shown to provide an extremely rich source of constraints for statistical analysis (Brown et al. 1990; Gale & Church 1991; Gale et al. 1992; Church 1993; Brown et al. 1993; Dagan et al. 1993; Dagan & Church 1994; Fung & Church 1994; Wu & Xia 1994; Fung & McKeown 1994). Our thesis in this paper is that the lexical information actually gives sufficient information to extract not merely word alignments but also bracketing constraints for both parallel texts. Aside from purely linguistic interest, bracket structure has been empirically shown to be highly effective at constraining subsequent training of, for example, stochastic context-free grammars (Pereira & Schabes 1992; Black et al. 1993).
Previous algorithms for automatic bracketing operate on monolingual texts and hence require more grammatical constraints; for example, tactics employing mutual information have been applied to tagged text (Magerman & Marcus 1990). Algorithms for word alignment attempt to find the matching words between parallel sentences. Although word alignments are of little use by themselves, they provide potential anchor points for other applications, or for subsequent learning stages to acquire more interesting structures. Our technique views word .
