tailieunhanh - Scientific report: "Gappy Phrasal Alignment by Agreement"
Mohit Bansal, UC Berkeley, CS Division, mbansal@
Chris Quirk, Microsoft Research, chrisq@
Robert C. Moore, Google Research

Abstract

We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include "gappy phrases" (such as French ne ... pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

1 Introduction

Word alignment is an important part of statistical machine translation (MT) pipelines. Phrase tables containing pairs of source and target language phrases are extracted from word alignments, forming the core of phrase-based statistical machine translation systems (Koehn et al., 2003). Most syntactic machine translation systems extract synchronous context-free grammars (SCFGs) from aligned syntactic fragments (Galley et al., 2004; Zollmann et al., 2006), which in turn are derived from bilingual word alignments and syntactic parses. Alignment is also used in various other NLP problems such as entailment, paraphrasing, question answering, summarization, and spelling correction.

Footnote: Author was a summer intern at Microsoft Research during this project.

Figure 1: French-English pair with complex word alignment.
French: ne voudrais pas voyager par chemin de fer
English: would not like traveling by railroad

A limitation to word-based alignment