Cohesive Phrase-based Decoding for Statistical Machine Translation
Colin Cherry∗
Microsoft Research
One Microsoft Way
Redmond, WA, 98052
colinc@

Abstract

Phrase-based decoding produces state-of-the-art translations with no regard for syntax. We add syntax to this process with a cohesion constraint based on a dependency tree for the source sentence. The constraint allows the decoder to employ arbitrary, non-syntactic phrases, but ensures that those phrases are translated in an order that respects the source tree's structure. In this way, we target the phrasal decoder's weakness in order modeling, without affecting its strengths. To further increase flexibility, we incorporate cohesion as a decoder feature, creating a soft constraint. The resulting cohesive, phrase-based decoder is shown to produce translations that are preferred over non-cohesive output in both automatic and human evaluations.

1 Introduction

Statistical machine translation (SMT) is complicated by the fact that words can move during translation. If one assumes arbitrary movement is possible, that alone is sufficient to show the problem to be NP-complete (Knight, 1999). Syntactic cohesion¹ is the notion that all movement occurring during translation can be explained by permuting children in a parse tree (Fox, 2002). Equivalently, one can say that phrases in the source, defined by subtrees in its parse, remain contiguous after translation. Early methods for syntactic SMT held to this assumption in its entirety (Wu, 1997; Yamada and Knight, 2001). These approaches were eventually superseded by tree transducers and tree substitution grammars, which allow translation events to span subtree units, providing several advantages, including the ability to

∗ Work conducted while at the University of Alberta.
¹ We use the term "syntactic cohesion" throughout this paper to mean what has previously been referred to as "phrasal cohesion", because the non-linguistic sense of "phrase" has become so common in machine translation literature.
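The contiguity reading of syntactic cohesion given in the introduction can be illustrated with a minimal sketch. It is not the decoder's constraint as implemented in the paper: it assumes a simplified setting in which every source word maps to exactly one target position, and the names `subtree_nodes`, `is_cohesive`, `heads`, and `target_pos`, as well as the toy sentence, are hypothetical choices made for illustration. Under that assumption, a reordering is cohesive when the words of every source subtree occupy a contiguous block of target positions.

```python
from typing import List, Set


def subtree_nodes(heads: List[int], root: int) -> Set[int]:
    """Return the indices of all words in the subtree rooted at `root`.

    `heads[i]` is the index of word i's head in the dependency tree,
    or -1 if word i is the root of the sentence.
    """
    children = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            children[head].append(child)
    nodes, stack = set(), [root]
    while stack:
        node = stack.pop()
        nodes.add(node)
        stack.extend(children[node])
    return nodes


def is_cohesive(heads: List[int], target_pos: List[int]) -> bool:
    """Check that every source subtree stays contiguous after reordering.

    Assumption (not from the paper): `target_pos` is a permutation giving
    the single target position of each source word.
    """
    for root in range(len(heads)):
        positions = [target_pos[i] for i in subtree_nodes(heads, root)]
        # Distinct positions form a contiguous block exactly when their
        # range equals their count.
        if max(positions) - min(positions) + 1 != len(positions):
            return False
    return True


# Toy source sentence: "the tax causes unrest"
# "the" modifies "tax"; "tax" and "unrest" modify "causes" (the root).
heads = [1, 2, -1, 2]

print(is_cohesive(heads, [0, 1, 2, 3]))  # monotone order: True
print(is_cohesive(heads, [2, 3, 1, 0]))  # "unrest causes the tax": True
print(is_cohesive(heads, [0, 3, 2, 1]))  # "the unrest causes tax": False
```

In the last call the subtree rooted at "tax" (the words "the tax") is split by "unrest causes", so the order violates cohesion. A phrase-based decoder would need a span-based version of this check, since phrases may cover several words and need not align with subtrees.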