Scientific report: "Improving Mid-Range Reordering using Templates of Factors"

Hieu Hoang, School of Informatics, University of Edinburgh
Philipp Koehn, School of Informatics, University of Edinburgh, pkoehn@

Abstract

We extend the factored translation model (Koehn and Hoang, 2007) to allow translations of longer phrases composed of factors such as POS and morphological tags to act as templates for the selection and reordering of surface phrase translations. We also reintroduce into phrase-based decoding the use of alignment information within the decoder, which forms an integral part of decoding in the Alignment Template System (Och, 2002). Results show an increase in translation performance of up to BLEU for out-of-domain French–English translation. We also show how this method compares and relates to lexicalized reordering.

1 Introduction

One of the major issues in statistical machine translation is reordering due to systematic word-ordering differences between languages. Often reordering is best explained by linguistic categories such as part-of-speech tags. In fact, prior work has examined the use of part-of-speech tags in pre-reordering schemes (Tomas and Casacuberta, 2003).

Reordering can also be viewed as composed of a number of related problems, each of which can be explained or solved by a variety of linguistic phenomena. Firstly, differences in phrase ordering account for much of the long-range reordering. Syntax-based and hierarchical models such as Chiang (2005) attempt to address this problem. Shorter-range reordering, such as intra-phrasal word reordering, can often be predicted from the underlying properties of the words and their context, the most obvious property being POS tags.

In this paper, we tackle the issue of shorter-range reordering in phrase-based decoding by presenting an extension of the factored translation model which directly models the translation of non-surface factors such as POS tags. We shall call this extension the factored template model.
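
To make the idea of a POS-tag template concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation and does not reflect the decoder described in the paper. The template table, the lexicon, and the function name translate_with_template are all hypothetical, chosen only to show how a translated tag sequence with its word alignment could dictate the order in which surface words are translated.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data, not taken from the paper.
# A template maps a source POS sequence to a target POS sequence plus an
# alignment: alignment[i] is the source position that fills target slot i.
Template = Tuple[Tuple[str, ...], Tuple[int, ...]]

POS_TEMPLATES: Dict[Tuple[str, ...], Template] = {
    # French "DET NOUN ADJ" is typically rendered as English "DET ADJ NOUN".
    ("DET", "NOUN", "ADJ"): (("DET", "ADJ", "NOUN"), (0, 2, 1)),
}

# Hypothetical word-level translation table (assumption for illustration).
LEXICON: Dict[str, str] = {"la": "the", "maison": "house", "bleue": "blue"}


def translate_with_template(words: List[str], tags: List[str]) -> List[str]:
    """Translate word by word, ordering the output by a matching POS template.

    If no template matches the tag sequence, fall back to monotone order.
    Words missing from the lexicon are copied through unchanged.
    """
    key = tuple(tags)
    if key in POS_TEMPLATES:
        _, alignment = POS_TEMPLATES[key]
    else:
        alignment = tuple(range(len(words)))  # monotone fallback
    return [LEXICON.get(words[src], words[src]) for src in alignment]


if __name__ == "__main__":
    print(translate_with_template(["la", "maison", "bleue"],
                                  ["DET", "NOUN", "ADJ"]))
```

In this sketch the template only fixes the output order, and a single best template is applied deterministically. A real phrase-based decoder would instead score many competing templates and surface phrase translations jointly.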
