tailieunhanh - Scientific report: "Two Easy Improvements to Lexical Weighting"

David Chiang, Steve DeNeefe, and Michael Pust
USC Information Sciences Institute
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
{chiang,sdeneefe,pust}@…

Abstract

We introduce two simple improvements to the lexical weighting features of Koehn, Och, and Marcu (2003) for machine translation: one which smooths the probability of translating word f to word e by simplifying English morphology, and one which conditions it on the kind of training data that f and e co-occurred in. These new variations lead to improvements of up to … BLEU, with an average improvement of … BLEU, across two language pairs, two genres, and two translation systems.

1 Introduction

Lexical weighting features (Koehn et al., 2003) estimate the probability of a phrase pair or translation rule word-by-word. In this paper, we introduce two simple improvements to these features: one which smooths the probability of translating word f to word e using English morphology, and one which conditions it on the kind of training data that f and e co-occurred in. These new variations lead to improvements of up to … BLEU, with an average improvement of … BLEU, across two language pairs, two genres, and two translation systems.

2 Background

Since there are slight variations in how the lexical weighting features are computed, we begin by defining the baseline lexical weighting features. If $f = f_1 \cdots f_n$ and $e = e_1 \cdots e_m$ are a training sentence pair, let $a_i$ ($1 \le i \le m$) be the (possibly empty) set of positions in $f$ that $e_i$ is aligned to.
First, compute a word translation table from the word-aligned parallel text: for each sentence pair and each $i$, let

$$c(f_j, e_i) \leftarrow c(f_j, e_i) + \frac{1}{|a_i|} \quad \text{for } j \in a_i \tag{1}$$

$$c(\text{NULL}, e_i) \leftarrow c(\text{NULL}, e_i) + 1 \quad \text{if } |a_i| = 0 \tag{2}$$

Then

$$t(e \mid f) = \frac{c(f, e)}{\sum_{e} c(f, e)} \tag{3}$$

where $f$ can be NULL.

Second, during phrase-pair extraction, store with each phrase pair the alignments between the words in the phrase pair. If it is observed with more than one word alignment pattern, store the most frequent pattern.

Third, for each phrase
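
The count-and-normalize step that builds the word translation table t(e | f) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `word_translation_table`, the input layout (a list of `(f_words, e_words, alignment)` triples with 0-based positions), and the use of `None` as the NULL source word are all assumptions made for the example.

```python
from collections import defaultdict

# Assumption: None stands in for the NULL source word, so it cannot
# collide with a real token that happens to be spelled "NULL".
NULL = None


def word_translation_table(corpus):
    """Estimate t(e | f) from word-aligned sentence pairs.

    corpus: iterable of (f_words, e_words, alignment) triples, where
    alignment[i] is the (possibly empty) set of 0-based positions in
    f_words that e_words[i] is aligned to.
    Returns a dict mapping (f, e) to t(e | f).
    """
    counts = defaultdict(float)  # fractional counts c(f, e)
    for f_words, e_words, alignment in corpus:
        for i, e in enumerate(e_words):
            a_i = alignment[i]
            if a_i:
                # Split one unit of count evenly over the aligned f words.
                for j in a_i:
                    counts[(f_words[j], e)] += 1.0 / len(a_i)
            else:
                # An unaligned e word is counted against NULL.
                counts[(NULL, e)] += 1.0

    # Normalize over e for each f (NULL included).
    totals = defaultdict(float)
    for (f, _e), c in counts.items():
        totals[f] += c
    return {(f, e): c / totals[f] for (f, e), c in counts.items()}


# Toy usage: "not" is aligned to both "ne" and "pas" in the first pair,
# and "the" is unaligned in the third pair.
toy_corpus = [
    (["ne", "pas"], ["not"], [{0, 1}]),
    (["pas"], ["step"], [{0}]),
    (["maison"], ["the", "house"], [set(), {0}]),
]
table = word_translation_table(toy_corpus)
```

Keeping fractional counts in a single dictionary keyed by `(f, e)` keeps the sketch short; a real system over a large parallel corpus would more likely use a nested or on-disk table.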
