Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation
Ibrahim Badr, Rabih Zbib, James Glass
Computer Science and Artificial Intelligence Lab
Massachusetts Institute of Technology
Cambridge, MA 02139, USA
{iab02, rabih, glass}@
Abstract
Syntactic reordering of the source language to better match the phrase structure of the target language has been shown to improve the performance of phrase-based Statistical Machine Translation (SMT). This paper applies syntactic reordering to English-to-Arabic translation. It introduces reordering rules and motivates them linguistically. It also studies the effect of combining reordering with Arabic morphological segmentation, a preprocessing technique that has been shown to improve both Arabic-English and English-Arabic translation. We report results in the news text domain, the UN text domain, and the spoken travel domain.
1 Introduction
Phrase-based Statistical Machine Translation has proven to be a robust and effective approach to machine translation, providing good performance without the need for explicit linguistic information. Phrase-based SMT systems, however, have limited capabilities in dealing with long-distance phenomena, since they rely on local alignments.
Automatically learned reordering models, which can be conditioned on lexical items from both the source and the target, provide some limited reordering capability when added to SMT systems. One approach that explicitly deals with long-distance reordering is to reorder the source side with predefined rules so that it better matches the target side. The reordered source is then used as input to the phrase-based SMT system. This approach incorporates structural information indirectly, since the reordering rules are applied to the parse trees of the source sentences. The same reordering must, of course, be applied to both the training data and the test data. Despite the added complexity of parsing the data, this technique has shown improvements, especially when good …
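The rule-based source reordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' rule set: the `Tree` class, the `RULES` table, and the helpers `make_leaf`, `reorder`, and `reorder_corpus` are hypothetical. The single example rule moves an adjective after the noun it modifies, since Arabic adjectives generally follow their nouns. Rules are matched on a node's label and the labels of its children, applied bottom-up to the source parse tree, and the reordered leaves become the input to the phrase-based system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Tree:
    """A minimal constituency tree; preterminal nodes carry a word and no children."""
    label: str
    children: List["Tree"] = field(default_factory=list)
    word: str = ""

    def leaves(self) -> List[str]:
        # Collect words left to right.
        if not self.children:
            return [self.word]
        out: List[str] = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def make_leaf(label: str, word: str) -> Tree:
    # Hypothetical helper for building preterminal nodes.
    return Tree(label, word=word)


# Hypothetical rule table: (parent label, child label sequence) -> new child order.
# Illustrative only: English "DT JJ NN" becomes "DT NN JJ" (adjective after noun).
Rules = Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]]
RULES: Rules = {
    ("NP", ("DT", "JJ", "NN")): (0, 2, 1),
    ("NP", ("JJ", "NN")): (1, 0),
}


def reorder(tree: Tree, rules: Rules) -> Tree:
    """Apply rules bottom-up and return a new tree; the input tree is not modified."""
    children = [reorder(child, rules) for child in tree.children]
    key = (tree.label, tuple(child.label for child in children))
    perm = rules.get(key)
    if perm is not None:
        children = [children[i] for i in perm]
    return Tree(tree.label, children, tree.word)


def reorder_corpus(parses: List[Tree], rules: Rules) -> List[str]:
    """Reorder every parsed source sentence and return the flattened token strings."""
    return [" ".join(reorder(tree, rules).leaves()) for tree in parses]


if __name__ == "__main__":
    sentence = Tree("S", [
        Tree("NP", [make_leaf("DT", "the"), make_leaf("JJ", "big"), make_leaf("NN", "house")]),
        Tree("VP", [make_leaf("VBZ", "is"), Tree("ADJP", [make_leaf("JJ", "old")])]),
    ])
    print(reorder_corpus([sentence], RULES)[0])  # → the house big is old
```

Because `reorder_corpus` depends only on the parse and the rule table, running the same call over the parsed training source and the parsed test source reorders both identically, which is the consistency requirement the approach relies on.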