Scientific report: "Handling phrase reorderings for machine translation"
Yizhao Ni, Craig J. Saunders, Sandor Szedmak and Mahesan Niranjan
ISIS Group, School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, United Kingdom
yn05r@ ss03v mn @
The author's new address: Xerox Research Centre Europe, 6 Chemin de Maupertuis, 38240 Meylan, France.

Abstract

We propose a distance phrase reordering model (DPR) for statistical machine translation (SMT), where the aim is to capture phrase reorderings using a structure learning framework. On both the reordering classification task and a Chinese-to-English translation task, we show improved performance over a baseline SMT system.

1 Introduction

Word or phrase reordering is a common problem in bilingual translation, arising from different grammatical structures. For example, in Chinese the expression of the date follows "Year Month Date", whereas in English "Month Date Year" is often the correct form. In general, the fluency of machine translations can be greatly improved by obtaining the correct word order in the target language.

As the reordering problem is computationally expensive, a word distance-based reordering model is commonly used among SMT decoders (Koehn, 2004), in which the costs of phrase movements are linearly proportional to the reordering distance. Although this model is simple and efficient, its content independence makes it difficult to capture many distant phrase reorderings caused by the grammar.

To tackle the problem, Koehn et al. (2005) developed a lexicalized reordering model that attempted to learn the phrase reordering based on content. The model learns local orientation (e.g. monotone order or switching order) probabilities for each bilingual phrase pair using Maximum Likelihood Estimation (MLE). These orientation probabilities are then integrated into an SMT decoder to help find a Viterbi-best local orientation sequence. Improvements by this model have been reported in (Koehn et al., 2005). However, the amount of training data for each bilingual phrase is so small that the model usually suffers from the data sparseness problem.
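The two baseline ideas described in the introduction can be illustrated with a minimal Python sketch. It is not the authors' implementation or any decoder's actual code. The function names, the `weight` parameter, the 0-based source spans, and the distance definition |start - previous end - 1| are assumptions made for illustration; the text only states that the cost is linear in the reordering distance and that orientation probabilities are estimated by MLE.

```python
from collections import Counter, defaultdict


def distance_cost(prev_end, cur_start, weight=1.0):
    """Linear reordering cost (assumed definition of distance).

    The cost is zero when the current source phrase starts directly
    after the previously translated one, and grows linearly otherwise.
    """
    return weight * abs(cur_start - prev_end - 1)


def total_distance_cost(spans, weight=1.0):
    """Sum the cost over source spans (start, end) listed in target order."""
    cost = 0.0
    prev_end = -1  # hypothetical position just before the first source word
    for start, end in spans:
        cost += distance_cost(prev_end, start, weight)
        prev_end = end
    return cost


def mle_orientation_probs(observations):
    """Relative-frequency (MLE) estimate of P(orientation | phrase pair).

    `observations` is an iterable of (phrase_pair, orientation) tuples,
    where the orientation labels are supplied by the caller.
    """
    counts = defaultdict(Counter)
    for phrase_pair, orientation in observations:
        counts[phrase_pair][orientation] += 1
    return {
        pair: {o: c / sum(ctr.values()) for o, c in ctr.items()}
        for pair, ctr in counts.items()
    }
```

Under these assumptions a monotone sequence of spans has zero cost regardless of which words the phrases contain, which is the content independence noted above. A phrase pair observed only once receives probability 1.0 for its single observed orientation, a small illustration of the data sparseness problem.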