Scientific report: "Pivot Language Approach for Phrase-Based Statistical Machine Translation"

Hua Wu and Haifeng Wang
Toshiba China Research and Development Center
5/F., Tower W2, Oriental Plaza, East Chang An Ave., Dong Cheng District, Beijing 100738, China
wuhua, wanghaifeng @

Abstract

This paper proposes a novel method for phrase-based statistical machine translation using a pivot language. To conduct translation between languages Lf and Le with a small bilingual corpus, we bring in a third language Lp, which is named the pivot language. For Lf-Lp and Lp-Le, there exist large bilingual corpora. Using only the Lf-Lp and Lp-Le bilingual corpora, we can build a translation model for Lf-Le. The advantage of this method is that we can perform translation between Lf and Le even if there is no bilingual corpus available for this language pair. Using BLEU as a metric, our pivot language method achieves an absolute improvement of [...] ([...] relative) as compared with the model directly trained with 5,000 Lf-Le sentence pairs for French-Spanish translation. Moreover, with a small Lf-Le bilingual corpus available, our method can further improve the translation quality by using the additional Lf-Lp and Lp-Le bilingual corpora.

1 Introduction

For statistical machine translation (SMT), phrase-based methods (Koehn et al., 2003; Och and Ney, 2004) and syntax-based methods (Wu, 1997; Alshawi et al., 2000; Yamada and Knight, 2001; Melamed, 2004; Chiang, 2005; Quirk et al., 2005; Mellebeek et al., 2006) outperform word-based methods (Brown et al., 1993). These methods need large bilingual corpora. However, for some language pairs, only a small bilingual corpus is available, which will degrade the performance of statistical translation systems.

To solve this problem, this paper proposes a novel method for phrase-based SMT by using a pivot language. To perform translation between languages Lf and Le, we bring in a pivot language Lp, for which there exist large bilingual corpora for language pairs Lf-Lp and Lp-Le. With the