Scientific report: "Improve SMT Quality with Automatically Extracted Paraphrase Rules"
Wei He¹, Hua Wu², Haifeng Wang², Ting Liu¹
¹Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
²Baidu
This work was done while the first author was visiting Baidu. Corresponding author: Ting Liu.

Abstract

We propose a novel approach to improving SMT via paraphrase rules that are automatically extracted from the bilingual training data. Without using extra paraphrase resources, we acquire the rules by comparing the source side of the parallel corpus with the target-to-source translations of the target side. Besides word and phrase paraphrases, the acquired rules mainly cover structured paraphrases at the sentence level. These rules are used to enrich the SMT input in order to improve translation quality. Experimental results show that the proposed approach achieves significant BLEU improvements in both the oral domain and the news domain.

1 Introduction

The translation quality of an SMT system is closely tied to the coverage of its translation models. However, no matter how much training data is used, it is impossible to completely cover the unlimited space of input sentences. The problem is even more serious for online SMT systems in real-world applications. A natural solution to the coverage problem is to bridge the gap between the input sentences and the translation models, either from the input side, by rewriting input sentences into expressions the MT system handles well, or from the translation-model side, by enriching the translation models to cover more expressions.
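The abstract describes acquiring rules by comparing a source sentence with the target-to-source (back) translation of its target side. As a rough illustration only, the sketch below pairs up the token spans where the two sentences differ. It uses Python's standard `difflib.SequenceMatcher` diff as a stand-in; the function name `extract_paraphrase_pairs` and the whitespace tokenization are hypothetical, and this is not the authors' extraction procedure, which yields structured sentence-level rules that a flat span diff cannot capture.

```python
from difflib import SequenceMatcher


def extract_paraphrase_pairs(source, back_translation):
    """Hypothetical sketch: return (source_span, back_translated_span) pairs
    for the regions where two whitespace-tokenized sentences differ.

    Uses difflib's matching-block diff, not word alignment, so it only finds
    flat phrase pairs, never structured rules with variables."""
    src = source.split()
    bt = back_translation.split()
    # autojunk=False: treat every token as significant, even frequent ones.
    matcher = SequenceMatcher(a=src, b=bt, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" regions have a span on both sides; pure insertions
        # and deletions are ignored in this sketch.
        if tag == "replace":
            pairs.append((" ".join(src[i1:i2]), " ".join(bt[j1:j2])))
    return pairs


if __name__ == "__main__":
    print(extract_paraphrase_pairs("i would like to book a room",
                                   "i want to book a room"))
    # → [('would like', 'want')]
```

A diff keeps the example dependency-free; a realistic extractor would more likely align the two sentences and generalize aligned regions into rules, which is well beyond this sketch.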
In recent years, paraphrasing has been shown to be useful for improving SMT quality. According to their paraphrase targets, the proposed methods can be classified into two categories: (1) enriching the translation models to cover more bilingual expressions; (2) paraphrasing the input.
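For the second category, paraphrasing the input, the simplest form of input enrichment can be sketched as follows: given a set of phrase-level rules, produce the original sentence plus one rewritten variant per matching rule. The function `apply_rules` and the dictionary rule format are hypothetical assumptions for illustration; each rule is applied independently to its first whole-token match, and structured rules with variables are not modeled.

```python
def apply_rules(sentence, rules):
    """Hypothetical sketch: return the input plus one variant per matching rule.

    `rules` maps a source phrase to a paraphrase. Matching is on whole
    whitespace tokens, and each variant rewrites only the first occurrence
    of a single rule's phrase (rules are not combined)."""
    tokens = sentence.split()
    variants = [sentence]
    for lhs, rhs in rules.items():
        pattern = lhs.split()
        n = len(pattern)
        if n == 0:
            continue  # an empty left-hand side would match everywhere
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == pattern:
                rewritten = " ".join(tokens[:i] + rhs.split() + tokens[i + n:])
                if rewritten not in variants:
                    variants.append(rewritten)
                break  # first occurrence only
    return variants


if __name__ == "__main__":
    for v in apply_rules("i would like to book a room",
                         {"would like": "want", "book": "reserve"}):
        print(v)
    # i would like to book a room
    # i want to book a room
    # i would like to reserve a room
```

One simple option is to translate every variant and pick among the outputs; how the variants are actually fed to the decoder is left open here.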