Learning Translations of Named-Entity Phrases from Parallel Corpora

Robert C. Moore
Microsoft Research
Redmond, WA 98052 USA
bobmoore@

Abstract

We develop a new approach to learning phrase translations from parallel corpora, and show that it performs with very high coverage and accuracy in choosing French translations of English named-entity phrases in a test corpus of software manuals. Analysis of a subset of our results suggests that the method should also perform well on more general phrase translation tasks.

1 Introduction

Machine translation can benefit greatly from augmenting knowledge of word translations with knowledge of phrase translations. Multiword phrases may have nonliteral translations, or one of several equally valid literal translations may be strongly preferred in practice. Automatically learning translations of single words from parallel corpora has been much studied over the past ten years or so (Melamed, 2000, and references therein), but learning translations of multiword phrases has received less attention. See Section 5 for a review of prior work in this area. In this paper, we develop a new approach to learning phrase translations from parallel corpora, and show that it performs with very high coverage and accuracy on a named-entity phrase translation task. Moreover, analysis of a subset of our evaluation results suggests that the method should also perform well on more general phrase translation tasks.
In our approach, we are given a sentence-aligned parallel corpus annotated with a set of phrases in one of the two languages (the source language), and our goal is to identify the corresponding phrases in the corpus in the other language (the target language), ranking the translation pairs in order of confidence. Certain segments of the target-language corpus may be annotated as constituting lexical compounds, which may or may not include the translations of the source-language phrases of interest. Otherwise, there is no annotation of the target-language text.
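The input and output of this task can be made concrete with a small sketch. The code below is not the method of this paper, which the excerpt does not describe; it is a simple co-occurrence baseline written under stated assumptions: sentences are whitespace-tokenized, every contiguous target n-gram up to a hypothetical `max_len` is a candidate translation, and the Dice coefficient over sentence pairs serves as the confidence score. The function names (`ngrams`, `contains`, `rank_translations`) are hypothetical, and the lexical-compound annotations mentioned above are ignored.

```python
from collections import Counter

# Hypothetical type alias: a phrase or sentence as a tuple of tokens.
Tokens = tuple[str, ...]


def ngrams(tokens, max_len: int) -> set[Tokens]:
    """All contiguous n-grams of length 1..max_len (each counted once per sentence)."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def contains(tokens, phrase: Tokens) -> bool:
    """True if `phrase` occurs contiguously in `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def rank_translations(corpus, source_phrases, max_len: int = 4):
    """Baseline sketch (not the paper's method): rank (source phrase, target n-gram)
    pairs by the Dice coefficient of their co-occurrence across aligned sentence pairs."""
    target_counts = Counter()  # number of sentence pairs whose target side has each n-gram
    pair_ngrams = []
    for src, tgt in corpus:
        grams = ngrams(tgt, max_len)
        target_counts.update(grams)
        pair_ngrams.append((src, grams))

    results = []
    for phrase in source_phrases:
        src_count = 0      # sentence pairs whose source side contains the phrase
        joint = Counter()  # co-occurrence counts with each target n-gram
        for src, grams in pair_ngrams:
            if contains(src, phrase):
                src_count += 1
                joint.update(grams)
        for cand, c in joint.items():
            dice = 2 * c / (src_count + target_counts[cand])
            results.append((dice, phrase, cand))

    # Highest confidence first; ties broken in favor of longer candidates (a crude heuristic).
    results.sort(key=lambda r: (r[0], len(r[2])), reverse=True)
    return results


if __name__ == "__main__":
    corpus = [
        ("Open the Control Panel".split(), "Ouvrez le Panneau de configuration".split()),
        ("The Control Panel lists your settings".split(),
         "Le Panneau de configuration affiche vos paramètres".split()),
        ("Open the file".split(), "Ouvrez le fichier".split()),
        ("Save the file".split(), "Enregistrez le fichier".split()),
    ]
    score, src, tgt = rank_translations(corpus, [("Control", "Panel")])[0]
    print(f"{' '.join(src)} -> {' '.join(tgt)} ({score:.2f})")
    # prints "Control Panel -> Panneau de configuration (1.00)"
```

In this toy corpus, every sub-phrase of "Panneau de configuration" co-occurs perfectly with "Control Panel", so Dice alone cannot separate them; the length tie-break is what selects the full phrase. That weakness is one reason a plain association score is only a baseline for this task.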
