Scientific report: "Automatic Identification of Word Translations from Unrelated English and German Corpora"
Reinhard Rapp
University of Mainz, FASK
D-76711 Germersheim, Germany
rapp@usun2.fask.uni-mainz.de

Abstract

Algorithms for the alignment of words in translated texts are well established. However, only recently new approaches have been proposed to identify word translations from non-parallel or even unrelated texts. This task is more difficult, because most statistical clues useful in the processing of parallel texts cannot be applied to non-parallel texts. Whereas for parallel texts in some studies up to 99% of the word alignments have been shown to be correct, the accuracy for non-parallel texts has been around 30% up to now. The current study, which is based on the assumption that there is a correlation between the patterns of word co-occurrences in corpora of different languages, makes a significant improvement to about 72% of word translations identified correctly.

1 Introduction

Starting with the well-known paper of Brown et al. (1990) on statistical machine translation, there has been much scientific interest in the alignment of sentences and words in translated texts. Many studies show that for nicely parallel corpora high accuracy rates of up to 99% can be achieved for both sentence and word alignment (Gale & Church, 1993; Kay & Röscheisen, 1993).
Of course, in practice, due to omissions, transpositions, insertions, and replacements in the process of translation, with real texts there may be all kinds of problems, and therefore robustness is still an issue (Langlais et al., 1998). Nevertheless, the results achieved with these algorithms have been found useful for the compilation of dictionaries, for checking the consistency of terminological usage in translations, for assisting the terminological work of translators and interpreters, and for example-based machine translation. By now, some alignment programs are offered commercially: translation memory tools for translators, such as IBM's Translation ...