Scientific report: "Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora"

Takehito Utsuro
Graduate School of Informatics, Kyoto University
Sakyo-ku, Kyoto 606-8501, Japan
utsuro@

Takashi Horiuchi, Kohei Hino, Takeshi Hamamoto, and Takeaki Nakayama
Dpt. Information and Computer Sciences, Toyohashi University of Technology
Tenpaku-cho, Toyohashi 441-8580, Japan

Abstract

Within the framework of translation knowledge acquisition from WWW news sites, this paper studies the effect of cross-language retrieval of relevant texts on bilingual lexicon acquisition from comparable corpora. We show experimentally that reducing the set of candidate bilingual term pairs against which bilingual term correspondences are estimated is quite effective, in terms of both computational complexity and the precision with which bilingual term correspondences are estimated.

1 Introduction

Translation knowledge acquisition from parallel/comparative corpora is one of the most important research topics in corpus-based MT, because an MT system must be able to increase its translation knowledge semi-automatically if it is to be used in real-world situations. One limitation of the corpus-based approach is that its techniques rely heavily on the availability of parallel/comparative corpora. However, existing parallel/comparative corpora are limited in both size and domain, and collecting them manually is very expensive. It is therefore important to overcome this resource scarcity bottleneck in corpus-based translation knowledge acquisition research. To address this problem, this paper focuses on bilingual news articles on WWW news sites as a source for translation knowledge acquisition. In the case of WWW news sites in Japan, Japanese as well as English news ...

Figure 1: Translation Knowledge Acquisition from WWW News Sites: Overview
