Scientific report: "Bilingual Lexicon Generation Using Non-Aligned Signatures"

Daphna Shezaf, Institute of Computer Science, Hebrew University of Jerusalem
Ari Rappoport, Institute of Computer Science, Hebrew University of Jerusalem, arir@

Abstract

Bilingual lexicons are fundamental resources. Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs. Lexicons can be generated using non-parallel corpora or a pivot language, but such lexicons are noisy. We present an algorithm for generating a high quality lexicon from a noisy one, which only requires an independent corpus for each language. Our algorithm introduces non-aligned signatures (NAS), a cross-lingual word context similarity score that avoids the over-constrained and inefficient nature of alignment-based methods. We use NAS to eliminate incorrect translations from the generated lexicon. We evaluate our method by improving the quality of noisy Spanish-Hebrew lexicons generated from two pivot English lexicons. Our algorithm substantially outperforms other lexicon generation methods.

1 Introduction

Bilingual lexicons are useful for both end users and computerized language processing tasks. They provide, for each source language word or phrase, a set of translations in the target language, and thus they are a basic component of dictionaries, which also include syntactic information, sense division, usage examples, semantic fields, usage guidelines, etc.

Traditionally, when bilingual lexicons are not compiled manually, they are extracted from parallel corpora. However, for most language pairs, parallel bilingual corpora either do not exist or are at best small and unrepresentative of the general language.

Bilingual lexicons can be generated using non-parallel corpora or pivot language lexicons (see Section 2). However, such lexicons are noisy. In this paper we present a method for generating a high quality lexicon given such a noisy one. Our ...
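
To make the source of noise in pivot-based lexicons concrete, the following minimal Python sketch composes a source-to-pivot lexicon with a pivot-to-target lexicon. It is an illustration under stated assumptions, not the authors' implementation: the function name `compose_via_pivot` and the toy entries are hypothetical and hand-made, and they do not come from the paper's data. The sketch covers only the naive composition step that produces a noisy lexicon; it does not implement NAS.

```python
from collections import defaultdict


def compose_via_pivot(src_to_pivot, pivot_to_tgt):
    """Map each source word to every target word reachable through a pivot word.

    Hypothetical helper: a plain relational composition of two lexicons,
    each given as a dict from a word to a set of translations.
    """
    lexicon = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            # Every translation of the pivot word becomes a candidate,
            # whether or not it matches the sense of the source word.
            lexicon[src_word].update(pivot_to_tgt.get(pivot_word, ()))
    return dict(lexicon)


# Toy, hand-made entries (illustrative assumptions only).
es_en = {"banco": {"bank", "bench"}}
en_he = {
    "bank": {"בנק", "גדה"},  # financial institution; river bank
    "bench": {"ספסל"},       # bench (seat)
}

es_he = compose_via_pivot(es_en, en_he)
print(es_he)
```

Because the English pivot word "bank" is ambiguous, the composed entry for "banco" picks up the river-bank sense alongside the intended ones. Candidates of this kind are what a filtering step over the noisy lexicon would need to eliminate.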
