Scientific report: "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity"

Lonneke van der Plas, Jorg Tiedemann
Alfa-Informatica, University of Groningen, Box 716, 9700 AS Groningen, The Netherlands
vdplas tiedeman @

Abstract

There have been many proposals to extract semantically related words using measures of distributional similarity, but these typically are not able to distinguish between synonyms and other types of semantically related words such as antonyms, (co)hyponyms and hypernyms. We present a method based on automatic word alignment of parallel corpora consisting of documents translated into multiple languages and compare our method with a monolingual syntax-based method. The approach that uses aligned multilingual data to extract synonyms shows much higher precision and recall scores for the task of synonym extraction than the monolingual syntax-based approach.

1 Introduction

People use multiple ways to express the same idea. These alternative ways of conveying the same information are referred to by the term paraphrase, and in the case of single words sharing the same meaning we speak of synonyms. Identification of synonyms is critical for many NLP tasks. In information retrieval, the information that people ask for with a set of words may be found in a text snippet that comprises a completely different set of words.
In this paper we report on our findings in trying to automatically acquire synonyms for Dutch using two different resources: a large monolingual corpus and a multilingual parallel corpus including 11 languages. A common approach to the automatic extraction of semantically related words is to use distributional similarity. The basic idea behind this is that similar words share similar contexts. Systems based on distributional similarity provide ranked lists of semantically related words according to the similarity of their contexts. Synonyms are expected to be among the highest ranks followed by co .
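
To make the idea of ranking words by the similarity of their contexts concrete, here is a minimal sketch. It is not the method used in the paper: the monolingual approach described there is syntax-based, whereas this example assumes plain word windows as contexts and cosine similarity as the measure. The function names (`context_vectors`, `cosine`, `rank_neighbours`) and the toy sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it.

    Assumption: contexts are neighbouring words, not syntactic relations.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[feature] for feature, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_neighbours(target, vectors):
    """Return (word, score) pairs for all other words, most similar first."""
    target_vec = vectors.get(target, Counter())
    scores = [
        (other, cosine(target_vec, vec))
        for other, vec in vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toy_corpus = [
        "the cat drinks milk",
        "the kitten drinks milk",
        "the car needs fuel",
    ]
    vectors = context_vectors(toy_corpus)
    # "kitten" ranks first for "cat" because the two share all their contexts.
    for word, score in rank_neighbours("cat", vectors)[:3]:
        print(f"{word}\t{score:.3f}")
```

A ranking of this kind only reflects shared contexts, so on real data antonyms, (co)hyponyms and hypernyms can score as high as true synonyms, which is the limitation the abstract points out.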
