Scientific report: "Beyond Lexical Units: Enriching Wordnets with Phrasets"
Luisa Bentivogli, Emanuele Pianta
ITC-irst, Trento, Italy
bentiVO pianta @

Abstract

In this paper we present a proposal to extend WordNet-like lexical databases by adding phrasets, i.e. sets of free combinations of words which are recurrently used to express a concept (let's call them recurrent free phrases). Phrasets are a useful source of information for different NLP tasks, and particularly in a multilingual environment to manage lexical gaps. Two experiments are presented to check the possibility of acquiring recurrent free phrases from dictionaries and corpora.

1 Introduction

WordNet (Fellbaum, 1998) is a popular lexical database for English in which content words are organized into sets of synonyms (synsets), each representing one underlying lexical concept. Words and concepts are further connected through various lexical and semantic relations. WordNet has been widely adopted in the NLP community for a variety of practical tasks such as word sense disambiguation, question answering, information retrieval, summarization, etc. The English WordNet database is being used as a basis for the development of different multilingual databases such as EuroWordNet, MultiWordNet, and the recent BalkaNet project. To make it more useful in NLP applications, WordNet is constantly updated and extended with different kinds of information such as domain information, syntactic information, topic signatures, syntactic parsing and PoS tagging of the glosses, etc.
In this paper we propose to extend the WordNet model by adding a new data structure called phraset. A phraset is a set of free combinations of words (as opposed to lexical units) which are recurrently used to express a concept. Phrasets can provide useful information for different kinds of NLP tasks, both in a monolingual and multilingual environment. For instance, phrasets can be useful for knowledge-based word alignment of parallel corpora, to find correspondences when one language has a
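As a rough illustration of the phraset idea introduced above, the following Python sketch models a concept that carries both a synset (lexical units) and a phraset (recurrent free phrases). The class name `Concept`, its fields, the `is_lexical_gap` helper, and the English/Italian sample entries are all assumptions made for this example; they are not taken from the paper or from any WordNet API.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One lexical concept, WordNet-style, extended with a phraset.

    Hypothetical layout for illustration only, not the authors' schema.
    """
    gloss: str
    # Lexical units (synonyms) that express the concept.
    synset: set[str] = field(default_factory=set)
    # Recurrent free phrases that express the same concept.
    phraset: set[str] = field(default_factory=set)

    def expressions(self) -> list[str]:
        """All known ways to express the concept, lexical units first."""
        return sorted(self.synset) + sorted(self.phraset)

    def is_lexical_gap(self) -> bool:
        """True when the language has no lexical unit, only free phrases."""
        return not self.synset and bool(self.phraset)


# Illustrative entries (invented for this sketch): the same concept in two
# languages, lexicalized in one and expressed by a free phrase in the other.
english = Concept(gloss="a boy who delivers newspapers", synset={"paperboy"})
italian = Concept(gloss="a boy who delivers newspapers",
                  phraset={"ragazzo dei giornali"})

print(english.is_lexical_gap(), english.expressions())
print(italian.is_lexical_gap(), italian.expressions())
```

Under these assumptions, a multilingual alignment step could fall back on the phraset whenever `is_lexical_gap()` is true for one of the two languages.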