tailieunhanh - Scientific report: "Complementing WordNet with Roget's and Corpus-based Thesauri for Information Retrieval"

Proceedings of EACL '99

Rila Mandala, Takenobu Tokunaga and Hozumi Tanaka
Department of Computer Science, Tokyo Institute of Technology
2-12-1 Ookayama, Meguro-Ku, Tokyo 152-8522, Japan
rila take tanaka @

Abstract

This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget's thesaurus and corpus-derived thesauri. Words and relations which are not included in WordNet can be found in the corpus-derived thesauri. The effects of polysemy can be minimized with a weighting method that considers all query terms and all of the thesauri. Experimental results show that our method enhances information retrieval performance significantly.

1 Introduction

Information retrieval (IR) systems can be viewed basically as a form of comparison between documents and queries. In traditional IR methods, this comparison is based on the index terms shared by the document and the query (Salton and McGill, 1983). The drawback of such methods is that if semantically relevant documents do not contain the same terms as the query, they will be judged irrelevant by the IR system. This occurs because the vocabulary that the user uses is often not the same as the one used in the documents (Blair and Maron, 1985).

To avoid this problem, several researchers have suggested adding terms with similar or related meanings to the query, increasing the chances of matching words in relevant documents. This method is called query expansion.

A thesaurus contains information pertaining to paradigmatic semantic relations such as term synonymy, hypernymy, and hyponymy (Aitchison and Gilchrist, 1987). It is thus natural to use a thesaurus as a source for query expansion.

Many researchers have used WordNet (Miller, 1990) in information retrieval as a tool for query expansion (Voorhees, 1994; Smeaton and Berrut, 1995), computing lexical cohesion (Stairmand, 1997), word sense disambiguation (Voorhees, 1993), and so on, but the results have ...
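
The vocabulary mismatch and the idea of thesaurus-based query expansion described in the introduction can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy thesaurus entries, the two sample documents, and the helper names `tokenize`, `matches` and `expand_query` are hypothetical and do not come from WordNet, Roget's thesaurus, or the paper. The sketch also adds related terms without any weighting, so it is not the method the paper proposes.

```python
import re

# Hypothetical toy thesaurus: entries are illustrative only and are not
# taken from WordNet, Roget's thesaurus, or any corpus-derived thesaurus.
TOY_THESAURUS = {
    "car": {"automobile", "vehicle"},
    "film": {"movie"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(query_terms, document):
    """Exact-term matching: true if the query and document share a term."""
    return bool(query_terms & tokenize(document))


def expand_query(query_terms, thesaurus):
    """Add every related term listed in the thesaurus (unweighted)."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= thesaurus.get(term, set())
    return expanded


# Two made-up documents: the first is about cars but never says "car".
documents = [
    "Automobile sales rose sharply last year.",
    "The recipe calls for fresh basil.",
]

query = tokenize("car prices")
expanded = expand_query(query, TOY_THESAURUS)

# Which documents are retrieved before and after expansion.
before = [matches(query, doc) for doc in documents]
after = [matches(expanded, doc) for doc in documents]
```

With the original query, the first document is missed because it uses "automobile" rather than "car"; after expansion it is retrieved, while the unrelated second document is still not matched. A real system would also have to decide how strongly to trust each added term, which this sketch deliberately leaves out.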
