Scientific report: "Orthogonal Negation in Vector Spaces for Modelling Word-Meanings and Document Retrieval"

Dominic Widdows
Stanford University
dwiddows@

Abstract

Standard IR systems can process queries such as “web NOT internet”, enabling users who are interested in arachnids to avoid documents about computing. The documents retrieved for such a query should be irrelevant to the negated query term. Most systems implement this by reprocessing results after retrieval to remove documents containing the unwanted string of letters. This paper describes and evaluates a theoretically motivated method for removing unwanted meanings directly from the original query in vector models, with the same vector negation operator as used in quantum logic. Irrelevance in vector spaces is modelled using orthogonality, so query vectors are made orthogonal to the negated term or terms. As well as removing unwanted terms, this form of vector negation reduces the occurrence of synonyms and neighbours of the negated terms by as much as 76% compared with standard Boolean methods. By altering the query vector itself, vector negation removes not only unwanted strings but unwanted meanings.
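To make the idea of orthogonal negation concrete, the following is a minimal sketch, assuming the negated query is obtained by subtracting from the query vector its projection onto the negated term. The three-dimensional vectors and the helper names (`dot`, `negate`, `cosine`) are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def negate(a, b):
    """Return 'a NOT b': the part of a that is orthogonal to b.

    Assumption: negation is the projection of a onto the
    subspace orthogonal to b, i.e. a - (a.b / b.b) * b.
    """
    bb = dot(b, b)
    if bb == 0.0:
        return list(a)  # nothing to remove for a zero vector
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is zero."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)

# Toy term vectors, invented for this example only.
web = [0.6, 0.7, 0.1]
internet = [0.1, 0.9, 0.2]
spider = [0.9, 0.1, 0.1]

query = negate(web, internet)  # "web NOT internet"
```

In this toy setting the negated query has zero inner product with `internet` (up to floating-point error), and its cosine similarity to `spider` is higher than that of the original `web` vector, which is the qualitative behaviour the abstract describes.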
1 Introduction

Vector spaces enjoy widespread use in information retrieval (Salton and McGill, 1983; Baeza-Yates and Ribeiro-Neto, 1999), and from this original application vector models have been applied to semantic tasks such as word-sense acquisition (Landauer and Dumais, 1997; Widdows, 2003) and disambiguation (Schütze, 1998). One benefit of these models is that the similarity between pairs of terms, or between queries and documents, is a continuous function, automatically ranking results rather than giving just a YES/NO judgment. In addition, vector models can be freely built from ...

This research was supported in part by the Research Collaboration between the NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation and CSLI, Stanford University, and by EC/NSF grant IST-1999-11438 for the MUCHMORE project.
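The point about continuous similarity can be illustrated with a short sketch that ranks documents by cosine similarity to a query. It assumes cosine similarity as the measure; the document names and two-dimensional vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is zero."""
    d = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return d / (nu * nv)

def rank(query, docs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented document vectors, for illustration only.
docs = {
    "doc_spiders": [0.9, 0.1],
    "doc_networks": [0.1, 0.9],
    "doc_mixed": [0.5, 0.5],
}

ranking = rank([1.0, 0.0], docs)
```

Every document receives a graded score, so a partially relevant document such as `doc_mixed` falls between the other two instead of being simply accepted or rejected.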
