Scientific report: "ON THE REPRESENTATION OF QUERY TERM RELATIONS BY SOFT BOOLEAN OPERATORS"

preconstructed dictionaries or thesauruses. Even in this relatively simplified environment one does not normally undertake a linguistic analysis of any scope. In fact, syntactic and semantic analysis have been used in bibliographic information retrieval only under special circumstances: to analyze query phrases [22], to process structured text samples of a certain kind [7,15], or finally to process texts in severely restricted topic areas [2]. Where special conditions do not obtain, the preferred approach in.

ON THE REPRESENTATION OF QUERY TERM RELATIONS BY SOFT BOOLEAN OPERATORS

Gerard Salton
Department of Computer Science
Cornell University
Ithaca, NY 14853, USA

ABSTRACT

The language analysis component in most text retrieval systems is confined to a recognition of noun phrases of the type normally included in back-of-the-book indexes, and an identification of related terms included in a preconstructed thesaurus of quasi-synonyms. Even such a restricted language analysis is fraught with difficulties because of the well-known problems in the analysis of compound nominals, and the hazards and cost of constructing word synonym classes valid for large text samples. In this study an extended (soft) Boolean logic is used for the formulation of information retrieval queries which is capable of representing both the use of compound noun phrases as well as the inclusion of synonym constructions in the query statements. The operations of the extended Boolean logic are described, and evaluation output is included to demonstrate the effectiveness of the extended logic compared with that of ordinary text retrieval systems.

1. Linguistic Approaches in Information Retrieval

It is possible to classify the various automatic text processing systems by the depth and type of linguistic analysis needed for their operations.
Sophisticated language understanding components are believed to be essential to carry out automatic text transformations such as text abstracting and text translation [1,14,24]. Complete language understanding systems are also needed in automatic question-answering, where direct responses to user queries are automatically generated by the system [11]. On the other hand, relatively less sophisticated language analysis systems may be adequate for bibliographic information retrieval, where references, as opposed to direct answers, are retrieved in response to user queries [21]. In bibliographic retrieval the content of individual documents is normally represented by sets of key words or key
