tailieunhanh - Scientific report: "INFORMATION RETRIEVAL USING ROBUST NATURAL LANGUAGE PROCESSING"
Tomek Strzalkowski and Barbara Vauthey†
Courant Institute of Mathematical Sciences
New York University
715 Broadway, rm. 704
New York, NY 10003
tomek@

ABSTRACT

We developed a prototype information retrieval system which uses advanced natural language processing techniques to enhance the effectiveness of traditional key-word based document retrieval. The backbone of our system is a statistical retrieval engine which performs automated indexing of documents, then search and ranking in response to user queries. This core architecture is augmented with advanced natural language processing tools which are both robust and efficient. In early experiments, the augmented system has displayed capabilities that appear to make it superior to the purely statistical base.

INTRODUCTION

A typical information retrieval (IR) task is to select documents from a database in response to a user's query, and rank these documents according to relevance. This has usually been accomplished using statistical methods, often coupled with manual encoding, but it is now widely believed that these traditional methods have reached their limits. 1 2 These limits are particularly acute for text databases, where natural language processing (NLP) has long been considered necessary for further progress.
Unfortunately, the difficulties encountered in applying computational linguistics technologies to text processing have contributed to a widespread belief that automated NLP may not be suitable in IR. These difficulties included inefficiency, limited coverage, and the prohibitive cost of the manual effort required to build lexicons and knowledge bases for each new text domain. On the other hand, while numerous

† Current address: Laboratoire d'Informatique, Universite de Fribourg, ch. du Musee 3, 1700 Fribourg, Switzerland; vauthey@cfnmi51.bitnet.

1 As far as the automatic document retrieval is concerned. Techniques involving various forms of relevance feedback
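
As a rough illustration of the statistical backbone described in the abstract (automated indexing of documents, then search and ranking in response to a query), a minimal sketch follows. The tf-idf weighting, the simple tokenizer, the toy documents, and the names `tokenize`, `build_index`, and `rank` are all assumptions made for illustration; this excerpt does not specify the engine's actual weighting scheme, and the sketch does not attempt the natural language processing augmentation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Automated indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, n_docs):
    """Search and ranking: score documents by summed tf-idf of query terms."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term never seen at indexing time
        # Assumed weighting: rarer terms count more (plain log idf).
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy collection, not data from the paper.
docs = {
    "d1": "natural language processing for document retrieval",
    "d2": "statistical ranking of documents",
    "d3": "parsing natural language text",
}
index = build_index(docs)
results = rank("natural language retrieval", index, len(docs))
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Note that this purely word-based matcher treats "document" and "documents" as unrelated terms, so d2 is not retrieved at all for a query about retrieval of documents. Limitations of this kind are what motivate adding linguistic processing on top of a statistical core.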