On-Demand Information Extraction
Satoshi Sekine
Computer Science Department, New York University
715 Broadway, 7th floor, New York, NY 10003, USA
sekine@

Abstract
At present, adapting an Information Extraction system to new topics is an expensive and slow process, requiring some knowledge engineering for each new topic. We propose a new paradigm of Information Extraction which operates 'on demand' in response to a user's query. On-demand Information Extraction (ODIE) aims to completely eliminate the customization effort. Given a user's query, the system will automatically create patterns to extract salient relations in the text of the topic, and build tables from the extracted information using paraphrase discovery technology. It relies on recent advances in pattern discovery, paraphrase discovery, and extended named entity tagging. We report on experimental results in which the system created useful tables for many topics, demonstrating the feasibility of this approach.

1 Introduction
Most of the world's information is recorded, passed down, and transmitted between people in text form. Implicit in most types of text are regularities of information structure: events which are reported many times, about different individuals, in different forms, such as layoffs or mergers and acquisitions in news articles.
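The ODIE pipeline is described above only at a high level. As a rough illustration, and not the author's implementation, the toy Python sketch that follows walks through the same stages on a tiny, hand-made corpus about acquisitions. Several simplifications are assumed here. Named entities are already tagged. A naive keyword match stands in for document retrieval. A "pattern" is just the entity types plus the words between the two entities. Salience is a plain frequency threshold. Two patterns are treated as paraphrases when they link the same entity pair. All names (`on_demand_ie`, `CORPUS`, `min_count`, and the helper functions) and the data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical pre-tagged corpus. Each document is (text, tags), and each tag is
# (arg1, type1, words between the two entities, arg2, type2).
CORPUS = [
    ("Acme acquired Bolt. Analysts said Acme bought Bolt; Acme praised Bolt.",
     [("Acme", "COMPANY", "acquired", "Bolt", "COMPANY"),
      ("Acme", "COMPANY", "bought", "Bolt", "COMPANY"),
      ("Acme", "COMPANY", "praised", "Bolt", "COMPANY")]),
    ("Zenco acquired Quill. Zenco bought Quill after long talks.",
     [("Zenco", "COMPANY", "acquired", "Quill", "COMPANY"),
      ("Zenco", "COMPANY", "bought", "Quill", "COMPANY")]),
    ("Rain fell in Paris on Monday.", []),
]


def retrieve(corpus, query):
    """Keep documents whose text contains every query word (crude substring
    match standing in for a real search engine)."""
    words = query.lower().split()
    return [doc for doc in corpus if all(w in doc[0].lower() for w in words)]


def salient_patterns(docs, min_count):
    """A pattern is (type1, middle words, type2); keep those seen >= min_count times."""
    counts = Counter((t1, mid, t2) for _, tags in docs for _, t1, mid, _, t2 in tags)
    return {p for p, c in counts.items() if c >= min_count}


def paraphrase_groups(docs, patterns):
    """Merge patterns that connect the same entity pair (simple union-find)."""
    parent = {p: p for p in patterns}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    first_seen = {}  # (arg1, arg2) -> first salient pattern seen with that pair
    for _, tags in docs:
        for a1, t1, mid, a2, t2 in tags:
            p = (t1, mid, t2)
            if p not in parent:
                continue  # pattern was filtered out as non-salient
            q = first_seen.setdefault((a1, a2), p)
            parent[find(p)] = find(q)
    groups = defaultdict(set)
    for p in patterns:
        groups[find(p)].add(p)
    return list(groups.values())


def build_table(docs, group):
    """One row per entity pair extracted by any pattern in the paraphrase group."""
    rows = {(a1, a2) for _, tags in docs for a1, t1, mid, a2, t2 in tags
            if (t1, mid, t2) in group}
    return sorted(rows)


def on_demand_ie(corpus, query, min_count=2):
    """Query -> retrieved docs -> salient patterns -> paraphrase groups -> tables."""
    docs = retrieve(corpus, query)
    patterns = salient_patterns(docs, min_count)
    return [(sorted(g), build_table(docs, g)) for g in paraphrase_groups(docs, patterns)]


if __name__ == "__main__":
    for group, rows in on_demand_ie(CORPUS, "acquired"):
        print(group)  # [('COMPANY', 'acquired', 'COMPANY'), ('COMPANY', 'bought', 'COMPANY')]
        print(rows)   # [('Acme', 'Bolt'), ('Zenco', 'Quill')]
```

In this toy run, the one-off "praised" pattern is dropped by the frequency threshold. "acquired" and "bought" are merged because both link Acme/Bolt and Zenco/Quill, which yields a single two-row table. Raising `min_count` trades recall for precision. The same-entity-pair rule is a deliberate simplification assumed for this sketch.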
The goal of information extraction (IE) is to extract such information: to make these regular structures explicit, in forms such as tabular databases. Once the information structures are explicit, they can be processed in many ways: to mine information, to search for specific information, or to generate graphical displays and other summaries.

However, at present a great deal of knowledge for automatic Information Extraction must be coded by hand to move a system to a new topic. For example, at the later MUC evaluations, system developers spent one month on the knowledge engineering needed to customize the system to the given test topic. Research over the last decade has shown how some of this