Scientific report: "An Unsupervised Web Relation Extraction System"
URES: an Unsupervised Web Relation Extraction System

Benjamin Rosenfeld, Computer Science Department, Bar-Ilan University, Ramat-Gan, ISRAEL, grurgrur@
Ronen Feldman, Computer Science Department, Bar-Ilan University, Ramat-Gan, ISRAEL, feldman@

Abstract

Most information extraction systems either use hand-written extraction patterns or use a machine learning algorithm that is trained on a manually annotated corpus. Both of these approaches require massive human effort and hence prevent information extraction from becoming more widely applicable. In this paper we present URES (Unsupervised Relation Extraction System), which extracts relations from the Web in a totally unsupervised way. It takes as input the descriptions of the target relations, which include the names of the predicates, the types of their attributes, and several seed instances of the relations. Then the system downloads from the Web a large collection of pages that are likely to contain instances of the target relations. From those pages, utilizing the known seed instances, the system learns the relation patterns, which are then used for extraction. We present several experiments in which we learn patterns and extract instances of a set of several common IE relations, comparing several pattern learning and filtering setups.
We demonstrate that using a simple noun phrase tagger is sufficient as a base for accurate patterns. However, having a named entity recognizer that is able to recognize the types of the relation attributes significantly enhances the extraction performance. We also compare our approach with KnowItAll's fixed generic patterns.

1 Introduction

The most common preprocessing technique for text mining is information extraction (IE). It is defined as the task of extracting knowledge out of textual documents. In general, IE is divided into two main types of extraction tasks: entity tagging and relation extraction. The main approaches used by most information extraction systems are the .
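The pipeline outlined in the abstract (a relation description with seed instances, pattern learning from sentences that contain a seed, then extraction with the learned patterns) can be illustrated with a minimal sketch. This is not the URES implementation: the names `RelationSpec`, `learn_patterns`, and `extract`, the regular expression standing in for a noun phrase tagger, the company names, and the example sentences are all assumptions made for illustration. A pattern here is simply the text found between the two attributes of a seed instance.

```python
import re
from dataclasses import dataclass

# Assumption: a run of capitalized words is a crude stand-in for a noun phrase tagger.
NP = r"[A-Z]\w*(?: [A-Z]\w*)*"


@dataclass(frozen=True)
class RelationSpec:
    """Hypothetical container for a target relation description."""
    name: str          # predicate name
    attr_types: tuple  # types of the two attributes
    seeds: tuple       # known (attr0, attr1) instances


def learn_patterns(spec, sentences):
    """Keep the text between two seed attributes as a (slot, middle, slot) pattern."""
    patterns = set()
    for sentence in sentences:
        for first, second in spec.seeds:
            i, j = sentence.find(first), sentence.find(second)
            if i == -1 or j == -1:
                continue  # the sentence does not mention both attributes
            if i < j:
                middle, order = sentence[i + len(first):j], (0, 1)
            else:
                middle, order = sentence[j + len(second):i], (1, 0)
            if middle.strip():  # skip empty or overlapping contexts
                patterns.add((order[0], middle, order[1]))
    return patterns


def extract(patterns, sentences):
    """Apply each learned pattern to new sentences and collect attribute pairs."""
    found = set()
    for left_slot, middle, right_slot in patterns:
        regex = re.compile(f"({NP}){re.escape(middle)}({NP})")
        for sentence in sentences:
            for match in regex.finditer(sentence):
                args = [None, None]
                args[left_slot] = match.group(1)
                args[right_slot] = match.group(2)
                found.add(tuple(args))
    return found


# Made-up relation, seed, and sentences, for illustration only.
spec = RelationSpec("Acquisition", ("Company", "Company"), (("Acme", "Globex"),))
patterns = learn_patterns(spec, ["Acme has acquired Globex after a long bid."])
instances = extract(patterns, ["Analysts said Initech has acquired Hooli last year."])
print(patterns)
print(instances)
```

The sketch omits the page download step and the pattern filtering setups mentioned in the abstract. It also ignores `attr_types`, which in a fuller version would constrain the slots, as a named entity recognizer would.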