tailieunhanh - Scientific report: "A Framework for Entailed Relation Recognition"

Dan Roth, Mark Sammons, Vydiswaran
University of Illinois at Urbana-Champaign
danr mssammon vgvinodv @

Abstract

We define the problem of recognizing entailed relations – given an open set of relations, find all occurrences of the relations of interest in a given document set – and pose it as a challenge to scalable information extraction and retrieval. Existing approaches to relation recognition do not address well problems with an open set of relations and a need for high recall: supervised methods are not easily scaled, while unsupervised and semi-supervised methods address a limited aspect of the problem, as they are restricted to frequent, explicit, highly localized patterns. We argue that textual entailment (TE) is necessary to solve such problems, propose a scalable TE architecture, and provide preliminary results on an Entailed Relation Recognition task.

1 Introduction

In many information foraging tasks, there is a need to find all text snippets relevant to a target concept. Patent search services spend significant resources looking for prior art relevant to a specified patent claim. Before subpoenaed documents are used in a court case, or intelligence data is declassified, all sensitive sections need to be redacted.
While there may be a specific domain for a given application, the set of target concepts is broad and may change over time. For these knowledge-intensive tasks, we contend that feasible automated solutions require techniques which approximate an appropriate level of natural language understanding. Such problems can be formulated as a relation recognition task, where the information need is expressed as tuples of arguments and relations. This structure provides additional information which can be exploited to precisely fulfill the information need. Our work introduces the Entailed Relation Recognition paradigm, which leverages a textual entailment system to try to extract all relevant passages for