Scientific report: "Deriving Generalized Knowledge from Corpora using WordNet Abstraction"
Benjamin Van Durme, Phillip Michalak and Lenhart K. Schubert
Department of Computer Science, University of Rochester, Rochester, NY 14627, USA

Abstract

Existing work in the extraction of commonsense knowledge from text has been primarily restricted to factoids that serve as statements about what may possibly obtain in the world. We present an approach to deriving stronger, more general claims by abstracting over large sets of factoids. Our goal is to coalesce the observed nominals for a given predicate argument into a few predominant types, obtained as WordNet synsets. The results can be construed as generically quantified sentences restricting the semantic type of an argument position of a predicate.

1 Introduction

Our interest is ultimately in building systems with commonsense reasoning and language understanding abilities. As is widely appreciated, such systems will require large amounts of general world knowledge. Large text corpora are an attractive potential source of such knowledge. However, current natural language understanding (NLU) methods are not general and reliable enough to enable broad assimilation, in a formalized representation, of explicitly stated knowledge in encyclopedias or similar sources.
As well, such sources typically do not cover the most obvious facts of the world, such as that ice cream may be delicious and may be coated with chocolate, or that children may play in parks. Methods currently exist for extracting simple factoids like those about ice cream and children just mentioned (see in particular Schubert, 2002; Schubert and Tong, 2003), but these are quite weak as general claims and, being unconditional, are unsuitable for inference chaining. Consider, however, the fact that when something is said, it is generally said by a person, organization or text source; this is a conditional statement dealing with the potential agents of saying, and could enable useful inferences. For example, in the ...
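
The abstraction idea described above, collapsing the nominals observed in one argument position into a few predominant types, can be illustrated with a toy sketch. This is not the authors' algorithm: the small taxonomy `TOY_HYPERNYM`, the counts in `SAY_SUBJECTS`, the function names, and the "most specific type covering at least a minimum share" rule are all assumptions made for illustration. A real system would walk WordNet synsets and hypernym links instead of this hand-written map.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet hypernym links.
TOY_HYPERNYM = {
    "teacher": "person", "doctor": "person", "spokesman": "person",
    "company": "organization", "committee": "organization",
    "report": "text", "letter": "text",
    "person": "entity", "organization": "entity", "text": "entity",
    "rock": "entity",
}

# Hypothetical counts of nominals observed as the subject of "say".
SAY_SUBJECTS = {
    "teacher": 30, "doctor": 25, "spokesman": 20,
    "company": 15, "committee": 10,
    "report": 8, "letter": 7, "rock": 1,
}


def ancestors(word):
    """Return the proper ancestors of `word` in the toy taxonomy, nearest first."""
    chain = []
    while word in TOY_HYPERNYM:
        word = TOY_HYPERNYM[word]
        chain.append(word)
    return chain


def predominant_types(counts, min_share=0.1):
    """Pick the most specific abstract types that each cover at least
    `min_share` of all observations (an assumed, simplified criterion)."""
    total = sum(counts.values())

    # How many observations fall under each abstract type.
    coverage = Counter()
    for word, n in counts.items():
        for node in ancestors(word):
            coverage[node] += n

    # Types that are frequent enough to be considered.
    frequent = {node for node, n in coverage.items() if n >= min_share * total}

    # Drop any type that merely generalizes another frequent type.
    dominated = {anc for node in frequent for anc in ancestors(node)}
    kept = frequent - dominated

    return sorted(((node, coverage[node]) for node in kept),
                  key=lambda pair: (-pair[1], pair[0]))


print(predominant_types(SAY_SUBJECTS))
# [('person', 75), ('organization', 25), ('text', 15)]
```

On this toy data the result mirrors the example in the text: the things that "say" are mostly persons, organizations, or text sources, while the catch-all type `entity` is discarded because more specific types already account for the observations. One limitation of this simplified rule is that observations under an infrequent sibling type are left uncovered once a more specific type is chosen.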