Scientific report: "Unsupervised Discovery of Domain-Specific Knowledge from Text"

Dirk Hovy, Chunliang Zhang, Eduard Hovy
Information Sciences Institute, University of Southern California
4676 Admiralty Way, Marina del Rey, CA 90292
{dirkh, czheng, hovy}@

Anselmo Peñas
UNED NLP and IR Group
Juan del Rosal 16, 28040 Madrid, Spain
anselmo@

Abstract

Learning by Reading (LbR) aims at enabling machines to acquire knowledge from, and reason about, textual input. This requires knowledge about the domain structure (such as entities, classes, and actions) in order to do inference. We present a method to infer this implicit knowledge from unlabeled text. Unlike previous approaches, we use automatically extracted classes with a probability distribution over entities to allow for context-sensitive labeling. From a corpus of sentences we learn about 250k simple propositions about American football in the form of predicate-argument structures like "quarterbacks throw passes to receivers". Using several statistical measures, we show that our model is able to generalize and explain the data statistically significantly better than various baseline approaches. Human subjects judged up to [...] of the resulting propositions to be sensible. The classes and probabilistic model can be used in textual enrichment to improve the performance of LbR end-to-end systems.
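To make the idea of context-sensitive labeling concrete, the following is a minimal sketch in Python. It is not the model from the paper: the probability tables, the names `CLASS_GIVEN_ENTITY`, `CLASS_GIVEN_CONTEXT`, and `label`, and the simple product scoring are all assumptions chosen for illustration. It only shows how one entity can receive different class labels depending on the predicate it appears with.

```python
from typing import Dict, Tuple

# Hypothetical toy distributions (illustrative values, not from the paper).
# P(class | entity): each entity carries a distribution over candidate classes.
CLASS_GIVEN_ENTITY: Dict[str, Dict[str, float]] = {
    "Smith": {"quarterback": 0.5, "receiver": 0.5},
    "Jones": {"receiver": 0.9, "quarterback": 0.1},
}

# P(class | predicate, argument slot): how strongly a context prefers a class.
CLASS_GIVEN_CONTEXT: Dict[Tuple[str, str], Dict[str, float]] = {
    ("throw", "subject"): {"quarterback": 0.8, "player": 0.2},
    ("catch", "subject"): {"receiver": 0.7, "player": 0.3},
}


def label(entity: str, predicate: str, slot: str) -> str:
    """Pick the class that best fits both the entity and its context.

    Assumed scoring: the product of the entity's class probability and
    the context's class preference. Falls back to the entity's most
    likely class when the context rules out every candidate.
    """
    prior = CLASS_GIVEN_ENTITY[entity]
    context = CLASS_GIVEN_CONTEXT[(predicate, slot)]
    scores = {cls: p * context.get(cls, 0.0) for cls, p in prior.items()}
    if sum(scores.values()) == 0.0:
        return max(prior, key=prior.get)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(label("Smith", "throw", "subject"))
    print(label("Smith", "catch", "subject"))
```

In this toy setup the ambiguous entity "Smith" is labeled differently as the subject of "throw" than as the subject of "catch", which is the behavior a fixed one-class-per-entity mapping cannot express.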

1 Introduction

The goal of Learning by Reading (LbR) is to enable a computer to learn about a new domain and then to reason about it in order to perform such tasks as question answering, threat assessment, and explanation (Strassel et al., 2010). This requires joint efforts from Information Extraction, Knowledge Representation, and logical inference. All these steps depend on the system having access to basic, often unstated, foundational knowledge about the domain.

Most documents, however, do not explicitly mention this information in the text, but assume basic background knowledge about the domain, such as positions ("quarterback"), titles ("winner"), or actions