Scientific report: "Refining Event Extraction through Cross-document Inference"

Heng Ji, Ralph Grishman
Computer Science Department, New York University, New York, NY 10003, USA
hengji grishman @

Abstract

We apply the hypothesis of “One Sense Per Discourse” (Yarowsky, 1995) to information extraction (IE), and extend the scope of “discourse” from a single document to a cluster of topically-related documents. We employ a similar approach to propagate consistent event arguments across sentences and documents. Combining global evidence from related documents with local decisions, we design a simple scheme to conduct cross-document inference for improving the ACE event extraction task¹. Without using any additional labeled data, this new approach obtained higher F-Measure in trigger labeling and 6% higher F-Measure in argument labeling over a state-of-the-art IE system that extracts events independently for each sentence.

1 Introduction

Identifying events of a particular type within individual documents (classical information extraction) remains a difficult task. Recognizing the different forms in which an event may be expressed, distinguishing events of different types, and finding the arguments of an event are all challenging. Fortunately, many of these events will be reported multiple times in different forms, both within the same document and within topically-related documents (that is, a collection of documents sharing participants in potential events).
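The cross-document scheme summarized in the abstract can be illustrated with a minimal sketch in the spirit of “One Sense Per Discourse”, with the discourse widened to a cluster of topically-related documents. This is not the authors' algorithm: the `TriggerPrediction` record, the `cluster_consistency` function, and the three thresholds (`high_conf`, `low_conf`, `min_share`) are hypothetical. The sketch treats confident local trigger decisions as global evidence, takes the majority event type for each trigger word across the cluster, and overwrites only low-confidence local decisions with that majority type.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TriggerPrediction:
    doc_id: str
    sentence_id: int
    word: str          # trigger word as it appears in the sentence
    event_type: str    # locally predicted event type, or "None"
    confidence: float  # hypothetical local classifier confidence in [0, 1]


def cluster_consistency(predictions, high_conf=0.8, low_conf=0.5, min_share=0.5):
    """Propagate one dominant event type per trigger word across a document cluster.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Step 1: count only confident, non-"None" local decisions as global evidence.
    votes = defaultdict(Counter)
    for p in predictions:
        if p.confidence >= high_conf and p.event_type != "None":
            votes[p.word][p.event_type] += 1

    # Step 2: a word gets a cluster-wide "sense" only if one type holds a strict majority.
    cluster_sense = {}
    for word, counts in votes.items():
        event_type, n = counts.most_common(1)[0]
        if n / sum(counts.values()) > min_share:
            cluster_sense[word] = event_type

    # Step 3: override only low-confidence local decisions; confident ones are kept.
    return [
        replace(p, event_type=cluster_sense[p.word])
        if p.confidence < low_conf and p.word in cluster_sense
        else p
        for p in predictions
    ]


cluster = [
    TriggerPrediction("doc1", 0, "fired", "Attack", 0.92),
    TriggerPrediction("doc2", 4, "fired", "Attack", 0.85),
    TriggerPrediction("doc3", 2, "fired", "None", 0.30),    # missed locally
    TriggerPrediction("doc3", 7, "fired", "Attack", 0.60),  # mid confidence: kept as-is
    TriggerPrediction("doc2", 1, "war", "None", 0.20),      # no cluster evidence: unchanged
]

for p in cluster_consistency(cluster):
    print(p.doc_id, p.sentence_id, p.word, p.event_type)
# doc3 sentence 2 now reads Attack; the "war" prediction stays None.
```

Leaving confident local labels untouched is one simple way to combine global evidence with local decisions. Raising `min_share` makes the sketch propagate less often when a cluster is split between different readings of the same word.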
¹ http speech tests ace

We can take advantage of these alternate descriptions to improve event extraction in the original document by favoring consistency of interpretation across sentences and documents. Several recent studies involving specific event types have stressed the benefits of going beyond traditional single-document extraction; in particular, Yangarber (2006) has emphasized this potential in his work on medical information extraction. In this paper, we demonstrate that appreciable improvements are possible over the .