Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

Shasha Liao, Ralph Grishman
Computer Science Department, New York University
liaoss@ grishman@

Abstract

Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of in trigger labeling and in role labeling through IR and an additional in trigger labeling and in role labeling by applying global inference.

1 Introduction

The goal of event extraction is to identify instances of a class of events in text. In addition to identifying the event itself, it also identifies all of the participants and attributes of each event; these are the entities that are involved in that event. The same event might be presented in various expressions, and an expression might represent different events in different contexts. Moreover, for each event type, the event participants and attributes may also appear in multiple forms, and exemplars of the different forms may be required.
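To make the task concrete, here is a minimal sketch of how one annotated event might be stored as a record: an event type, the trigger word that expresses it, and a list of role-labeled arguments covering participants and attributes. The class names, the "Attack" label, and the example sentence are illustrative assumptions, not the annotation scheme used in the paper; the role names borrow the paper's own examples (Agent, Patient, place).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    role: str  # role name: participants such as "Agent"/"Patient", attributes such as "Place"
    text: str  # the entity mention that fills the role


@dataclass
class EventMention:
    event_type: str  # event class label (hypothetical label set)
    trigger: str     # the word or phrase that expresses the event
    arguments: List[Argument] = field(default_factory=list)

    def fillers(self, role: str) -> List[str]:
        """Return every entity mention that fills the given role."""
        return [a.text for a in self.arguments if a.role == role]


# One sentence, one event: the trigger plus its participants and attributes.
sentence = "Rebels attacked the convoy near the city."
event = EventMention(
    event_type="Attack",
    trigger="attacked",
    arguments=[
        Argument("Agent", "Rebels"),
        Argument("Patient", "the convoy"),
        Argument("Place", "the city"),
    ],
)

# A different expression of the same event type receives the same label.
paraphrase = EventMention("Attack", "ambushed", [Argument("Agent", "Rebels")])

print(event.fillers("Agent"), event.fillers("Place"))
```

Each distinct trigger (here "attacked" versus "ambushed") and each surface form of an argument is a separate pattern a learner must see, which is why the variety described above drives up the amount of training data needed.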
Thus, event extraction is a difficult task and requires substantial training data. However, annotating events for training is tedious. Annotators need to read the whole sentence (possibly several sentences) to decide whether there is a specific event or not, and then need to identify the event participants (like Agent and Patient) and attributes (like place).
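The strategy summarized in the abstract can be outlined as a short pipeline: retrieve documents related to the seed topic, then self-train on them by repeatedly retraining and adding only confidently labeled instances. The sketch below is a minimal, hedged illustration under assumptions that do not come from the paper: retrieval is plain TF-IDF cosine similarity, the classifier is a toy nearest-neighbor model over word overlap, and the names `retrieve`, `self_train`, `train_nearest_neighbor`, and the 0.5 confidence threshold are hypothetical. The global inference step is not modeled.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]                       # (instance text, label)
Predictor = Callable[[str], Tuple[str, float]]  # instance -> (label, confidence)


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: Sequence[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors: raw term count times log(N / document frequency)."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in token_lists for t in set(toks))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
            for toks in token_lists]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query: str, docs: Sequence[str], k: int) -> List[str]:
    """Return the k documents most similar to the query (the IR selection step)."""
    vecs = tfidf_vectors([query, *docs])  # the query shares the corpus IDF statistics
    q, doc_vecs = vecs[0], vecs[1:]
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, doc_vecs[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]


def self_train(labeled: Sequence[Example], unlabeled: Sequence[str],
               train: Callable[[List[Example]], Predictor],
               threshold: float = 0.5, rounds: int = 3) -> List[Example]:
    """Retrain, label the pool, and keep only predictions with confidence >= threshold."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        predict = train(labeled)
        confident, rest = [], []
        for x in pool:
            label, conf = predict(x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:  # nothing new to add: stop early
            break
        labeled.extend(confident)
        pool = rest
    return labeled


def train_nearest_neighbor(labeled: List[Example]) -> Predictor:
    """Toy classifier: copy the label of the example with the highest Jaccard word overlap."""
    seen = [(set(tokenize(x)), y) for x, y in labeled]  # snapshot of the training set

    def predict(x: str) -> Tuple[str, float]:
        xs = set(tokenize(x))
        best_label, best_sim = "None", 0.0
        for s, y in seen:
            union = xs | s
            sim = len(xs & s) / len(union) if union else 0.0
            if sim > best_sim:
                best_label, best_sim = y, sim
        return best_label, best_sim

    return predict


seed = [("troops attacked the base", "Attack")]
corpus = [
    "troops attacked the convoy",
    "the bank raised rates",
    "rebels attacked the convoy",
]
# IR step (assumed TF-IDF): the unrelated "the bank raised rates" is not retrieved.
pool = retrieve("troops attacked", corpus, k=2)
result = self_train(seed, pool, train_nearest_neighbor, threshold=0.5)
print(result)
```

Retrieval removes the unrelated sentence from the pool before any labeling happens. The bootstrapping effect shows up in the second round: "rebels attacked the convoy" is too far from the seed to pass the threshold at first, but it is labeled once "troops attacked the convoy" has been added in round one.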
