The stages of event extraction

David Ahn
Intelligent Systems Lab Amsterdam
University of Amsterdam
ahn@

Abstract

Event detection and recognition is a complex task consisting of multiple sub-tasks of varying difficulty. In this paper, we present a simple, modular approach to event extraction that allows us to experiment with a variety of machine learning methods for these sub-tasks, as well as to evaluate the impact on performance these sub-tasks have on the overall task.

1 Introduction

Events are undeniably temporal entities, but they also possess a rich non-temporal structure that is important for intelligent information access systems (information retrieval, question answering, summarization, etc.). Without information about what happened, where, and to whom, temporal information about an event may not be very useful. In the available annotated corpora geared toward information extraction, we see two models of events, emphasizing these different aspects. On the one hand, there is the TimeML model, in which an event is a word that points to a node in a network of temporal relations.
On the other hand, there is the ACE model, in which an event is a complex structure, relating arguments that are themselves complex structures, but with only ancillary temporal information (in the form of temporal arguments, which are only noted when explicitly given). In the TimeML model, every event is annotated, because every event takes part in the temporal network. In the ACE model, only interesting events (events that fall into one of 34 predefined categories) are annotated. The task of automatically extracting ACE events is more complex than extracting TimeML events (in line with the increased complexity of ACE events), involving detection of event anchors, assignment of an array of attributes, identification of arguments and assignment of roles, and determination of event coreference. In this paper, we present a modular system for ACE event detection and recognition. Our focus is on the .
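
To make the ACE-style event structure described above concrete, the following is a minimal Python sketch of what the sub-tasks have to produce: an anchor, a category, a set of attributes, role-labelled arguments, and a grouping of coreferent mentions. All class, field, and function names here are hypothetical, the example mentions and labels are invented for illustration, and the coreference rule (same category means same event) is a deliberately naive placeholder rather than the method used in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    text: str  # mention filling the slot (an entity, value, or time expression)
    role: str  # role label assigned to this argument


@dataclass
class EventMention:
    anchor: str      # word that anchors the event in the text
    event_type: str  # one of the predefined event categories
    attributes: dict = field(default_factory=dict)  # attribute name -> value
    arguments: list = field(default_factory=list)   # list of Argument


@dataclass
class Event:
    mentions: list = field(default_factory=list)  # coreferent EventMention objects


def group_by_coreference(mentions, same_event):
    """Greedily add each mention to the first event whose first mention matches it."""
    events = []
    for mention in mentions:
        for event in events:
            if same_event(event.mentions[0], mention):
                event.mentions.append(mention)
                break
        else:
            # No existing event matched, so this mention starts a new event.
            events.append(Event(mentions=[mention]))
    return events


# Invented example mentions; the labels are illustrative only.
m1 = EventMention(
    anchor="attacked",
    event_type="Conflict.Attack",
    attributes={"polarity": "Positive"},
    arguments=[
        Argument("troops", "Attacker"),
        Argument("the village", "Target"),
        Argument("Tuesday", "Time"),
    ],
)
m2 = EventMention(anchor="assault", event_type="Conflict.Attack")
m3 = EventMention(anchor="married", event_type="Life.Marry")

# Placeholder coreference test: two mentions corefer if their categories match.
events = group_by_coreference(
    [m1, m2, m3], lambda a, b: a.event_type == b.event_type
)
```

Keeping anchors, attributes, arguments, and coreference as separate fields mirrors the modular view taken here: each sub-task fills in one part of the structure, so a component can be swapped or evaluated without touching the others.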