tailieunhanh - Scientific report: "An Error Analysis of Relation Extraction in Social Media Documents"
The annotated mentions in the corpus are single- or multi-word expressions that refer to a particular real-world or abstract entity. Mentions are grouped into co-reference groups, each a set of mentions referring to the same entity. Five relationships are annotated between these entities: PartOf, FeatureOf, Produces, InstanceOf, and MemberOf. One significant difference between these relation annotations and those in the ACE Corpus is that the former are relations between sets of mentions (the co-reference groups) rather than between individual mentions.

An Error Analysis of Relation Extraction in Social Media Documents

Gregory Ichneumon Brown
University of Colorado at Boulder, Boulder, Colorado
browngp@

Abstract

Relation extraction in documents allows the detection of how entities being discussed in a document are related to one another (e.g. part-of). This paper presents an analysis of a relation extraction system based on prior work but applied to the J.D. Power and Associates Sentiment Corpus, to examine how the system works on documents from a range of social media. The results are examined on three different subsets of the JDPA Corpus, showing that the system performs much worse on documents from certain sources. The proposed explanation is that the features used are more appropriate to text with strong editorial standards than to the informal writing style of blogs.

1 Introduction

To summarize, accurately determine the sentiment of, or answer questions about a document, it is often necessary to determine the relationships between the entities being discussed in the document (such as part-of or member-of). In the simple sentiment example

Example: I bought a new car yesterday. I love the powerful engine.

determining the sentiment the author is expressing about the car requires knowing that the engine is a part of the car, so that the positive sentiment expressed about the engine can also be attributed to the car.
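The car example can be made concrete with a small sketch. This is an illustration only, not the system analysed in this paper: the `Entity` and `Relation` classes, the `propagate_sentiment` function, and the numeric sentiment scores are all hypothetical names and conventions introduced here. The sketch assumes each entity has at most one PartOf parent and that a parent with no sentiment of its own simply inherits the first score that reaches it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical structures for illustration; not the JDPA Corpus file format.
@dataclass(frozen=True)
class Entity:
    """A co-reference group: every mention that refers to one entity."""
    name: str
    mentions: Tuple[str, ...]


@dataclass(frozen=True)
class Relation:
    """A directed relation between two co-reference groups."""
    kind: str        # e.g. "PartOf"
    child: Entity
    parent: Entity


def propagate_sentiment(direct: Dict[Entity, float],
                        relations: List[Relation]) -> Dict[Entity, float]:
    """Attribute each entity's sentiment to its PartOf ancestors.

    An ancestor that already has a score keeps it (a simplifying assumption).
    """
    parent_of = {r.child: r.parent for r in relations if r.kind == "PartOf"}
    result = dict(direct)
    for entity, score in direct.items():
        seen = {entity}
        current = entity
        while current in parent_of:
            current = parent_of[current]
            if current in seen:      # guard against cyclic annotations
                break
            seen.add(current)
            result.setdefault(current, score)
    return result


car = Entity("car", ("a new car",))
engine = Entity("engine", ("the powerful engine",))
relations = [Relation("PartOf", engine, car)]

# Only the engine receives sentiment directly from the text.
scores = propagate_sentiment({engine: 1.0}, relations)
print(scores[car])  # 1.0
```

Without the PartOf relation the car would receive no score at all, which is why relation extraction errors carry over directly into downstream sentiment analysis.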
In this paper we examine our preliminary results from applying a relation extraction system to the J.D. Power and Associates (JDPA) Sentiment Corpus (Kessler et al., 2010). Our system uses lexical features from prior work to classify relations, and we examine how the system works on different subsets of the JDPA Sentiment Corpus, breaking the source documents down into professionally written reviews, blog reviews, and social networking reviews. These three document types represent quite different writing styles, and we see significant differences in how the relation extraction system performs.