tailieunhanh - Scientific report: "At-Least-N Voting Improves Recall for Extracting Relations"

Several NLP tasks are characterized by asymmetric data where one class label, NONE, signifying the absence of any structure (named entity, coreference, relation, etc.), dominates all other classes. Classifiers built on such data typically have higher precision and lower recall, and tend to overproduce the NONE class. We present a novel scheme for voting among a committee of classifiers that can significantly boost the recall in such situations. We demonstrate results showing up to a 16% relative improvement in ACE value for the 2004 ACE relation extraction task for English, Arabic and Chinese.

Minority Vote: At-Least-N Voting Improves Recall for Extracting Relations
Nanda Kambhatla
IBM T.J. Watson Research Center
1101 Kitchawan Road, Rt 134, Yorktown, NY 10598
nanda@

1 Introduction

Statistical classifiers are widely used for diverse NLP applications such as part-of-speech tagging (Ratnaparkhi, 1999), chunking (Zhang et al., 2002), semantic parsing (Magerman, 1993), named entity extraction (Borthwick, 1999; Bikel et al., 1997; Florian et al., 2004), coreference resolution (Soon et al., 2001), relation extraction (Kambhatla, 2004), etc. A number of these applications are characterized by a dominance of a NONE class in the training examples. For example, for coreference resolution, classifiers might classify whether a given pair of mentions are references to the same entity or not.
In this case, we typically have many more examples of mention pairs that are not coreferential (i.e., the NONE class) than otherwise. Similarly, if a classifier is predicting the presence or absence of a semantic relation between two mentions, there are typically far more examples signifying the absence of a relation. Classifiers built with asymmetric data dominated by one class (a NONE class denoting the absence of a relation, coreference, a named entity, etc.) can overgenerate the NONE class. This often results in an unbalanced classifier where precision is higher than recall. In this paper we present a novel approach for
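
The contrast between ordinary majority voting and a recall-oriented committee vote can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's confirmed procedure: it assumes that "At-Least-N" means a non-NONE label is emitted whenever at least N committee members propose it, and the function names, the tie handling, and the relation label LOCATED_AT are all hypothetical.

```python
from collections import Counter

NONE = "NONE"  # label signifying the absence of any relation


def majority_vote(votes):
    """Return the label chosen by more than half of the committee, else NONE."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else NONE


def at_least_n_vote(votes, n):
    """Assumed reading of At-Least-N voting: return the most-voted non-NONE
    label if at least n classifiers proposed it, otherwise return NONE."""
    counts = Counter(v for v in votes if v != NONE)
    if not counts:
        return NONE
    label, count = counts.most_common(1)[0]
    return label if count >= n else NONE


# Five hypothetical classifiers vote on one pair of mentions.
votes = [NONE, NONE, "LOCATED_AT", NONE, "LOCATED_AT"]
print(majority_vote(votes))       # the NONE class wins the majority
print(at_least_n_vote(votes, 2))  # two non-NONE votes are enough to emit the relation
```

Under this reading, lowering N trades precision for recall: a smaller N lets a minority of the committee override the dominant NONE class, while a larger N approaches the behavior of majority voting.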
