
Learning with Annotation Noise

Eyal Beigman
Olin Business School, Washington University in St. Louis
beigman@

Beata Beigman Klebanov
Kellogg School of Management, Northwestern University
beata@

Abstract

It is usually assumed that the kind of noise existing in annotated data is random classification noise. Yet there is evidence that differences between annotators are not always random attention slips but could result from different biases towards the classification categories, at least for the harder-to-decide cases. Under an annotation generation model that takes this into account, there is a hazard that some of the training instances are actually hard cases with unreliable annotations. We show that these are relatively unproblematic for an algorithm operating under the 0-1 loss model, whereas for the commonly used voted perceptron algorithm, hard training cases could result in incorrect predictions on the uncontroversial cases at test time.

1 Introduction

It is assumed, often tacitly, that the kind of noise existing in human-annotated datasets used in computational linguistics is random classification noise (Kearns, 1993; Angluin and Laird, 1988), resulting from annotator attention slips randomly distributed across instances. For example, Osborne (2002) evaluates the noise tolerance of shallow parsers, with random classification noise taken to crudely approximate annotation errors. It has been shown, both theoretically and empirically, that this type of noise is tolerated well by the commonly used machine learning algorithms (Cohen, 1997; Blum et al., 1996; Osborne, 2002; Reidsma and Carletta, 2008).

Yet this might be overly optimistic. Reidsma and op den Akker (2008) show that apparent differences between annotators are not random slips of attention but rather result from different biases annotators might have towards the classification categories. When training data comes from one annotator and test data from another, the first annotator's biases are sometimes …
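To make the contrast between the two noise models concrete, the random classification noise referenced above can be simulated by flipping each gold label independently with a fixed probability. The following is a minimal sketch, not code from the paper; the flip rate eta, the toy labels, and the function name are illustrative assumptions.

```python
import random

def flip_labels(labels, eta, seed=0):
    """Random classification noise: flip each binary (0/1) label
    independently with probability eta."""
    rng = random.Random(seed)
    return [1 - y if rng.random() < eta else y for y in labels]

# With eta = 0.1, roughly 10% of labels are flipped, uniformly at
# random across instances -- modeling random attention slips.
clean = [0, 1, 1, 0, 1, 0, 0, 1]
noisy = flip_labels(clean, eta=0.1)
```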
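An annotation generation model of the kind the abstract alludes to instead concentrates unreliability on the hard cases: easy instances receive their true label, while each hard instance is labeled according to the bias of the annotator it happens to be assigned to. The sketch below is a hypothetical rendering under assumed parameters (the per-annotator bias probabilities and the easy/hard split), not the paper's exact model.

```python
import random

def annotate(instances, annotator_biases, seed=0):
    """Hard-case annotation model (illustrative): easy instances
    receive their true label; each hard instance is assigned to a
    random annotator and labeled 1 with probability equal to that
    annotator's bias towards category 1.
    Each instance is a (true_label, is_hard) pair."""
    rng = random.Random(seed)
    labels = []
    for true_label, is_hard in instances:
        if not is_hard:
            labels.append(true_label)            # easy cases are reliable
        else:
            bias = rng.choice(annotator_biases)  # hypothetical per-annotator bias
            labels.append(1 if rng.random() < bias else 0)
    return labels

# Noise is now systematic and concentrated on the hard cases,
# rather than spread uniformly over all instances.
data = [(1, False), (0, False), (1, True), (0, True)]
labels = annotate(data, annotator_biases=[0.8, 0.3])
```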
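Finally, since the voted perceptron is central to the negative result claimed in the abstract, a minimal sketch of the algorithm (Freund and Schapire, 1999) is given below; the data representation, the training schedule, and all names are assumptions made for illustration.

```python
import numpy as np

def voted_perceptron(X, y, epochs=10):
    """Voted perceptron (Freund and Schapire, 1999): keep every
    intermediate weight vector together with the number of examples
    it survived, and let them vote at prediction time.
    Labels y must be in {-1, +1}."""
    w, c = np.zeros(X.shape[1]), 0
    survivors = []
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * np.dot(w, x) <= 0:        # mistake: retire w with its vote
                survivors.append((w.copy(), c))
                w, c = w + t * x, 1
            else:
                c += 1
    survivors.append((w, c))
    return survivors

def predict(survivors, x):
    """Sign of the vote-weighted sum of the intermediate predictions."""
    score = sum(c * np.sign(np.dot(w, x)) for w, c in survivors)
    return 1 if score >= 0 else -1
```

Every mistake-driven update, including those triggered by unreliably labeled hard cases, is retained in the vote, which suggests one way hard training cases could influence predictions on uncontroversial test instances.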