Multi-Criteria-based Active Learning for Named Entity Recognition

Dan Shen¹, Jie Zhang, Jian Su, Guodong Zhou, Chew-Lim Tan

Institute for Infocomm Technology, 21 Heng Mui Keng Terrace, Singapore 119613
{shendan, zhangjie, sujian, zhougd}@

Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543
{shendan, zhangjie, tancl}@

¹ Current address of the first author: Universität des Saarlandes, dshen@

Abstract

In this paper, we propose a multi-criteria-based active learning approach and apply it effectively to named entity recognition. Active learning aims to minimize human annotation effort by selecting examples for labeling. To maximize the contribution of the selected examples, we consider multiple criteria (informativeness, representativeness, and diversity) and propose measures to quantify them. Furthermore, we incorporate all the criteria using two selection strategies, both of which result in lower labeling cost than a single-criterion-based method. Named entity recognition results on both MUC-6 and GENIA show that the labeling cost can be reduced by at least 80% without degrading performance.

1 Introduction

In machine learning approaches to natural language processing (NLP), models are generally trained on large annotated corpora. However, annotating such corpora is expensive and time-consuming, which makes it difficult to adapt an existing model to a new domain. To overcome this difficulty, active learning (sample selection) has been studied in a growing number of NLP applications, such as POS tagging (Engelson and Dagan, 1999), information extraction (Thompson et al., 1999), text classification (Lewis and Catlett, 1994; McCallum and Nigam, 1998; Schohn and Cohn, 2000; Tong and Koller, 2000; Brinker, 2003), statistical parsing (Thompson et al., 1999; Tang et al., 2002; Steedman et al., 2003), and noun phrase chunking (Ngai and Yarowsky, 2000).

Active learning is based on the assumption that a small ...
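The abstract names three selection criteria (informativeness, representativeness, diversity), but this excerpt does not define the measures or the two selection strategies. As a rough illustration only, the Python sketch below shows one way such criteria could be combined in pool-based batch selection. Everything in it is an assumption rather than the paper's method: informativeness scores are taken as given (for example, from a classifier's uncertainty), representativeness is a hypothetical mean cosine similarity to the rest of the pool, the two are merged by a hypothetical linear weight `lam`, and diversity is approximated by a hypothetical similarity cap `max_sim`. The function names `cosine`, `representativeness`, and `select_batch` are likewise illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def representativeness(i, pool):
    """Hypothetical measure: mean cosine similarity of pool[i] to every other example."""
    sims = [cosine(pool[i], pool[j]) for j in range(len(pool)) if j != i]
    return sum(sims) / len(sims) if sims else 0.0


def select_batch(pool, informativeness, batch_size, lam=0.6, max_sim=0.9):
    """Greedy multi-criteria batch selection (illustrative only, not the paper's strategy).

    Each candidate is ranked by lam * informativeness + (1 - lam) * representativeness.
    Walking down that ranking, a candidate is skipped if its cosine similarity to any
    already-chosen example exceeds max_sim, a simple stand-in for a diversity criterion.
    Returns the indices of the chosen pool examples, best-ranked first.
    """
    scores = [
        lam * informativeness[i] + (1 - lam) * representativeness(i, pool)
        for i in range(len(pool))
    ]
    ranking = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranking:
        if len(chosen) == batch_size:
            break
        # Diversity check: reject near-duplicates of examples already in the batch.
        if all(cosine(pool[i], pool[j]) <= max_sim for j in chosen):
            chosen.append(i)
    return chosen


if __name__ == "__main__":
    # Toy pool: example 1 is almost a copy of example 0.
    pool = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.7, 0.7]]
    informativeness = [0.9, 0.85, 0.4, 0.5]  # assumed to come from a model's uncertainty
    print(select_batch(pool, informativeness, batch_size=2))  # → [0, 3]
```

With the toy pool, example 1 has the second-highest combined score but is nearly identical to example 0, so the diversity cap skips it and example 3 fills the batch. Calling the same function with `max_sim=1.0` (diversity effectively disabled) returns `[0, 1]`, spending half the labeling budget on a near-duplicate, which is the kind of redundancy a diversity criterion is meant to avoid.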

TÀI LIỆU LIÊN QUAN