Semi-Supervised Active Learning for Sequence Labeling

Katrin Tomanek and Udo Hahn
Jena University Language & Information Engineering (JULIE) Lab
Friedrich-Schiller-Universität Jena, Germany

Abstract

While Active Learning (AL) has already been shown to markedly reduce the annotation efforts for many sequence labeling tasks compared to random selection, AL remains unconcerned about the internal structure of the selected sequences (typically, sentences). We propose a semi-supervised AL approach for sequence labeling where only highly uncertain subsequences are presented to human annotators, while all others in the selected sequences are automatically labeled. For the task of entity recognition, our experiments reveal that this approach reduces annotation efforts in terms of manually labeled tokens by up to 60% compared to the standard, fully supervised AL scheme.

1 Introduction

Supervised machine learning (ML) approaches are currently the methodological backbone for many NLP activities. Despite their success, they create a costly follow-up problem, viz. the need for human annotators to supply large amounts of golden annotation data on which ML systems can be trained. In most annotation campaigns, the language material chosen for manual annotation is selected randomly from some reference corpus.
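The selection scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes we already have per-token marginal label distributions from some sequence model (e.g. a CRF), and the confidence threshold of 0.9 is an arbitrary value chosen for the example.

```python
# Hedged sketch: token-level split within a sentence already chosen by AL.
# Tokens the model labels confidently are auto-labeled; highly uncertain
# tokens are deferred to a human annotator.

def split_annotation(marginals, threshold=0.9):
    """marginals: list of {label: probability} dicts, one per token.

    Returns (auto_labels, manual_indices): auto_labels[i] is the model's
    label if its top marginal exceeds `threshold`, else None; manual_indices
    lists the token positions routed to the human annotator.
    """
    auto_labels, manual_indices = [], []
    for i, dist in enumerate(marginals):
        best_label, best_p = max(dist.items(), key=lambda kv: kv[1])
        if best_p >= threshold:
            auto_labels.append(best_label)   # confident: auto-label
        else:
            auto_labels.append(None)         # uncertain: ask the annotator
            manual_indices.append(i)
    return auto_labels, manual_indices

# Example: only the ambiguous middle token goes to the annotator.
marginals = [
    {"O": 0.98, "B-ENT": 0.02},
    {"O": 0.55, "B-ENT": 0.45},   # highly uncertain token
    {"O": 0.97, "I-ENT": 0.03},
]
labels, to_human = split_annotation(marginals)
```

Under this sketch, manual effort is counted only over `to_human`, which is how a reduction in manually labeled tokens (as opposed to whole sentences) would be measured.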
Active Learning (AL) has recently emerged as a much more efficient alternative for the creation of precious training material. In the AL paradigm, only examples of high training utility are selected for manual annotation in an iterative manner. Different approaches to AL have been successfully applied to a wide range of NLP tasks (Engelson and Dagan, 1996; Ngai and Yarowsky, 2000; Tomanek et al., 2007; Settles and Craven, 2008). When used for sequence labeling tasks such as POS tagging, chunking, or named entity recognition (NER), the examples selected by AL are sequences of text, typically sentences. Approaches to AL for sequence labeling are usually unconcerned about the
