A High-Performance Semi-Supervised Learning Method for Text Chunking

Rie Kubota Ando and Tong Zhang, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598

Abstract

In machine learning, whether one can build a more accurate classifier by using unlabeled data (semi-supervised learning) is an important issue. Although a number of semi-supervised methods have been proposed, their effectiveness on NLP tasks is not always clear. This paper presents a novel semi-supervised method that employs a learning paradigm which we call structural learning. The idea is to find what good classifiers are like by learning from thousands of automatically generated auxiliary classification problems on unlabeled data. By doing so, the common predictive structure shared by the multiple classification problems can be discovered, which can then be used to improve performance on the target problem. The method produces performance higher than the previous best results on CoNLL'00 syntactic chunking and CoNLL'03 named entity chunking (English and German).

1 Introduction

In supervised learning applications, one can often find a large amount of unlabeled data without difficulty, while labeled data are costly to obtain. Therefore, a natural question is whether we can use unlabeled data to build a more accurate classifier, given the same amount of labeled data. This problem is often referred to as semi-supervised learning. Although a number of semi-supervised methods have been proposed, their effectiveness on NLP tasks is not always clear.
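As a rough preview, the recipe described in the abstract — generate many auxiliary classification problems from unlabeled data, fit a predictor for each, and extract the structure they share — can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: the masked-feature auxiliary problems, the least-squares fits, the SVD step, and the dimension h below are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "unlabeled" data: 200 examples, 30 sparse binary features.
X = (rng.random((200, 30)) < 0.2).astype(float)

# Auxiliary problems (illustrative): for each feature j, predict its
# presence from the remaining features. The labels come for free from
# the unlabeled data itself. Each auxiliary predictor here is a plain
# least-squares linear model, a stand-in for the paper's classifiers.
weights = []
for j in range(X.shape[1]):
    y = X[:, j]
    X_rest = np.delete(X, j, axis=1)
    w, *_ = np.linalg.lstsq(X_rest, y, rcond=None)
    w_full = np.insert(w, j, 0.0)  # re-align to the full feature space
    weights.append(w_full)
W = np.array(weights).T  # (features x auxiliary problems)

# Shared structure: the top-h left singular vectors of the stacked
# auxiliary weight matrix capture directions useful across problems.
h = 5
U, _, _ = np.linalg.svd(W, full_matrices=False)
Theta = U[:, :h].T  # (h x features) projection

# Low-dimensional extra features for the target task: Theta @ x.
Z = X @ Theta.T
print(Z.shape)  # (200, 5)
```

The projected features Z would then be appended to the original features when training the target chunker, which is how the discovered structure transfers to the supervised problem.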
For example, co-training (Blum and Mitchell, 1998) automatically bootstraps labels, and such labels are not necessarily reliable (Pierce and Cardie, 2001). A related idea is to use Expectation-Maximization (EM) to impute labels. Although useful under some circumstances, when a relatively large amount of labeled data is available the procedure often degrades performance (e.g., Merialdo, 1994). A number of bootstrapping methods have been proposed.
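The co-training bootstrap, and why its labels can be unreliable, can be sketched in a few lines. The two-view toy data, the 1-D centroid classifiers, and the per-round budget below are illustrative assumptions, not Blum and Mitchell's exact setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-view data: co-training assumes each view alone suffices to
# classify. Class 0 clusters near -1, class 1 near +1, in both views.
y_l = np.tile([0, 1], 5)                           # 10 labeled examples
v1_l = y_l * 2.0 - 1.0 + rng.normal(0, 0.5, 10)    # view 1
v2_l = y_l * 2.0 - 1.0 + rng.normal(0, 0.5, 10)    # view 2
y_u = rng.integers(0, 2, 200)          # true labels, used only to generate
v1_u = y_u * 2.0 - 1.0 + rng.normal(0, 0.5, 200)   # the unlabeled pool
v2_u = y_u * 2.0 - 1.0 + rng.normal(0, 0.5, 200)

def fit_threshold(v, y):
    # 1-D centroid classifier: midpoint between the two class means.
    return (v[y == 0].mean() + v[y == 1].mean()) / 2.0

for _ in range(5):  # a few bootstrap rounds
    t1 = fit_threshold(v1_l, y_l)
    t2 = fit_threshold(v2_l, y_l)
    # Each view labels its 3 most confident unlabeled examples (those
    # farthest from its decision boundary) for the shared labeled set.
    # Nothing guarantees these pseudo-labels are correct -- this is the
    # unreliability Pierce and Cardie (2001) point out.
    p1 = np.argsort(np.abs(v1_u - t1))[-3:]
    p2 = np.argsort(np.abs(v2_u - t2))[-3:]
    pick = np.concatenate([p1, p2])
    pseudo = np.concatenate([(v1_u[p1] > t1).astype(int),
                             (v2_u[p2] > t2).astype(int)])
    v1_l = np.concatenate([v1_l, v1_u[pick]])
    v2_l = np.concatenate([v2_l, v2_u[pick]])
    y_l = np.concatenate([y_l, pseudo])
    keep = np.ones(len(v1_u), dtype=bool)
    keep[pick] = False
    v1_u, v2_u = v1_u[keep], v2_u[keep]

print(len(y_l))  # 10 labeled + 6 pseudo-labels per round * 5 rounds = 40
```

Any pseudo-label error made early is fed back into training and can compound over rounds, which is the failure mode that motivates the structural-learning alternative this paper proposes.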
