tailieunhanh - Báo cáo khoa học: "Bootstrapping via Graph Propagation"

Bootstrapping a classifier from a small set of seed rules can be viewed as the propagation of labels between examples via features shared between them. This paper introduces a novel variant of the Yarowsky algorithm based on this view. It is a bootstrapping learning method which uses a graph propagation algorithm with a well defined objective function. | Bootstrapping via Graph Propagation Max Whitney and Anoop Sarkar Simon Fraser University School of Computing Science Burnaby BC V5A 1S6 Canada mwhitney anoop @ Abstract Bootstrapping a classifier from a small set of seed rules can be viewed as the propagation of labels between examples via features shared between them. This paper introduces a novel variant of the Yarowsky algorithm based on this view. It is a bootstrapping learning method which uses a graph propagation algorithm with a well defined objective function. The experimental results show that our proposed bootstrapping algorithm achieves state of the art performance or better on several different natural language data sets. 1 Introduction In this paper we are concerned with a case of semisupervised learning that is close to unsupervised learning in that the labelled and unlabelled data points are from the same domain and only a small set of seed rules is used to derive the labelled points. We refer to this setting as bootstrapping. In contrast typical semi-supervised learning deals with a large number of labelled points and a domain adaptation task with unlabelled points from the new domain. The two dominant discriminative learning methods for bootstrapping are self-training Scudder 1965 and co-training Blum and Mitchell 1998 . In this paper we focus on a self-training style bootstrapping algorithm the Yarowsky algorithm Yarowsky 1995 . Variants of this algorithm have been formalized as optimizing an objective function in previous work by Abney 2004 and Haf-fari and Sarkar 2007 but it is not clear that any perform as well as the Yarowsky algorithm itself. We take advantage of this formalization and introduce a novel algorithm called Yarowsky-prop which builds on the algorithms of Yarowsky 1995 and Subramanya et al. 2010 . It is theoretically This research was partially supported by an NSERC Canada RGPIN 264905 grant. We would like to thank Gho-lamreza Haffari and the anonymous reviewers for their .

TỪ KHÓA LIÊN QUAN