
Part of Speech Tagging Using a Network of Linear Separators

Dan Roth and Dmitry Zelenko
Department of Computer Science
University of Illinois at Urbana-Champaign
1304 W. Springfield Ave., Urbana, IL 61801
danr zelenko

Abstract

We present an architecture and an on-line learning algorithm and apply it to the problem of part-of-speech tagging. The architecture presented, SNOW, is a network of linear separators in the feature space, utilizing the Winnow update algorithm. Multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good behavior when applied to very high dimensional problems, and especially when the target concepts depend on only a small subset of the features in the feature space. In this paper we describe an architecture that utilizes this mistake-driven algorithm for multi-class prediction: selecting the part of speech of a word. The experimental analysis presented here provides further evidence that these algorithms are suitable for natural language problems. The algorithm used is an on-line algorithm: every example is used by the algorithm only once and is then discarded. This has significance in terms of efficiency as well as quick adaptation to new contexts. We present an extensive experimental study of our algorithm under various conditions; in particular, it is shown that the algorithm performs comparably to the best known algorithms for POS tagging.

1 Introduction

Learning problems in the natural language domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Two characteristic properties of this domain are that its dimensionality is very high and that both the learned concepts and the instances reside very sparsely in the feature space. In this paper we present a learning algorithm and an architecture with properties suitable for this domain. The SNOW algorithm presented here builds on recently introduced theories of multiplicative weight-update learning.
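To make the idea concrete, the following is a minimal sketch of a Winnow-style, mistake-driven multi-class learner in the spirit of the architecture described above. It is an illustrative reconstruction, not the authors' implementation: the class names, the promotion/demotion factors, and the threshold value are assumptions chosen only to show how multiplicative weight updates over a sparse binary feature space and per-tag linear separators fit together.

```python
from collections import defaultdict


class WinnowSeparator:
    """One linear separator over a sparse binary feature space."""

    def __init__(self, promotion=1.5, demotion=0.5, threshold=1.0):
        # Illustrative parameter values; the paper's actual settings may differ.
        self.weights = defaultdict(lambda: 1.0)  # all weights start at 1
        self.promotion = promotion               # multiplicative raise after a miss
        self.demotion = demotion                 # multiplicative lowering after a false alarm
        self.threshold = threshold

    def score(self, active_features):
        # Only features that are active in the example contribute to the score.
        return sum(self.weights[f] for f in active_features)

    def update(self, active_features, is_positive):
        # Mistake-driven: weights change only when the separator errs,
        # and only the active features are touched.
        predicted_positive = self.score(active_features) >= self.threshold
        if is_positive and not predicted_positive:
            for f in active_features:
                self.weights[f] *= self.promotion
        elif not is_positive and predicted_positive:
            for f in active_features:
                self.weights[f] *= self.demotion


class MultiClassTagger:
    """One separator per POS tag; predict the tag whose separator scores highest."""

    def __init__(self, tags):
        self.separators = {tag: WinnowSeparator() for tag in tags}

    def predict(self, active_features):
        return max(self.separators,
                   key=lambda t: self.separators[t].score(active_features))

    def learn(self, active_features, true_tag):
        # On-line learning: each example is processed once and then discarded.
        for tag, separator in self.separators.items():
            separator.update(active_features, is_positive=(tag == true_tag))
```

In use, each word occurrence would be mapped to its set of active features (e.g., surrounding words and tags), passed once to `learn` during training, and tagged with `predict` afterwards; because updates are multiplicative and touch only active features, the scheme stays efficient even when the feature space is very large and sparse.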
