Scientific report: "Word Sense Disambiguation Using Label Propagation Based Semi-Supervised Learning"

Zheng-Yu Niu, Dong-Hong Ji
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613 Singapore
zniu dhji @

Chew Lim Tan
Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543 Singapore
tancl@

Abstract

Shortage of manually sense-tagged data is an obstacle to supervised word sense disambiguation methods. In this paper we investigate a label propagation based semi-supervised learning algorithm for WSD, which combines labeled and unlabeled data in the learning process to fully realize a global consistency assumption: similar examples should have similar labels. Our experimental results on benchmark corpora indicate that it consistently outperforms SVM when only very few labeled examples are available, and its performance is also better than monolingual bootstrapping and comparable to bilingual bootstrapping.

1 Introduction

In this paper we address the problem of word sense disambiguation (WSD), which is to assign an appropriate sense to an occurrence of a word in a given context. Many methods have been proposed to deal with this problem, including supervised learning algorithms (Leacock et al., 1998), semi-supervised learning algorithms (Yarowsky, 1995), and unsupervised learning algorithms (Schütze, 1998).

Supervised sense disambiguation has been very successful, but it requires a lot of manually sense-tagged data and cannot utilize raw unannotated data that can be cheaply acquired. Fully unsupervised methods do not need the definition of senses or manually sense-tagged data, but their sense clustering results cannot be directly used in many NLP tasks, since there is no sense tag for each instance in the clusters. Considering both the availability of a large amount of unlabelled data and the direct use of word senses, semi-supervised learning methods have received great attention recently.

Semi-supervised methods for WSD are characterized in terms ...
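
The consistency assumption stated in the abstract (similar examples should have similar labels) can be illustrated with a generic label propagation loop. The sketch below is only an illustration under assumptions that do not come from the text above: the Gaussian similarity, the `sigma` value, the fixed iteration count, the function name `label_propagation`, and the toy one-dimensional data are all hypothetical choices, and the paper's own feature set and similarity measure may differ.

```python
import math


def label_propagation(points, labels, num_classes, sigma=1.0, iterations=200):
    """Spread labels from labeled to unlabeled points over a similarity graph.

    points: list of feature vectors (lists of floats).
    labels: class index for labeled points, None for unlabeled points.
    Returns one predicted class index per point.
    """
    n = len(points)

    # Assumption: Gaussian similarity on squared Euclidean distance.
    def similarity(a, b):
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-sq_dist / (sigma ** 2))

    # Fully connected graph without self-loops.
    weights = [
        [similarity(points[i], points[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]

    # Row-normalize so each row is a distribution over neighbors.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])

    # One-hot scores for labeled points, uniform scores for unlabeled ones.
    def initial(label):
        if label is None:
            return [1.0 / num_classes] * num_classes
        return [1.0 if c == label else 0.0 for c in range(num_classes)]

    scores = [initial(label) for label in labels]

    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp labeled points to their known label.
                new_scores.append(scores[i])
            else:
                # Unlabeled points take the weighted average of their neighbors.
                new_scores.append([
                    sum(transition[i][j] * scores[j][c] for j in range(n))
                    for c in range(num_classes)
                ])
        scores = new_scores

    return [max(range(num_classes), key=lambda c: s[c]) for s in scores]


# Toy data (hypothetical): two well-separated groups, one labeled point each.
points = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
labels = [0, None, None, 1, None, None]
predicted = label_propagation(points, labels, num_classes=2)
print(predicted)
```

In this toy setting each unlabeled point ends up with the label of the labeled point in its own group, which is the behaviour the consistency assumption describes. In a WSD setting the points would be context feature vectors for occurrences of an ambiguous word and the classes would be its senses.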
