Scientific report: "Word Sense Disambiguation in Untagged Text based on Term Weight Learning"

Proceedings of EACL '99

Fumiyo Fukumoto and Yoshimi Suzuki
Department of Computer Science and Media Engineering, Yamanashi University
4-3-11 Takeda, Kofu 400-8511, Japan
fukumoto@ ysuzuki@

Abstract

This paper describes an unsupervised learning algorithm for disambiguating verbal word senses using term weight learning. In our method, collocations which characterise every sense are extracted using similarity-based estimation. Term weight learning is then performed on the results. Parameters of term weighting are estimated so as to maximise the collocations which characterise every sense and minimise the other collocations. The results of the experiment demonstrate the effectiveness of the method.

1 Introduction

One of the major approaches to disambiguating word senses is supervised learning (Gale et al., 1992; Yarowsky, 1992; Bruce and Janyce, 1994; Miller et al., 1994; Niwa and Nitta, 1994; Luk, 1995; Ng and Lee, 1996; Wilks and Stevenson, 1998). However, a major obstacle impedes the acquisition of lexical knowledge from corpora, i.e. the difficulty of manually sense-tagging a training corpus, since this limits the applicability of many approaches to domains where this hard-to-acquire knowledge is already available. This paper describes an unsupervised learning algorithm for disambiguating verbal word senses using term weight learning.
In our approach, an overlapping clustering algorithm based on mutual information-based (Mu) term weight learning between a verb and a noun is applied to a set of verbs. It is preferable that Mu is not low (Mu(x, y) ≥ 3) for a reliable statistical analysis (Church et al., 1991). However, this suffers from the problem of data sparseness, i.e. the co-occurrences which are used to represent every distinct sense do not appear in the test data. To attack this problem, for a low Mu value we distinguish between unobserved co-occurrences that are likely to occur in a new
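
The mutual information score and the threshold of 3 mentioned in the introduction can be illustrated with a minimal sketch. It assumes the standard association ratio log2(P(v, n) / (P(v) P(n))) estimated from raw co-occurrence counts; the paper's exact definition of Mu is not given in this excerpt. The function names `mutual_information` and `reliable_pairs`, the toy verb-noun pairs, and the use of "greater than or equal to" for the threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Score each (verb, noun) pair with log2(P(v, n) / (P(v) * P(n))).

    `pairs` is a list of observed (verb, noun) co-occurrences.
    Probabilities are plain relative frequencies (an assumption; the
    paper may estimate them differently).
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    noun_counts = Counter(noun for _, noun in pairs)
    total = len(pairs)

    scores = {}
    for (verb, noun), joint in pair_counts.items():
        p_joint = joint / total
        p_verb = verb_counts[verb] / total
        p_noun = noun_counts[noun] / total
        scores[(verb, noun)] = math.log2(p_joint / (p_verb * p_noun))
    return scores


def reliable_pairs(scores, threshold=3.0):
    """Keep only pairs whose score reaches the (assumed) threshold."""
    return {pair: mu for pair, mu in scores.items() if mu >= threshold}


# Hypothetical toy data: one rare but exclusive pair, one frequent pair.
pairs = [("take", "bath")] + [("get", "ticket")] * 15
scores = mutual_information(pairs)
print(reliable_pairs(scores))
```

On this toy data only the rare, exclusive pair passes the threshold. Real corpora contain many such pairs with too few observations to be trusted, and many plausible pairs that are never observed at all, which is the data sparseness problem the text raises.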
