Scientific report: "Proceedings of EACL '99"

Replicability for Manual Word Sense Tagging

Adam Kilgarriff
ITRI, University of Brighton, Lewes Road, Brighton, UK
email: adam@

People have been writing programs for automatic Word Sense Disambiguation (WSD) for forty years now, yet the validity of the task has remained in doubt. At a first pass, the task is simply defined: a word like bank can mean 'river bank' or 'money bank', and the task is to determine which of these applies in a context in which the word bank appears. The problems arise because most sense distinctions are not as clear as the distinction between 'river bank' and 'money bank', so it is not always straightforward for a person to say what the correct answer is. Thus we do not always know what it would mean to say that a computer program got the right answer. The issue is discussed in detail by Gale et al. (1992), who identify the problem as one of identifying the upper bound for the performance of a WSD program. If people can only agree on the correct answer x% of the time, a claim that a program achieves more than x% accuracy is hard to interpret, and x% is the upper bound for what the program can meaningfully achieve. There have been some discussions as to what this upper bound might be. Gale et al. review a psycholinguistic study (Jorgensen, 1990) in which the level of agreement averaged 68%.
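The upper-bound argument can be illustrated with a minimal sketch. It assumes agreement is measured as the raw proportion of instances on which two human taggers choose the same sense; the source does not say how the cited studies computed their figures, and the function name and sense labels below are hypothetical.

```python
def agreement_rate(tags_a, tags_b):
    """Return the fraction of instances on which two taggers chose the same sense."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggers must label the same instances")
    if not tags_a:
        raise ValueError("no instances to compare")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Hypothetical sense labels for five occurrences of "bank" (illustrative data only)
tagger_1 = ["river", "money", "money", "river", "money"]
tagger_2 = ["river", "money", "river", "river", "money"]

# Under the argument above, this rate bounds the accuracy a WSD program
# can meaningfully be said to achieve on the same data.
upper_bound = agreement_rate(tagger_1, tagger_2)
print(f"inter-tagger agreement: {upper_bound:.0%}")
```

On this toy data the two taggers disagree on one instance out of five, so a program scored against either tagger's labels could not meaningfully claim accuracy above that agreement rate.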
But an upper bound of 68% is disastrous for the enterprise, since it implies that the best a program could possibly do is still not remotely good enough for any practical purpose. Even worse news comes from Ng and Lee (1996), who re-tagged parts of the manually tagged SEMCOR corpus (Fellbaum, 1998). The taggings matched only 57% of the time. If these represent as high a level of inter-tagger agreement as one could ever expect, WSD is a doomed enterprise. However, neither study set out to identify an upper bound for WSD, and it is far from ideal to use their results in this way. In this paper we report on a study which did aim specifically at achieving as ...
