Sense-Linking in a Machine Readable Dictionary

Robert Krovetz
Department of Computer Science
University of Massachusetts, Amherst, MA 01003

Abstract

Dictionaries contain a rich set of relationships between their senses, but often these relationships are only implicit. We report on our experiments to automatically identify links between the senses in a machine-readable dictionary. In particular, we automatically identify instances of zero-affix morphology, and use that information to find specific linkages between senses. This work has provided insight into the performance of a stochastic tagger.

1 Introduction

Machine-readable dictionaries contain a rich set of relationships between their senses, and indicate them in a variety of ways. Sometimes the relationship is provided explicitly, such as with a synonym or antonym reference. More commonly the relationship is only implicit, and needs to be uncovered through outside mechanisms. This paper describes our efforts at identifying these links. The purpose of the research is to obtain a better understanding of the relationships between word meanings, and to provide data for our work on word-sense disambiguation and information retrieval. Our hypothesis is that retrieving documents on the basis of word senses, instead of words, will result in better performance.

Our approach is to treat the information associated with dictionary senses (part of speech, subcategorization, subject area codes, etc.) as multiple sources of evidence (cf. Krovetz [3]). This process is fundamentally a divisive one, and each of the sources of evidence has exceptions (i.e., instances in which senses are related in spite of being separated by part of speech, subcategorization, or morphology). Identifying related senses will help us to test the hypothesis that unrelated meanings will be more effective at separating relevant from nonrelevant documents than meanings which are related. We will first discuss some of the explicit indications of sense relationships as found in

[...] (LDOCE), is a dictionary for learners of English as a second language. As such, it provides a great deal of information about word meanings in the form of example sentences, usage notes, [...]
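As a rough illustration of the zero-affix morphology mentioned in the abstract, the sketch below flags headwords that are listed under more than one part of speech without any change in form, and then pairs senses across parts of speech when their definitions share a content word. This is only a minimal sketch under stated assumptions, not the procedure used in the experiments: the toy entries, the stopword list, and the function names `zero_affix_candidates` and `cross_pos_links` are all hypothetical, and the word-overlap heuristic stands in for whatever evidence the actual system uses.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy entries: (headword, part of speech, definition).
# Illustrative only; these are not taken from LDOCE.
ENTRIES = [
    ("cook", "n", "a person who prepares food"),
    ("cook", "v", "to prepare food by heating it"),
    ("bank", "n", "the land along the side of a river"),
    ("bank", "n", "a place where money is kept"),
    ("quickly", "adv", "at a fast speed"),
]

# Assumed minimal stopword list for the toy definitions above.
STOPWORDS = {"a", "an", "the", "to", "of", "by", "it", "is",
             "who", "where", "at", "along"}


def content_words(definition):
    """Lowercase a definition and drop stopwords."""
    return {w for w in definition.lower().split() if w not in STOPWORDS}


def zero_affix_candidates(entries):
    """Return headwords listed under more than one part of speech."""
    pos_by_word = defaultdict(set)
    for headword, pos, _ in entries:
        pos_by_word[headword].add(pos)
    return sorted(w for w, tags in pos_by_word.items() if len(tags) > 1)


def cross_pos_links(entries):
    """Pair senses of one headword that differ in part of speech
    but share at least one content word in their definitions."""
    senses = defaultdict(list)
    for headword, pos, definition in entries:
        senses[headword].append((pos, definition))
    links = []
    for headword, items in senses.items():
        for (pos_a, def_a), (pos_b, def_b) in combinations(items, 2):
            shared = content_words(def_a) & content_words(def_b)
            if pos_a != pos_b and shared:
                links.append((headword, pos_a, pos_b, sorted(shared)))
    return links


if __name__ == "__main__":
    print(zero_affix_candidates(ENTRIES))
    print(cross_pos_links(ENTRIES))
```

In this toy data only "cook" appears as both noun and verb, so it is the only candidate, and its two senses are linked through the shared word "food". The two noun senses of "bank" are never paired, since the heuristic only looks across parts of speech.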
