Scientific report: "Annotating and Learning Compound Noun Semantics"

Diarmuid Ó Seaghdha
University of Cambridge Computer Laboratory
15 JJ Thomson Avenue, Cambridge CB3 0FD, United Kingdom
do242@

Abstract

There is little consensus on a standard experimental design for the compound interpretation task. This paper introduces well-motivated general desiderata for semantic annotation schemes, and describes such a scheme for in-context compound annotation accompanied by detailed, publicly available guidelines. Classification experiments on an open-text dataset compare favourably with previously reported results and provide a solid baseline for future research.

1 Introduction

There are a number of reasons why the interpretation of noun-noun compounds has long been a topic of interest for NLP researchers. Compounds occur very frequently in English and many other languages, so they cannot be avoided by a robust semantic processing system. Compounding is a very productive process with a highly skewed type frequency spectrum, and corpus information may be very sparse. Compounds are often highly ambiguous, and a large degree of world knowledge seems necessary to understand them. For example, knowing that a cheese knife is probably a knife for cutting cheese and probably not a knife made of cheese (cf. plastic knife) requires not just an ability to identify the senses of cheese and knife, but also knowledge about what one usually does with cheese and knives.
These factors combine to yield a difficult problem that exhibits many of the challenges characteristic of lexical semantic processing in general. Recent research has made significant progress on solving the problem with statistical methods, often without the need for manually created lexical resources (Lauer, 1995; Lapata and Keller, 2004; Girju, 2006; Turney, 2006). The work presented here is part of an ongoing project that treats compound interpretation as a classification problem to be solved using machine learning.

2 Selecting an .