Scientific report: "Empirical Measurements of Lexical Similarity in Noun Phrase Conjuncts"

Deirdre Hogan
Department of Computer Science, Trinity College Dublin, Dublin 2, Ireland
dhogan@
(Now at the National Centre for Language Technology, Dublin City University, Ireland.)

Abstract

The ability to detect similarity in conjunct heads is potentially a useful tool in helping to disambiguate coordination structures, a difficult task for parsers. We propose a distributional measure of similarity designed for such a task. We then compare several different measures of word similarity by testing whether they can empirically detect similarity in the head nouns of noun phrase conjuncts in the Wall Street Journal (WSJ) treebank. We demonstrate that several measures of word similarity can successfully detect conjunct head similarity and suggest that the measure proposed in this paper is the most appropriate for this task.

1 Introduction

Some noun pairs are more likely to be conjoined than others. Take the following two alternative bracketings:

1. busloads of [executives] and [their spouses]
2. [busloads of executives] and [their spouses]

The two head nouns coordinated in 1 are executives and spouses, and (incorrectly) in 2 busloads and spouses. Clearly, the former pair of head nouns is more likely, and for the purpose of discrimination a parsing model would benefit if it could learn that executives and spouses is a more likely combination than busloads and spouses. If nouns co-occurring in coordination patterns are often semantically similar, and if a similarity measure could be defined so that, for example, sim(executives, spouses) > sim(busloads, spouses), then it is potentially useful for coordination disambiguation.

The idea that nouns co-occurring in conjunctions tend to be semantically related has been noted in (Riloff and Shepherd, 1997) and used effectively to automatically cluster semantically similar words (Roark and Charniak, 1998; Caraballo, 1999; Widdows and Dorow, 2002). The tendency for conjoined ...