A Clustering Approach for the Nearly Unsupervised Recognition of Nonliteral Language

Julia Birke and Anoop Sarkar
School of Computing Science, Simon Fraser University
Burnaby, BC, V5A 1S6, Canada
jbirke@ anoop@

Abstract

In this paper we present TroFi (Trope Finder), a system for automatically classifying literal and nonliteral usages of verbs through nearly unsupervised word-sense disambiguation and clustering techniques. TroFi uses sentential context instead of selectional constraint violations or paths in semantic hierarchies. It also uses literal and nonliteral seed sets acquired and cleaned without human supervision in order to bootstrap learning. We adapt a word-sense disambiguation algorithm to our task and augment it with multiple seed set learners, a voting schema, and additional features like SuperTags and extra-sentential context. Detailed experiments on hand-annotated data show that our enhanced algorithm outperforms the baseline by [value missing in source]. Using the TroFi algorithm, we also build the TroFi Example Base, an extensible resource of annotated literal/nonliteral examples which is freely available to the NLP research community.

1 Introduction

In this paper, we propose TroFi (Trope Finder), a nearly unsupervised clustering method for separating literal and nonliteral usages of verbs.
For example, given the target verb "pour", we would expect TroFi to cluster the sentence "Custom demands that cognac be poured from a freshly opened bottle" as literal, and the sentence "Salsa and rap music pour out of the windows" as nonliteral, which, indeed, it does. We call our method nearly unsupervised. See Section [number missing in source] for why we use this terminology.

We reduce the problem of nonliteral language recognition to one of word-sense disambiguation by redefining literal and nonliteral as two

This research was partially supported by NSERC, Canada (RGPIN 264905). We would like to thank Bill Dolan, Fred Popowich, Dan Fass, Katja Markert, Yudong Liu, and the anonymous reviewers for their comments.
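
As a rough illustration of treating literal and nonliteral as two senses of a target verb, the sketch below attracts a sentence to whichever seed set contains its most similar sentence, using only the words of the sentential context. It is not the TroFi algorithm: plain cosine similarity over word counts stands in for the adapted word-sense disambiguation algorithm, there are no multiple learners, voting, or SuperTags, and the seed sentences, the stopword list, and all function names are assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "and", "be", "from", "into", "of", "out", "she", "that", "the"}

# Hypothetical seed sets; TroFi acquires and cleans its seeds automatically.
LITERAL_SEEDS = [
    "She poured the wine from the bottle into a glass.",
    "Pour the milk into the bowl.",
]
NONLITERAL_SEEDS = [
    "Laughter and music poured from the crowded hall.",
    "Complaints poured into the office.",
]


def context_features(sentence, target_verb):
    """Count context words, dropping stopwords and forms of the target verb."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(
        t for t in tokens
        if t not in STOPWORDS and not t.startswith(target_verb)
    )


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(sentence, literal_seeds, nonliteral_seeds, target_verb):
    """Label a sentence by the seed set holding its most similar sentence."""
    features = context_features(sentence, target_verb)

    def best(seeds):
        return max(
            (cosine(features, context_features(s, target_verb)) for s in seeds),
            default=0.0,
        )

    literal_score = best(literal_seeds)
    nonliteral_score = best(nonliteral_seeds)
    if literal_score == nonliteral_score:
        return "unknown"  # no evidence either way
    return "literal" if literal_score > nonliteral_score else "nonliteral"


if __name__ == "__main__":
    for s in (
        "Custom demands that cognac be poured from a freshly opened bottle",
        "Salsa and rap music pour out of the windows",
    ):
        print(classify(s, LITERAL_SEEDS, NONLITERAL_SEEDS, "pour"), "|", s)
```

With these toy seeds, the cognac sentence shares "bottle" with a literal seed and the salsa sentence shares "music" with a nonliteral seed, so the two examples from the text land in the expected clusters. A single shared word is fragile evidence, which is one reason the paper's system uses richer features and multiple learners.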
