Scientific report: "Automatically Constructing a Lexicon of Verb Phrase Idiomatic Combinations"
Afsaneh Fazly
Department of Computer Science, University of Toronto, Toronto, ON M5S 3H5, Canada
afsaneh@

Suzanne Stevenson
Department of Computer Science, University of Toronto, Toronto, ON M5S 3H5, Canada
suzanne@

Abstract

We investigate the lexical and syntactic flexibility of a class of idiomatic expressions. We develop measures that draw on such linguistic properties, and demonstrate that these statistical, corpus-based measures can be successfully used for distinguishing idiomatic combinations from non-idiomatic ones. We also propose a means for automatically determining which syntactic forms a particular idiom can appear in, and hence should be included in its lexical representation.

1 Introduction

The term idiom has been applied to a fuzzy category with prototypical examples such as by and large, kick the bucket, and let the cat out of the bag. Providing a definitive answer for what idioms are, and determining how they are learned and understood, are still subject to debate (Glucksberg, 1993; Nunberg et al., 1994). Nonetheless, they are often defined as phrases or sentences that involve some degree of lexical, syntactic, and/or semantic idiosyncrasy.

Idiomatic expressions, as a part of the vast family of figurative language, are widely used both in colloquial speech and in written language. Moreover, a phrase develops its idiomaticity over time (Cacciari, 1993); consequently, new idioms come into existence on a daily basis (Cowie et al., 1983; Seaton and Macaulay, 2002). Idioms thus pose a serious challenge, both for the creation of wide-coverage computational lexicons and for the development of large-scale, linguistically plausible natural language processing (NLP) systems (Sag et al., 2002).

One problem is due to the range of syntactic idiosyncrasy of idiomatic expressions. Some idioms, such as by and large, contain syntactic violations; these are often completely fixed and hence can be listed in