Clustering Hungarian Verbs on the Basis of Complementation Patterns
Kata Gabor
Dept. of Language Technology, Linguistics Institute, HAS
1399 Budapest, P.O. Box 701/518, Hungary
gkata@

Eniko Heja
Dept. of Language Technology, Linguistics Institute, HAS
1399 Budapest, P.O. Box 701/518, Hungary
eheja@

Abstract

Our paper reports an attempt to apply an unsupervised clustering algorithm to a Hungarian treebank in order to obtain semantic verb classes. Starting from the hypothesis that semantic metapredicates underlie verbs' syntactic realization, we investigate how semantically motivated verb classes can be obtained by automatic means. The 150 most frequent Hungarian verbs were clustered on the basis of their complementation patterns, yielding a set of basic classes and hints about the features that determine verbal subcategorization. The resulting classes serve as a basis for the subsequent analysis of their alternation behavior.

1 Introduction

For over a decade, the automatic construction of wide-coverage structured lexicons has been at the center of interest in the natural language processing community. On the one hand, structured lexical databases are easier to handle and to expand because they allow generalizations over classes of words.
On the other hand, interest in the automatic acquisition of lexical information from corpora stems from the fact that manually constructing such resources is time-consuming and the resulting databases are difficult to update. Most work on acquiring verbal lexical properties aims at learning subcategorization frames from corpora (Pereira et al., 1993; Briscoe and Carroll, 1997; Sass, 2006). However, the semantic grouping of verbs on the basis of their syntactic distribution or other quantifiable features has also gained attention (Schulte im Walde, 2000; Schulte im Walde and Brew, 2002; Merlo and Stevenson, 2001; Dorr and Jones, 1996). The goal of these investigations is either the validation of verb classes based on …
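The abstract states only that the verbs were clustered on the basis of their complementation patterns; this excerpt does not say which clustering algorithm, distance measure, or frame inventory was used. The following is a minimal sketch of the general idea, assuming that each verb is represented as a relative-frequency distribution over complementation frames and that verbs are grouped by average-linkage agglomerative clustering with Euclidean distance. Both choices are illustrative assumptions, not the paper's method. The verb names, frame labels, and counts (`verb_a`, `NOM+ACC`, and so on) are hypothetical toy values, not data from the treebank.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data (NOT from the paper): how often each verb occurs
# with each complementation frame. Verb and frame names are placeholders.
FRAMES = ["NOM", "NOM+ACC", "NOM+DAT", "NOM+ACC+DAT"]
COUNTS = {
    "verb_a": [2, 40, 1, 30],   # mostly transitive / ditransitive frames
    "verb_b": [1, 35, 2, 25],
    "verb_c": [50, 3, 1, 0],    # mostly intransitive frame
    "verb_d": [45, 5, 2, 1],
    "verb_e": [5, 2, 30, 1],    # mostly dative frame
}


def normalize(row):
    """Turn raw frame counts into a relative-frequency distribution,
    so that frequent and rare verbs become comparable."""
    total = sum(row)
    return [c / total for c in row]


def euclidean(p, q):
    """Euclidean distance between two frame distributions."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(c1, c2, vecs):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(vecs[a], vecs[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)


def cluster(counts, k):
    """Agglomerative clustering: start from singleton clusters and
    repeatedly merge the closest pair until only k clusters remain."""
    vecs = {verb: normalize(row) for verb, row in counts.items()}
    clusters = [[verb] for verb in sorted(vecs)]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], vecs),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


if __name__ == "__main__":
    for group in cluster(COUNTS, 3):
        print(group)
```

With these toy counts, the verbs with similar frame profiles end up together: `verb_a` with `verb_b`, `verb_c` with `verb_d`, and `verb_e` on its own. Average linkage over normalized frequencies is only one possible design; a different distance (for example, a divergence between probability distributions) or a partitional method such as k-means could group the same verbs differently.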