Generalizing Semantic Role Annotations Across Syntactically Similar Verbs

Andrew S. Gordon
Institute for Creative Technologies
University of Southern California
Marina del Rey, CA 90292, USA
gordon@

Reid Swanson
Institute for Creative Technologies
University of Southern California
Marina del Rey, CA 90292, USA
swansonr@

Abstract

Large corpora of parsed sentences with semantic role labels (e.g., PropBank) provide training data for use in the creation of high-performance automatic semantic role labeling systems. Despite the size of these corpora, individual verbs (or rolesets) often have only a handful of instances in these corpora, and only a fraction of English verbs have even a single annotation. In this paper, we describe an approach for dealing with this sparse data problem, enabling accurate semantic role labeling for novel verbs (rolesets) with only a single training example. Our approach involves the identification of syntactically similar verbs found in PropBank, the alignment of arguments in their corresponding rolesets, and the use of their corresponding annotations in PropBank as surrogate training data.

1 Generalizing Semantic Role Annotations

A recent release of the PropBank (Palmer et al., 2005) corpus of semantic role annotations of Treebank parses contained 112,917 labeled instances of 4,250 rolesets corresponding to 3,257 verbs, as illustrated by this example for the verb buy:
[arg0 Chuck] bought [arg1 a car] [arg2 from Jerry] [arg3 for $1000].

Annotations similar to these have been used to create automated semantic role labeling systems (Pradhan et al., 2005; Moschitti et al., 2006) for use in natural language processing applications that require only shallow semantic parsing. As with all machine-learning approaches, the performance of these systems is heavily dependent on the availability of adequate amounts of training data. However, the number of annotated instances in PropBank varies greatly from verb to verb; there are 617 annotations for the want …
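The labeled-argument format of the buy example, and the idea from the abstract of reusing a similar verb's annotations as surrogate training data, can be made concrete with a small sketch. The Python below assumes a simple bracketed notation `[label span]` (an illustrative convention, not PropBank's actual file format). It parses one annotated sentence into (label, span) pairs and then relabels those pairs through an argument alignment between a source roleset and a target roleset. The functions `parse_bracketed` and `relabel` and the alignment dictionary are hypothetical. How the alignment is computed is the subject of the paper and is not reproduced here, so the sketch simply takes it as given.

```python
import re
from typing import Dict, List, Tuple

# A labeled argument: (role label, text span), e.g. ("arg0", "Chuck").
Argument = Tuple[str, str]

# Matches "[label span]" groups in the assumed bracketed notation.
_ARG_PATTERN = re.compile(r"\[(\w+) ([^\]]+)\]")


def parse_bracketed(sentence: str) -> List[Argument]:
    """Extract (label, span) pairs, in order, from a bracketed annotation."""
    return [(m.group(1), m.group(2)) for m in _ARG_PATTERN.finditer(sentence)]


def relabel(args: List[Argument], alignment: Dict[str, str]) -> List[Argument]:
    """Map a source roleset's labels onto a target roleset's labels.

    `alignment` is a hypothetical source-label -> target-label mapping.
    Arguments whose label has no counterpart in the alignment are dropped,
    since they cannot serve as surrogate training data for the target.
    """
    return [(alignment[label], span) for label, span in args if label in alignment]


if __name__ == "__main__":
    buy = parse_bracketed("[arg0 Chuck] bought [arg1 a car] [arg2 from Jerry] [arg3 for $1000].")
    print(buy)
    # Hypothetical target roleset that numbers the seller arg3 and the price arg2.
    print(relabel(buy, {"arg0": "arg0", "arg1": "arg1", "arg2": "arg3", "arg3": "arg2"}))
```

Dropping unaligned arguments rather than guessing a label is a conservative choice under these assumptions: a surrogate instance then carries fewer labels, but none that contradict the target roleset.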