Robustness and Generalization of Role Sets: PropBank vs. VerbNet

Benat Zapirain and Eneko Agirre
IXA NLP Group, University of the Basque Country
@

Lluis Marquez
TALP Research Center, Technical University of Catalonia
lluism@

Abstract

This paper presents an empirical study on the robustness and generalization of two alternative role sets for semantic role labeling (SRL): PropBank numbered roles and VerbNet thematic roles. By testing a state-of-the-art SRL system with the two alternative role annotations, we show that the PropBank role set is more robust to the lack of verb-specific semantic information and generalizes better to infrequent and unseen predicates. Keeping in mind that thematic roles are better suited to application needs, we also tested the best way to generate VerbNet annotation. We conclude that first tagging PropBank roles and then mapping them into VerbNet roles is as effective as training and tagging directly on VerbNet, and is more robust to domain shifts.

1 Introduction

Semantic Role Labeling is the problem of analyzing clause predicates in open text by identifying their arguments and tagging them with semantic labels that indicate the role each argument plays with respect to the verb. Such sentence-level semantic analysis makes it possible to determine who did what to whom, when, and where, and thus to characterize the participants and properties of the events established by the predicates.
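To make the two labeling schemes concrete, the following is a minimal Python sketch of the tag-then-map strategy from the abstract: arguments are first labeled with PropBank roles and then relabeled with VerbNet thematic roles through a verb-specific lookup. The mapping table `PB_TO_VN`, the `Argument` class, and `map_to_verbnet` are hypothetical illustrations, not the authors' system or mapping resource, and the three entries for the verb "give" are assumed for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, illustrative mapping entries keyed by
# (verb lemma, VerbNet class, PropBank role). A real system would load a
# full verb-specific mapping resource instead of this toy table.
PB_TO_VN: dict[tuple[str, str, str], str] = {
    ("give", "give-13.1", "Arg0"): "Agent",
    ("give", "give-13.1", "Arg1"): "Theme",
    ("give", "give-13.1", "Arg2"): "Recipient",
}


@dataclass
class Argument:
    text: str     # surface string of the argument constituent
    pb_role: str  # PropBank label predicted by a (hypothetical) SRL tagger


def map_to_verbnet(lemma: str, vn_class: str,
                   args: list[Argument]) -> list[tuple[str, str, Optional[str]]]:
    """Relabel PropBank-tagged arguments with VerbNet thematic roles.

    The lookup is keyed on the verb and its VerbNet class, because the same
    numbered role (e.g. Arg2) can correspond to different thematic roles
    for different verbs. Roles without a mapping entry get None.
    """
    return [(a.text, a.pb_role, PB_TO_VN.get((lemma, vn_class, a.pb_role)))
            for a in args]


# "Mary gave John a book yesterday": tag PropBank roles first, then map.
args = [Argument("Mary", "Arg0"), Argument("John", "Arg2"),
        Argument("a book", "Arg1"), Argument("yesterday", "ArgM-TMP")]
for text, pb, vn in map_to_verbnet("give", "give-13.1", args):
    print(f"{text:10} {pb:9} {vn}")
```

Keying the table on the verb and class, rather than on the PropBank label alone, reflects why numbered roles need verb-specific information before they can be read as thematic roles; the adjunct "yesterday" has no entry in this toy table and is left unmapped.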
This kind of semantic analysis is of great interest for a broad spectrum of NLP applications (information extraction, summarization, question answering, machine translation, etc.), since it opens the door to exploiting the semantic relations among linguistic constituents. The properties of the available semantically annotated corpora have conditioned the type of research and systems developed so far. PropBank (Palmer et al., 2005) is the most widely used corpus for training SRL systems, probably because it contains running text from the Penn Treebank corpus with annotations on all ...