Optimal Constituent Alignment with Edge Covers for Semantic Projection

Sebastian Padó
Computational Linguistics
Saarland University
Saarbrücken, Germany
pado@

Mirella Lapata
School of Informatics
University of Edinburgh
Edinburgh, UK
mlap@

Abstract

Given a parallel corpus, semantic projection attempts to transfer semantic role annotations from one language to another, typically by exploiting word alignments. In this paper, we present an improved method for obtaining constituent alignments between parallel sentences to guide the role projection task. Our extensions are twofold: (a) we model constituent alignment as minimum weight edge covers in a bipartite graph, which allows us to find a globally optimal solution efficiently; (b) we propose tree pruning as a promising strategy for reducing alignment noise. Experimental results on an English-German parallel corpus demonstrate improvements over state-of-the-art models.

1 Introduction

Recent years have witnessed increased interest in data-driven methods for many natural language processing (NLP) tasks, ranging from part-of-speech tagging to parsing and semantic role labelling. The success of these methods is due partly to the availability of large amounts of training data annotated with rich linguistic information. Unfortunately, such resources are largely absent for almost all languages except English.
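The abstract's central modelling choice can be made concrete. An edge cover of a bipartite graph is a set of edges such that every node on both sides touches at least one of them; for constituent alignment, every source and every target constituent would receive at least one link. The Python sketch that follows computes a minimum weight edge cover through one standard reduction to maximum weight bipartite matching, solved with the Hungarian algorithm. It assumes a complete bipartite graph and a hypothetical nonnegative cost matrix `w[i][j]` for linking source constituent `i` to target constituent `j` (for instance, some dissimilarity derived from word alignments). The names `hungarian` and `min_weight_edge_cover` are illustrative, and this is not presented as the authors' implementation.

```python
def hungarian(cost):
    """Minimum-cost assignment for a square matrix (classic O(n^3) Hungarian method).

    Returns assignment[row] = column (0-indexed).
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (n + 1)      # column potentials (1-indexed)
    p = [0] * (n + 1)        # p[j] = row currently assigned to column j (0 = free)
    way = [0] * (n + 1)      # predecessor column along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the augmenting path back to the root column 0.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def min_weight_edge_cover(w):
    """Minimum weight edge cover of the complete bipartite graph with costs w.

    w[i][j] >= 0 is a hypothetical cost for linking source constituent i
    to target constituent j. Returns the chosen (i, j) links, sorted.
    """
    n, m = len(w), len(w[0])
    # Cheapest incident edge for every source node and every target node.
    row_best = [min(range(m), key=lambda j: w[i][j]) for i in range(n)]
    col_best = [min(range(n), key=lambda i: w[i][j]) for j in range(m)]
    mu = [w[i][row_best[i]] for i in range(n)]
    mv = [w[col_best[j]][j] for j in range(m)]
    # Gain of letting edge (i, j) cover both endpoints instead of using
    # their two cheapest edges; only positive gains are worth matching.
    size = max(n, m)
    cost = [[0.0] * size for _ in range(size)]   # zero-padded to a square matrix
    for i in range(n):
        for j in range(m):
            gain = mu[i] + mv[j] - w[i][j]
            if gain > 0:
                cost[i][j] = -gain               # maximise gain = minimise -gain
    assignment = hungarian(cost)
    cover, rows_done, cols_done = set(), set(), set()
    for i in range(n):
        j = assignment[i]
        if j < m and cost[i][j] < 0:             # a real, profitable match
            cover.add((i, j))
            rows_done.add(i)
            cols_done.add(j)
    # Every node left unmatched falls back on its cheapest incident edge.
    for i in range(n):
        if i not in rows_done:
            cover.add((i, row_best[i]))
    for j in range(m):
        if j not in cols_done:
            cover.add((col_best[j], j))
    return sorted(cover)


if __name__ == "__main__":
    costs = [[1, 4, 5],
             [3, 2, 6]]     # 2 source x 3 target constituents (made-up numbers)
    print(min_weight_edge_cover(costs))   # [(0, 0), (0, 2), (1, 1)]
```

The reduction relies on the fact that, with nonnegative costs, some optimal cover consists of a matching plus the cheapest edge of each node the matching leaves uncovered. Its total cost is therefore the sum of all cheapest-edge costs minus the matching's total gain `mu[i] + mv[j] - w[i][j]`, so maximising that gain yields a globally optimal cover in polynomial time rather than by searching over all possible alignments.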
Given the data requirements for supervised learning and the current paucity of suitable data for many languages, methods for generating annotations semi-automatically are becoming increasingly popular. Annotation projection tackles this problem by leveraging parallel corpora and the high-accuracy tools (e.g., parsers, taggers) available for a few languages. Specifically, annotations are transferred from resource-rich languages onto low-density ones through the use of word alignments. The projection process can be decomposed into three steps: (a) determining the units of projection; these are typically words but can …