Empirical Lower Bounds on the Complexity of Translational Equivalence ∗

Benjamin Wellington, Computer Science Dept., New York University, New York, NY 10003, lastname @
Sonjia Waxmonsky, Computer Science Dept., University of Chicago, Chicago, IL 60637, wax@
I. Dan Melamed, Computer Science Dept., New York University, New York, NY 10003, lastname @

Abstract

This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why "syntactic" constraints have not helped to improve statistical translation models, including finite-state phrase-based models, tree-to-string models, and tree-to-tree models. The paper also presents evidence that inversion transduction grammars cannot generate some translational equivalence relations, even in relatively simple real bitexts in syntactically similar languages with rigid word order. Instructions for replicating our experiments are at http GenPar ACL06

1 Introduction

Translational equivalence is a mathematical relation that holds between linguistic expressions with the same meaning. The most common explicit representations of this relation are word alignments between sentences that are translations of each other. The complexity of a given word alignment can be measured by the difficulty of decomposing it into its atomic units under certain constraints detailed in Section 2.
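As a rough illustration of what decomposing an alignment into smaller units can involve, the sketch below covers only the simplest case: a one-to-one word alignment written as a permutation, where `perm[i]` is the target position of source word `i + 1`. It checks whether the permutation can be assembled by repeatedly joining two adjacent blocks whose target positions are also adjacent, which is the kind of reordering usually attributed to a binary inversion transduction grammar. This is a simplified stand-in, not the measurement procedure of Section 2: the function name `is_binarizable` and the permutation encoding are assumptions made for the example, and real alignments with unaligned or many-to-many links need more machinery.

```python
def is_binarizable(perm):
    """Return True if perm can be built by merging adjacent blocks only.

    Hypothetical helper for illustration: perm is a list holding the
    target position of each source word, using each of 1..n exactly once.
    """
    stack = []  # each entry is a (low, high) range of target positions
    for target_pos in perm:
        stack.append((target_pos, target_pos))
        # Merge the top two blocks while their target ranges are contiguous.
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # A single remaining block means the whole permutation decomposed.
    return len(stack) == 1


if __name__ == "__main__":
    for perm in ([1, 2, 3, 4], [2, 1, 4, 3], [2, 4, 1, 3]):
        print(perm, is_binarizable(perm))
```

Under these assumptions, `[1, 2, 3, 4]` and `[2, 1, 4, 3]` reduce to a single block, while the "inside-out" pattern `[2, 4, 1, 3]` does not, because no two neighbouring words map to neighbouring target positions.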
This paper describes a study of the distribution of alignment complexity in a variety of bitexts. The study considered word alignments both in isolation and in combination with independently generated parse trees for one or both sentences in each pair. Thus the study

∗ Thanks to David Chiang, Liang Huang, the anonymous reviewers, and members of the NYU Proteus Project for helpful feedback. This research was supported by NSF grants 0238406 and .