Improving Bitext Word Alignments via Syntax-based Reordering of English
Elliott Franco Drabek and David Yarowsky
Department of Computer Science
Johns Hopkins University
Baltimore, MD 21218, USA
edrabek yarowsky @

Abstract

We present an improved method for automated word alignment of parallel texts that takes advantage of knowledge of syntactic divergences, while avoiding the need for syntactic analysis of the less resource-rich language and retaining the robustness of syntactically agnostic approaches such as the IBM word alignment models. We achieve this by using simple, easily elicited knowledge to produce syntax-based heuristics that transform the target language (here, English) into a form more closely resembling the source language, and then by using standard alignment methods to align the transformed bitext. We present experimental results under variable resource conditions. The method improves word alignment performance for language pairs such as English-Korean and English-Hindi, which exhibit longer-distance syntactic divergences.

1 Introduction

Word-level alignment is a key infrastructural technology for multilingual processing. It is crucial for the development of translation models and translation lexica (Tufis, 2002; Melamed, 1998), as well as for translingual projection (Yarowsky et al., 2001; Lopez et al., 2002).
It has increasingly attracted attention as a task worthy of study in its own right (Mihalcea and Pedersen, 2003; Och and Ney, 2000). Syntax-light alignment models such as the five IBM models (Brown et al., 1993) and their relatives have proved to be very successful and robust at producing word-level alignments, especially for closely related languages with similar word order and mostly local reorderings, which can be captured via simple models of relative word distortion. However, these models have been less successful at modeling syntactic distortions involving longer-distance movement. In contrast, more syntactically informed approaches have been constrained by …
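The idea summarized in the abstract, reordering English toward the source language's word order and then running an unchanged aligner, can be illustrated with a small sketch. Everything here is an assumption for illustration, not the authors' rule set: the tuple-based tree format, the two hypothetical rules (verbs moved after their complements inside a VP, prepositions moved after their objects inside a PP, as a verb-final, postpositional language such as Hindi or Korean would suggest), the romanized Hindi example, and the hand-written gold links. As a rough stand-in for relative word distortion, the hypothetical `distortion_cost` sums how far each jump between consecutive aligned English positions deviates from a monotone +1 step; it is a proxy, not an IBM or HMM model parameter.

```python
"""Toy sketch: reorder an English parse tree toward verb-final, postpositional
order, then compare a simple distortion proxy before and after reordering.
Tree format, rules, example sentence, and gold links are illustrative assumptions."""
from itertools import count

# Hypothetical tree format: a leaf is (POS tag, word); an internal node is
# (phrase label, [children]). A real system would take trees from an English parser.
SENT = ("S", [
    ("NP", [("NNP", "John")]),
    ("VP", [
        ("VBD", "put"),
        ("NP", [("DT", "the"), ("NN", "book")]),
        ("PP", [("IN", "on"), ("NP", [("DT", "the"), ("NN", "table")])]),
    ]),
])


def number_leaves(node, counter=None):
    """Copy the tree, replacing each leaf word with (original_position, word)."""
    counter = count() if counter is None else counter
    label, body = node
    if isinstance(body, str):  # leaf: (POS tag, word)
        return (label, (next(counter), body))
    return (label, [number_leaves(child, counter) for child in body])


def move_heads_last(children, is_head):
    """Stable partition: non-head children first, head children last."""
    return ([c for c in children if not is_head(c[0])]
            + [c for c in children if is_head(c[0])])


def reorder(node):
    """Return numbered leaves in the reordered (verb-final, postpositional) order."""
    label, body = node
    if isinstance(body, tuple):  # numbered leaf: (original_position, word)
        return [body]
    children = body
    if label == "VP":  # assumed rule: verbs follow their complements
        children = move_heads_last(children, lambda lab: lab.startswith("VB"))
    elif label == "PP":  # assumed rule: prepositions become postpositions
        children = move_heads_last(children, lambda lab: lab in ("IN", "TO"))
    leaves = []
    for child in children:
        leaves.extend(reorder(child))
    return leaves


def distortion_cost(positions):
    """Sum of deviations from a monotone +1 step between consecutive positions."""
    return sum(abs(b - a - 1) for a, b in zip(positions, positions[1:]))


def project_back(links, perm):
    """Map (source_idx, reordered_english_idx) links to original English indices."""
    return [(s, perm[e]) for s, e in links]


leaves = reorder(number_leaves(SENT))
perm = [pos for pos, _ in leaves]  # reordered slot -> original English index
reordered_words = [word for _, word in leaves]
new_slot = {orig: slot for slot, orig in enumerate(perm)}

# Hypothetical gold links for the romanized Hindi source "jon ne kitaab mez par rakhii",
# as (source_idx, original_english_idx); determiners and "ne" are left unaligned.
gold = [(0, 0), (2, 3), (3, 6), (4, 4), (5, 1)]
before = [e for _, e in sorted(gold)]            # English positions in source order
after = [new_slot[e] for _, e in sorted(gold)]   # same links on reordered English

print(" ".join(reordered_words))
print(distortion_cost(before), distortion_cost(after))
# Links found on the reordered text map back to the original English positions.
print(project_back([(s, new_slot[e]) for s, e in gold], perm) == gold)
```

Run as a script, this prints the reordered sentence `John the book the table on put`, then `11 2` (the proxy cost before and after reordering), then `True`. In this toy case, the long backward jumps caused by the English verb and preposition disappear after reordering, leaving only the small forward skips over unaligned determiners. Keeping the permutation `perm` is one way links found on the transformed bitext could be reported against the untouched English text.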