Yawat: Yet Another Word Alignment Tool

Ulrich Germann
University of Toronto
germann@

Abstract

Yawat¹ is a tool for the visualization and manipulation of word- and phrase-level alignments of parallel text. Unlike most other tools for manual word alignment, it relies on dynamic markup to visualize alignment relations; that is, markup is shown and hidden depending on the current mouse position. This reduces the visual complexity of the visualization and allows the annotator to focus on one item at a time. For a bird's-eye view of alignment patterns within a sentence, the tool is also able to display alignments as alignment matrices. In addition, it allows for manual labeling of alignment relations with customizable tag sets. Different text colors are used to indicate which words in a given sentence pair have already been aligned and which ones still need to be aligned. Tag sets and color schemes can easily be adapted to the needs of specific annotation projects through configuration files. The tool is implemented in JavaScript and designed to run as a web application.

1 Introduction

Sub-sentential alignments of parallel text play an important role in statistical machine translation (SMT). Aligning parallel data on the word or phrase level is typically one of the first steps in building SMT systems, as those alignments constitute the basis for the construction of probabilistic translation dictionaries.
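As a rough illustration of what a word- or phrase-level alignment looks like as data, the following Python sketch stores an alignment as a set of (source index, target index) links and renders it as an alignment matrix. It is not Yawat's implementation (the tool itself is written in JavaScript); the function names and the example sentence pair are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple


def alignment_matrix(source: List[str], target: List[str],
                     links: Set[Tuple[int, int]]) -> List[List[int]]:
    """Build a len(source) x len(target) 0/1 matrix.

    Cell [i][j] is 1 if source[i] is aligned to target[j], else 0.
    """
    matrix = [[0] * len(target) for _ in source]
    for i, j in links:
        matrix[i][j] = 1
    return matrix


def render(source: List[str], matrix: List[List[int]]) -> str:
    """Render one row per source word; 'x' marks an alignment link."""
    width = max(len(word) for word in source)
    rows = []
    for word, row in zip(source, matrix):
        cells = " ".join("x" if cell else "." for cell in row)
        rows.append(word.ljust(width) + " " + cells)
    return "\n".join(rows)


# Hypothetical sentence pair: "ich" aligns to "I" (word level), while
# "habe Hunger" aligns to "am hungry" as a whole (phrase level), which
# shows up as a fully filled 2 x 2 block in the matrix.
source = ["ich", "habe", "Hunger"]
target = ["I", "am", "hungry"]
links = {(0, 0), (1, 1), (1, 2), (2, 1), (2, 2)}

matrix = alignment_matrix(source, target, links)
print(render(source, matrix))
```

Columns follow the order of the target words. A phrase-level alignment appears as a rectangular block of links rather than a single cell, which is the kind of pattern a matrix view makes easy to spot.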
Consequently, considerable effort has gone into devising and improving automatic word alignment algorithms, and into evaluating their performance (e.g., Och and Ney, 2003; Taskar et al., 2005; Moore et al., 2006; Fraser and Marcu, 2006, among many others). For the sake of simplicity, we will in the following use the term word alignment to refer to any form of alignment that identifies words or groups of words as translations of each other.

¹ Yawat was first presented at the 2007 Linguistic Annotation Workshop (Germann, 2007).

Any explicit evaluation of word alignment quality .