Scientific report: "Hierarchical Search for Word Alignment"
Jason Riesa and Daniel Marcu
Information Sciences Institute, Viterbi School of Engineering, University of Southern California
riesa marcu @

Abstract

We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by points in F-measure, yielding a BLEU score increase over a state-of-the-art syntax-based machine translation system.

1 Introduction

Automatic word alignment is generally accepted as a first step in training any statistical machine translation system. It is a vital prerequisite for generating translation tables, phrase tables, or syntactic transformation rules. Generative alignment models like IBM Model-4 (Brown et al., 1993) have been in wide use for over 15 years, and while not perfect (see Figure 1), they are completely unsupervised, requiring no annotated training data to learn alignments that have powered many current state-of-the-art translation systems.
Today there exist human-annotated alignments and an abundance of other information for many language pairs, potentially useful for inducing accurate alignments. How can we take advantage of all of this data at our fingertips? Using feature functions that encode extra information is one good way. Unfortunately, as Moore (2005) points out, it is usually difficult to extend a given generative model with feature functions without changing the entire generative story. This difficulty ...

Figure 1: Model-4 alignment vs. a gold standard. Circles represent links in a human-annotated alignment and ...
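
To make the idea of scoring alignments with feature functions concrete, here is a minimal Python sketch of a linear model that scores candidate alignments and returns the k best. It is an illustration under stated assumptions, not the paper's method: the three features, the weights, and the names `extract_features`, `score`, and `k_best` are all hypothetical, and the candidates are ranked from an explicit list rather than extracted from an alignment forest.

```python
import heapq


def extract_features(alignment, src, tgt):
    """Hypothetical feature functions over one candidate alignment.

    `alignment` is a set of (source_index, target_index) links.
    These features are illustrative only; the paper's feature set is not shown here.
    """
    # How far each link sits from the diagonal, using relative positions.
    diagonal_distance = sum(abs(i / len(src) - j / len(tgt)) for i, j in alignment)
    # Number of target words that no link touches.
    aligned_targets = {j for _, j in alignment}
    return {
        "num_links": len(alignment),
        "diagonal_distance": diagonal_distance,
        "unaligned_target": len(tgt) - len(aligned_targets),
    }


def score(alignment, src, tgt, weights):
    """Linear model: the dot product of the weight vector and the feature vector."""
    features = extract_features(alignment, src, tgt)
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def k_best(candidates, src, tgt, weights, k):
    """Return the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, candidates, key=lambda a: score(a, src, tgt, weights))


# Toy data (assumed for illustration): two-word sentences and hand-set weights.
src = ["a", "b"]
tgt = ["x", "y"]
weights = {"diagonal_distance": -1.0, "unaligned_target": -0.5}

monotone = frozenset({(0, 0), (1, 1)})
crossed = frozenset({(0, 1), (1, 0)})
partial = frozenset({(0, 0)})

ranked = k_best([crossed, partial, monotone], src, tgt, weights, k=2)
```

Because the score is a weighted sum, adding a new source of information only means adding a new entry to the feature dictionary and a corresponding weight; nothing else in the scorer changes.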