Scientific report: "Hierarchical Search for Word Alignment"

Jason Riesa and Daniel Marcu
Information Sciences Institute, Viterbi School of Engineering, University of Southern California
{riesa,marcu}@isi.edu

Abstract

We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible linear discriminative model that incorporates hundreds of features and is trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.

1 Introduction

Automatic word alignment is generally accepted as a first step in training any statistical machine translation system. It is a vital prerequisite for generating translation tables, phrase tables, or syntactic transformation rules. Generative alignment models like IBM Model-4 (Brown et al., 1993) have been in wide use for over 15 years, and while not perfect (see Figure 1), they are completely unsupervised, requiring no annotated training data to learn alignments that have powered many current state-of-the-art translation systems.
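The F-measure gain reported above compares predicted alignment links against a human-annotated gold standard. The sketch below shows that comparison under the usual definition: precision and recall over sets of (source index, target index) links, with F1 as their harmonic mean. It assumes a single gold link set with no sure/possible distinction, which may differ from the paper's exact evaluation setup; the function name `alignment_prf` and the toy links are hypothetical and are not data from the paper.

```python
from typing import Set, Tuple

# Assumption: an alignment is a set of links (i, j), meaning source word i
# is linked to target word j. No sure/possible link distinction is modeled.
Link = Tuple[int, int]


def alignment_prf(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted links against gold links."""
    correct = len(predicted & gold)  # links found in both sets
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy alignments for a four-word sentence pair.
    gold = {(0, 0), (1, 2), (2, 1), (3, 3)}
    predicted = {(0, 0), (1, 2), (2, 2)}
    p, r, f = alignment_prf(predicted, gold)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # prints P=0.667 R=0.500 F=0.571
```

A difference of "6.3 points in F-measure" corresponds to a difference of 0.063 in the F1 value computed this way, expressed on a 0 to 100 scale.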
Today, human-annotated alignments and an abundance of other information potentially useful for inducing accurate alignments exist for many language pairs. How can we take advantage of all of this data at our fingertips? Using feature functions that encode extra information is one good way. Unfortunately, as Moore (2005) points out, it is usually difficult to extend a given generative model with feature functions without changing the entire generative story. This difficulty …

Figure 1: Model-4 alignment vs. a gold standard. Circles represent links in a human-annotated alignment and …
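The contrast with generative models can be made concrete: in a linear model, each feature function h_i(a) maps an alignment to a number, and the score is the weighted sum of the features, so adding information means adding one more function rather than rewriting a generative story. The sketch below is a minimal illustration under loudly stated assumptions: the three features (`lexical_matches`, `diagonal_distance`, `link_count`), the toy `LEXICON`, and the hand-set weights are all hypothetical, and candidates are ranked by brute-force enumeration, not by the paper's hierarchical forest search, so it only works for tiny sentences.

```python
import heapq
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Link = Tuple[int, int]
Alignment = FrozenSet[Link]
Feature = Callable[[List[str], List[str], Alignment], float]

# Hypothetical toy bilingual dictionary used by one feature.
LEXICON = {("la", "the"), ("maison", "house"), ("bleue", "blue")}


def lexical_matches(src: List[str], tgt: List[str], a: Alignment) -> float:
    """Hypothetical feature: number of links whose word pair is in LEXICON."""
    return float(sum((src[i], tgt[j]) in LEXICON for i, j in a))


def diagonal_distance(src: List[str], tgt: List[str], a: Alignment) -> float:
    """Hypothetical feature: summed distance of links from the diagonal."""
    return sum(abs(i / len(src) - j / len(tgt)) for i, j in a)


def link_count(src: List[str], tgt: List[str], a: Alignment) -> float:
    """Hypothetical feature: total number of links."""
    return float(len(a))


# Adding a new source of information means adding one entry here.
FEATURES: Dict[str, Feature] = {
    "lexical": lexical_matches,
    "diagonal": diagonal_distance,
    "links": link_count,
}


def score(weights: Dict[str, float], src: List[str], tgt: List[str], a: Alignment) -> float:
    """Linear model score: sum over features of w_i * h_i(a)."""
    return sum(weights[name] * h(src, tgt, a) for name, h in FEATURES.items())


def all_link_subsets(n_src: int, n_tgt: int, max_links: int) -> Iterable[Alignment]:
    """Enumerate every set of at most max_links links (brute force, tiny inputs only)."""
    grid = [(i, j) for i in range(n_src) for j in range(n_tgt)]
    for size in range(max_links + 1):
        for links in combinations(grid, size):
            yield frozenset(links)


def k_best(weights: Dict[str, float], src: List[str], tgt: List[str],
           candidates: Iterable[Alignment], k: int) -> List[Alignment]:
    """Return the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, candidates, key=lambda a: score(weights, src, tgt, a))


if __name__ == "__main__":
    src = ["la", "maison", "bleue"]
    tgt = ["the", "blue", "house"]
    weights = {"lexical": 2.0, "diagonal": -1.0, "links": -0.5}  # hand-set, hypothetical
    for a in k_best(weights, src, tgt, all_link_subsets(len(src), len(tgt), 3), k=3):
        print(sorted(a), round(score(weights, src, tgt, a), 3))
```

In this toy setup the top-ranked alignment links each word to its dictionary translation even though "blue" and "house" are swapped relative to the source order; the negative weight on `diagonal_distance` penalizes that crossing but does not outweigh the lexical evidence. In a discriminative setting, such weights would be learned from annotated alignments rather than set by hand.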

RELATED DOCUMENTS

Scientific report: "A Topic Similarity Model for Hierarchical Phrase-based Translation"

Scientific report: "Modeling Topic Dependencies in Hierarchical Text Categorization"

Scientific report: "Hierarchical Chunk-to-String Translation"

Scientific report: "Head-Driven Hierarchical Phrase-based Translation"

Scientific report: "Pattern Learning for Relation Extraction with a Hierarchical Topic Model"

Scientific report: "Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation"

Scientific report: "SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations"

Scientific report: "A Discriminative Hierarchical Model for Fast Coreference at Large Scale"

Scientific report: "Hierarchical Search for Word Alignment"

Scientific report: "Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data"