Scientific paper: "Stochastic Lexicalized Inversion Transduction Grammar for Alignment"

Stochastic Lexicalized Inversion Transduction Grammar for Alignment
Hao Zhang and Daniel Gildea
Computer Science Department, University of Rochester, Rochester, NY 14627

Abstract

We present a version of Inversion Transduction Grammar where rule probabilities are lexicalized throughout the synchronous parse tree, along with pruning techniques for efficient training. Alignment results improve over unlexicalized ITG on short sentences for which full EM is feasible, but pruning seems to have a negative impact on longer sentences.

1 Introduction

The Inversion Transduction Grammar (ITG) of Wu (1997) is a syntactically motivated algorithm for producing word-level alignments of pairs of translationally equivalent sentences in two languages. The algorithm builds a synchronous parse tree for both sentences and assumes that the trees have the same underlying structure, but that the ordering of constituents may differ in the two languages.

This probabilistic syntax-based approach has inspired much subsequent research. Alshawi et al. (2000) use hierarchical finite-state transducers. In the tree-to-string model of Yamada and Knight (2001), a parse tree for one sentence of a translation pair is projected onto the other string. Melamed (2003) presents algorithms for synchronous parsing with more complex grammars, discussing how to parse grammars with greater than binary branching and lexicalization of synchronous grammars.

Despite being one of the earliest probabilistic syntax-based translation models, ITG remains state of the art. Zens and Ney (2003) found that the constraints of ITG were a better match to the decoding task than the heuristics used in the IBM decoder of Berger et al. (1996). Zhang and Gildea (2004) found ITG to outperform the tree-to-string model for word-level alignment, as measured against human gold-standard alignments. One explanation for this result is that, while a tree representation is helpful for modeling translation, the trees assigned by the traditional monolingual parsers and
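The reordering behavior described in the introduction can be made concrete with a small sketch. The following Python fragment is an illustration of the general ITG idea only, not code or data from the paper: the tree encoding and the English/Spanish word pairs are invented for the example. It shows how a single synchronous tree yields the source-language order directly, while nodes marked as inverted swap their two children on the target side.

# Illustrative sketch (not from the paper): how an ITG-style synchronous tree
# yields two word orders from one underlying structure. Straight nodes keep
# the order of their children in both languages; inverted nodes swap the
# children on the target side only. A terminal node pairs a source word with
# a target word (either may be empty, modeling insertions and deletions).

from typing import List, Tuple, Union

# A node is either a terminal (source_word, target_word) pair
# or an internal node: (orientation, left_child, right_child).
Terminal = Tuple[str, str]
Node = Union[Terminal, Tuple[str, "Node", "Node"]]

def yields(node: Node) -> Tuple[List[str], List[str]]:
    """Return the (source, target) word sequences generated by a synchronous tree."""
    if len(node) == 2:                      # terminal: an aligned word pair
        src, tgt = node
        return ([src] if src else []), ([tgt] if tgt else [])
    orientation, left, right = node
    src_l, tgt_l = yields(left)
    src_r, tgt_r = yields(right)
    if orientation == "straight":           # same constituent order in both languages
        return src_l + src_r, tgt_l + tgt_r
    else:                                   # inverted: children reordered on the target side
        return src_l + src_r, tgt_r + tgt_l

# Toy example: the inverted root node reorders its two constituents in the target language.
tree: Node = ("inverted", ("saw", "vio"), ("straight", ("the", "la"), ("house", "casa")))
print(yields(tree))   # (['saw', 'the', 'house'], ['la', 'casa', 'vio'])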
