Scientific report: "Learning Non-Isomorphic Tree Mappings for Machine Translation"

Jason Eisner, Computer Science Dept., Johns Hopkins Univ. jason@

Abstract

Often one may wish to learn a tree-to-tree mapping, training it on unaligned pairs of trees, or on a mixture of trees and strings. Unlike previous statistical formalisms (limited to isomorphic trees), synchronous TSG allows local distortion of the tree topology. We reformulate it to permit dependency trees, and sketch EM/Viterbi algorithms for alignment, training, and decoding.

1 Introduction: Tree-to-Tree Mappings

Statistical machine translation systems are trained on pairs of sentences that are mutual translations. For example: (beaucoup d'enfants donnent un baiser à Sam, kids kiss Sam quite often). This translation is somewhat free, as is common in naturally occurring data. The first sentence is literally "Lots of children give a kiss to Sam."

This short paper outlines natural formalisms and algorithms for training on pairs of trees. Our methods work on either dependency trees (as shown) or phrase-structure trees. Note that the depicted trees are not isomorphic.

[Figure: dependency trees for the example sentence pair. French nodes: donnent, baiser, à, beaucoup, un, Sam, d', enfants.]

Our main concern is to develop models that can align and learn from these tree pairs despite the mismatches in tree structure. Many mismatches are characteristic of a language pair, e.g., preposition insertion (of → ε), multiword locutions (kiss ↔ give a kiss to; misinform ↔ wrongly inform), and head-swapping (float down ↔ descend by floating). Such systematic mismatches should be learned by the model and used during translation.
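The claim that the two example trees are not isomorphic can be checked mechanically. The following is a minimal sketch, not the paper's alignment method: it only compares the unlabeled, unordered shapes of two dependency trees using a canonical parenthesis encoding. The head-dependent attachments listed for both sentences are assumptions made for illustration (the figure itself is not recoverable here), and the helper names `children_map`, `shape`, and `same_shape` are hypothetical.

```python
from collections import defaultdict

def children_map(edges):
    """Build a head -> list-of-dependents map from (head, dependent) pairs."""
    kids = defaultdict(list)
    for head, dep in edges:
        kids[head].append(dep)
    return kids

def shape(node, kids):
    """Canonical parenthesis string for the unordered, unlabeled subtree at node."""
    # Sorting the child encodings makes the result independent of child order.
    return "(" + "".join(sorted(shape(c, kids) for c in kids.get(node, ()))) + ")"

def same_shape(root_a, edges_a, root_b, edges_b):
    """True if the two rooted trees are isomorphic when labels and order are ignored."""
    return shape(root_a, children_map(edges_a)) == shape(root_b, children_map(edges_b))

# ASSUMED attachments for the example pair (illustrative, not taken from the text).
FRENCH = [
    ("donnent", "baiser"), ("donnent", "à"), ("donnent", "beaucoup"),
    ("baiser", "un"), ("à", "Sam"), ("beaucoup", "d'"), ("d'", "enfants"),
]
ENGLISH = [
    ("kiss", "kids"), ("kiss", "Sam"), ("kiss", "often"), ("often", "quite"),
]

print(same_shape("donnent", FRENCH, "kiss", ENGLISH))  # prints False
```

Under these assumed attachments the French tree has eight nodes and the English tree five, so no one-to-one node mapping can exist. A formalism restricted to isomorphic trees could not align this pair at all, which is the gap the synchronous TSG approach is meant to fill.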
It is even helpful to learn mismatches that merely tend to arise during free translation. Knowing that beaucoup d' is often deleted will help in aligning the rest of the tree.

When would learned tree-to-tree mappings be useful? Obviously, in MT, when one has parsers for both the source and target language. Systems for deep analysis and generation might wish to learn mappings between deep and surface trees (Böhmová et al., 2001) or .
