Variational Decoding for Statistical Machine Translation
Zhifei Li, Jason Eisner, and Sanjeev Khudanpur
Department of Computer Science and Center for Language and Speech Processing
Johns Hopkins University, Baltimore, MD 21218, USA

Abstract

Statistical models in machine translation exhibit spurious ambiguity. That is, the probability of an output string is split among many distinct derivations (e.g., trees or segmentations). In principle, the goodness of a string is measured by the total probability of its many derivations. However, finding the best string (e.g., during decoding) is then computationally intractable. Therefore, most systems use a simple Viterbi approximation that measures the goodness of a string using only its most probable derivation. Instead, we develop a variational approximation, which considers all the derivations but still allows tractable decoding. Our particular variational distributions are parameterized as n-gram models. We also analytically show that interpolating these n-gram models for different n is similar to minimum-risk decoding for BLEU (Tromble et al., 2008). Experiments show that our approach improves the state of the art.

1 Introduction

Ambiguity is a central issue in natural language processing.
Many systems try to resolve ambiguities in the input, for example by tagging words with their senses or choosing a particular syntax tree for a sentence. These systems are designed to recover the values of interesting latent variables, such as word senses, syntax trees, or translations, given the observed input. However, some systems resolve too many ambiguities. They recover additional latent variables, so-called nuisance variables, that are not of interest to the user.¹ For example, though machine translation (MT) seeks to output a string, typical MT systems (Koehn et al., 2003; Chiang, 2007) …

¹ These nuisance variables may be annotated in training data, but it is more common for them to be latent even there.
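To make spurious ambiguity concrete, here is a minimal Python sketch over a hypothetical, fully enumerated list of derivations with invented probabilities. It contrasts the Viterbi rule (score a string by its best single derivation), the exact string probability (sum over all derivations with that yield), and a simple variational-style bigram model q fit from expected bigram counts under p. The names `DERIVATIONS`, `viterbi_string`, `map_string`, `fit_bigram`, and `variational_string` are illustrative assumptions, not the paper's code; real systems work over a packed forest (hypergraph) rather than an enumerated list, and here q is only used to rerank the distinct candidate strings.

```python
import math
from collections import defaultdict

# Hypothetical toy "forest": each entry is (output string, derivation id, p(d | x)).
# Probabilities are invented for illustration and sum to 1.
DERIVATIONS = [
    ("the cat sat", "d1", 0.4),
    ("a cat sat", "d2", 0.2),
    ("a cat sat", "d3", 0.2),
    ("a cat sat", "d4", 0.2),
]


def viterbi_string(derivs):
    """Viterbi approximation: yield of the single most probable derivation."""
    string, _, _ = max(derivs, key=lambda t: t[2])
    return string


def map_string(derivs):
    """Exact MAP string: sum p(d | x) over all derivations sharing a yield."""
    totals = defaultdict(float)
    for string, _, p in derivs:
        totals[string] += p
    return max(totals, key=totals.get), dict(totals)


def pad(string):
    """Add sentence-boundary markers so the bigram model scores the whole string."""
    return ["<s>"] + string.split() + ["</s>"]


def fit_bigram(derivs):
    """Bigram q(w | h) estimated from expected counts under p (a simplification)."""
    pair_counts = defaultdict(float)
    hist_counts = defaultdict(float)
    for string, _, p in derivs:
        words = pad(string)
        for h, w in zip(words, words[1:]):
            pair_counts[(h, w)] += p  # expected count of bigram (h, w)
            hist_counts[h] += p       # expected count of history h
    return {(h, w): c / hist_counts[h] for (h, w), c in pair_counts.items()}


def bigram_score(q, string):
    """Probability of a string under the bigram model q (0 for unseen bigrams)."""
    words = pad(string)
    return math.prod(q.get((h, w), 0.0) for h, w in zip(words, words[1:]))


def variational_string(derivs):
    """Rerank the distinct candidate strings by the bigram approximation q."""
    q = fit_bigram(derivs)
    candidates = sorted({s for s, _, _ in derivs})
    return max(candidates, key=lambda s: bigram_score(q, s))


if __name__ == "__main__":
    print("Viterbi:    ", viterbi_string(DERIVATIONS))
    print("MAP string: ", map_string(DERIVATIONS)[0])
    print("Variational:", variational_string(DERIVATIONS))
```

On this toy input the Viterbi rule picks "the cat sat", because its single derivation (0.4) beats each individual derivation of "a cat sat" (0.2), even though "a cat sat" has total probability 0.6. The bigram model pools probability across derivations and agrees with the exact sum here; in general an n-gram q is only an approximation of the string distribution.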