Scientific report: "A Discriminative Latent Variable Model for Statistical Machine Translation"

Phil Blunsom, Trevor Cohn and Miles Osborne
School of Informatics, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, UK
{pblunsom, tcohn, miles}@

Abstract

Large-scale discriminative machine translation promises to further the state of the art, but has failed to deliver convincing gains over current heuristic frequency count systems. We argue that a principal reason for this failure is not dealing with multiple, equivalent translations. We present a translation model which models derivations as a latent variable, in both training and decoding, and is fully discriminative and globally optimised. Results show that accounting for multiple derivations does indeed improve performance. Additionally, we show that regularisation is essential for maximum conditional likelihood models in order to avoid degenerate solutions.

1 Introduction

Statistical machine translation (SMT) has seen a resurgence in popularity in recent years, with progress being driven by a move to phrase-based and syntax-inspired approaches. Progress within these approaches, however, has been less dramatic.
We believe this is because these frequency count based¹ models cannot easily incorporate non-independent and overlapping features, which are extremely useful in describing the translation process. Discriminative models of translation can include such features without making assumptions of independence or explicitly modelling their interdependence. However, while discriminative models promise much, they have not been shown to deliver significant gains over their simpler cousins. We argue that this is due to a number of inherent problems that discriminative models for SMT must address in .

¹ We class approaches using minimum error rate training (Och, 2003) as frequency count based, as these systems re-scale a handful of generative features estimated from frequency counts and do not support large sets of non-independent features.
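To make the abstract's central idea concrete, the following is a minimal sketch of a log-linear model in which the derivation is a latent variable: the probability of a translation is obtained by summing over every derivation that yields it. The text above does not give the model's equations, so everything here is an illustrative assumption: the function names, the toy rule features (`rule_a`, `rule_b`, `rule_c`), the toy sentences, and the choice of a Gaussian (L2) penalty as the form of regularisation.

```python
import math
from collections import defaultdict

def score(weights, features):
    """Dot product of a sparse feature dict with a sparse weight dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def translation_probs(weights, derivations):
    """Return p(translation | source) for one source sentence.

    `derivations` is a list of (translation, feature_dict) pairs. Several
    derivations may yield the same translation string; their unnormalised
    scores are summed, which treats the derivation as a latent variable.
    """
    scores = [score(weights, feats) for _, feats in derivations]
    shift = max(scores)  # subtract the maximum for numerical stability
    unnormalised = defaultdict(float)
    for (translation, _), s in zip(derivations, scores):
        unnormalised[translation] += math.exp(s - shift)
    z = sum(unnormalised.values())
    return {t: v / z for t, v in unnormalised.items()}

def regularised_nll(weights, derivations, reference, sigma2=1.0):
    """Negative conditional log-likelihood of the reference translation
    plus an assumed Gaussian-prior (L2) penalty on the weights."""
    probs = translation_probs(weights, derivations)
    penalty = sum(w * w for w in weights.values()) / (2.0 * sigma2)
    return -math.log(probs[reference]) + penalty

# Hypothetical toy data: two derivations produce "the house", one produces "house the".
toy_derivations = [
    ("the house", {"rule_a": 1.0}),
    ("the house", {"rule_b": 1.0}),
    ("house the", {"rule_c": 1.0}),
]
toy_weights = {"rule_c": 0.5}

probs = translation_probs(toy_weights, toy_derivations)
best_by_marginal = max(probs, key=probs.get)
best_single_derivation = max(toy_derivations, key=lambda d: score(toy_weights, d[1]))[0]
```

In this toy setting the highest-scoring single derivation yields "house the", while summing over derivations favours "the house", because its probability mass is split across two derivations. This is the kind of effect the abstract refers to when it says that accounting for multiple derivations improves performance.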