Continuous Space Language Models for Statistical Machine Translation

Holger Schwenk, Daniel Déchelotte and Jean-Luc Gauvain
LIMSI-CNRS, BP 133, 91403 Orsay cedex, FRANCE
{schwenk,dechelot,gauvain}@

Abstract

Statistical machine translation systems are based on one or more translation models and a language model of the target language. While many different translation models and phrase extraction algorithms have been proposed, a standard word n-gram back-off language model is used in most systems. In this work, we propose to use a new statistical language model that is based on a continuous representation of the words in the vocabulary. A neural network is used to perform the projection and the probability estimation. We consider the translation of European Parliament Speeches. This task is part of an international evaluation organized by the TC-STAR project in 2006. The proposed method achieves consistent improvements in the BLEU score on the development and test data. We also present algorithms to improve the estimation of the language model probabilities when splitting long sentences into shorter chunks.

1 Introduction

The goal of statistical machine translation (SMT) is to produce a target sentence e from a source sentence f. Among all possible target sentences, the one with maximal probability is chosen. The classical Bayes relation is used to introduce a target language model (Brown et al., 1993):

\[ e^{*} = \arg\max_{e} \Pr(e \mid f) = \arg\max_{e} \Pr(f \mid e)\,\Pr(e) \]

where Pr(f|e) is the translation model and Pr(e) is the target language model. This approach is usually referred to as the noisy source-channel approach in statistical machine translation.

Since the introduction of this basic model, many improvements have been made, but it seems that research is mainly focused on better translation and alignment models or phrase extraction algorithms, as demonstrated by numerous publications on these topics. On the other hand, we are aware of only a small number of papers investigating new approaches to .
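The noisy source-channel decision rule from the Introduction can be illustrated with a minimal Python sketch. It assumes a small, explicit list of candidate target sentences and works in log space, so the product Pr(f|e) Pr(e) becomes a sum. The function name `noisy_channel_best` and the toy score tables `TM_LOGPROB` and `LM_LOGPROB` are hypothetical and chosen only for illustration; they are not taken from the paper, and a real decoder searches a far larger hypothesis space.

```python
import math


def noisy_channel_best(candidates, tm_logprob, lm_logprob):
    """Return the candidate e that maximizes log Pr(f|e) + log Pr(e).

    candidates: iterable of candidate target sentences e
    tm_logprob: callable giving log Pr(f|e) for a fixed source sentence f
    lm_logprob: callable giving log Pr(e) under the target language model
    """
    return max(candidates, key=lambda e: tm_logprob(e) + lm_logprob(e))


# Hypothetical toy scores for one fixed source sentence f (illustration only).
TM_LOGPROB = {
    "the house": math.log(0.4),
    "house the": math.log(0.5),
    "a house": math.log(0.1),
}
LM_LOGPROB = {
    "the house": math.log(0.3),
    "house the": math.log(0.01),
    "a house": math.log(0.2),
}

best = noisy_channel_best(
    TM_LOGPROB.keys(), TM_LOGPROB.__getitem__, LM_LOGPROB.__getitem__
)
print(best)
```

In this toy setting the translation model alone would prefer the badly ordered "house the", but the language model term penalizes it strongly, so the combined score selects "the house". This is the role the target language model plays in the decision rule.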