Scientific report: "Decoding Algorithm in Statistical Machine Translation"

Ye-Yi Wang and Alex Waibel
Language Technology Institute, School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213, USA
yyw waibel @

Abstract

The decoding algorithm is a crucial part of statistical machine translation. We describe a stack decoding algorithm in this paper. We present the hypothesis scoring method and the heuristics used in our algorithm. We report several techniques deployed to improve the performance of the decoder. We also introduce a simplified model to moderate the sparse data problem and to speed up the decoding process. We evaluate and compare these techniques/models in our statistical machine translation system.

1 Introduction

Statistical Machine Translation

Statistical machine translation is based on a channel model. Given a sentence $T$ in one language (German) to be translated into another language (English), it considers $T$ as the target of a communication channel and its translation $S$ as the source of the channel. Hence the machine translation task becomes one of recovering the source from the target. In principle, every English sentence is a possible source for a German target sentence. If we assign a probability $P(S \mid T)$ to each pair of sentences $(S, T)$, then the problem of translation is to find the source $S$ for a given target $T$ such that $P(S \mid T)$ is the maximum.
According to Bayes' rule,

$$P(S \mid T) = \frac{P(S)\,P(T \mid S)}{P(T)} \tag{1}$$

Since the denominator is independent of $S$, we have

$$\hat{S} = \arg\max_{S} P(S)\,P(T \mid S) \tag{2}$$

Therefore a statistical machine translation system must deal with the following three problems:

Modeling Problem: How to depict the process of generating a sentence in a source language, and the process used by a channel to generate a target sentence upon receiving a source sentence? The former is the problem of language modeling, and the latter is the problem of translation modeling. They provide a framework for calculating $P(S)$ and $P(T \mid S)$ in (2).

Learning Problem: Given a statistical language model P
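
As an illustration of the argmax over $P(S)\,P(T \mid S)$ described above, the following minimal sketch picks the best English candidate for one fixed German sentence. It is not the paper's decoder: the candidate sentences, the probability values, and the names `language_model`, `translation_model`, and `decode` are all invented for this example, and the search simply enumerates a tiny candidate list.

```python
import math

# Toy, made-up probabilities for illustration only (not from the paper).
# Language model P(S): how plausible each English candidate is on its own.
language_model = {
    "the house is small": 0.020,
    "the home is small": 0.008,
    "small is the house": 0.001,
}

# Translation model P(T | S) for one fixed German target sentence T.
translation_model = {
    "the house is small": 0.30,
    "the home is small": 0.35,
    "small is the house": 0.30,
}


def decode(candidates, lm, tm):
    """Return the candidate S that maximizes log P(S) + log P(T | S)."""
    def score(s):
        # Summing logs is equivalent to multiplying the probabilities
        # and avoids underflow when the values are very small.
        return math.log(lm[s]) + math.log(tm[s])
    return max(candidates, key=score)


best = decode(list(language_model), language_model, translation_model)
print(best)
```

In this toy setting, "the home is small" has the highest translation-model probability, but the language model outweighs it, so the product favors "the house is small". A real system cannot enumerate every English sentence in this way, which is why a search procedure such as the stack decoding algorithm mentioned in the abstract is needed.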
