Scientific report: "Modeling with Structures in Statistical Machine Translation"

Ye-Yi Wang and Alex Waibel
School of Computer Science, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213, USA

Abstract

Most statistical machine translation systems employ a word-based alignment model. In this paper we demonstrate that word-based alignment is a major cause of translation errors. We propose a new alignment model based on shallow phrase structures, which can be acquired automatically from a parallel corpus. This new model achieved over 10% error reduction on our spoken language translation task.

1 Introduction

Most, if not all, statistical machine translation systems employ a word-based alignment model (Brown et al., 1993; Vogel, Ney, and Tillman, 1996; Wang and Waibel, 1997), which treats the words in a sentence as independent entities and ignores the structural relationships among them. While this independence assumption works well in speech recognition, it poses a major problem in our experiments with spoken language translation between a language pair with very different word orders. In this paper we propose a translation model that employs shallow phrase structures. It has the following advantages over word-based alignment:

- Since the translation model can directly depict phrase reordering in translation, it is more accurate for translation between languages with different word/phrase orders.
- The decoder of the translation system can use the phrase information and extend a hypothesis by phrases (multiple words); it can therefore speed up decoding.
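As a toy illustration of the phrase-reordering advantage, and not the model proposed in the paper, the Python sketch below moves whole phrases instead of individual words. The example sentence, its segmentation, the permutation `order`, and the function name `reorder_by_phrase` are all hypothetical choices made for this example. It only shows that reordering at the phrase level needs one placement decision per phrase, whereas a word-level view needs one per word.

```python
def reorder_by_phrase(phrases, order):
    """Return the words of `phrases` after permuting whole phrases.

    `phrases` is a list of word lists; `order` lists phrase indices in
    their new order. Words inside a phrase keep their relative order.
    """
    return [word for idx in order for word in phrases[idx]]


# Hypothetical segmentation of a source sentence into shallow phrases.
phrases = [["I"], ["have"], ["a", "meeting"], ["at", "two", "o'clock"]]

# Hypothetical phrase-level permutation (a verb-final target order).
order = [0, 3, 2, 1]

reordered = reorder_by_phrase(phrases, order)

# One placement decision per word versus one per phrase.
word_decisions = sum(len(p) for p in phrases)
phrase_decisions = len(phrases)

print(" ".join(reordered))
print(word_decisions, phrase_decisions)
```

In this sketch the seven words are repositioned with only four phrase-level decisions, which is the intuition behind both the reordering and the decoding-speed arguments.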
The paper is organized as follows. In section 2, the problems of word-based alignment models are discussed. To alleviate these problems, a new alignment model based on shallow phrase structures is introduced in section 3. In section 4, a grammar inference algorithm is presented that can automatically acquire the phrase structures used in the new model. Translation performance is then evaluated in section 5, followed by the conclusions.
