Scientific report: "Bridging Morpho-Syntactic Gap between Source and Target Sentences for English-Korean Statistical Machine Translation"

Gumwon Hong, Seung-Wook Lee and Hae-Chang Rim
Department of Computer Science & Engineering, Korea University, Seoul 136-713, Korea
gwhong swlee rim @

Abstract

Statistical Machine Translation (SMT) between English and Korean often suffers from null alignment. Previous studies have attempted to resolve this problem by removing unnecessary function words or by reordering source sentences. However, the removal of function words can cause a serious loss of information. In this paper, we present a possible method of bridging the morpho-syntactic gap for English-Korean SMT. In particular, the proposed method tries to transform a source sentence by inserting pseudo words and by reordering the sentence in such a way that both sentences have a similar length and word order. The proposed method achieves an increase in BLEU score over a baseline phrase-based system.

1 Introduction

Phrase-based SMT models have performed reasonably well on language pairs whose syntactic structures are very similar, such as French and English. However, Collins et al. (2005) demonstrated that phrase-based models have limited potential when applied to languages that have a relatively different word order, as is the case between German and English. They proposed a clause restructuring method for reordering German sentences so that they resemble the order of English sentences. By modifying the source sentence structure into the target sentence structure, they argued that they could solve the decoding problem by using completely monotonic translation.

The translation from English to Korean can be more difficult than the translation of other language pairs for the following reasons. First, Korean is a language isolate; that is, it has little genealogical relation to other natural languages. Second, the word order in Korean is relatively free because the functional .
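
The transformation outlined in the abstract (inserting pseudo words into the source sentence and reordering it toward the target word order) can be illustrated with a toy sketch. This is not the authors' implementation: the `Chunk` type, the role labels, the pseudo-word tokens `<SUBJ>` and `<OBJ>`, and the fixed subject-object-verb ordering are all hypothetical choices made here for illustration, and the sketch assumes the grammatical roles have already been supplied by some parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """A contiguous group of source words with an assumed grammatical role."""
    words: List[str]
    role: str  # one of "subj", "verb", "obj", "other" (hypothetical labels)


# Hypothetical pseudo words standing in for target-side function words
# that have no counterpart in the English source.
PSEUDO_WORDS = {"subj": "<SUBJ>", "obj": "<OBJ>"}

# Assumed verb-final target order; the position of "other" is arbitrary.
TARGET_ORDER = {"subj": 0, "other": 1, "obj": 2, "verb": 3}


def transform(chunks: List[Chunk]) -> List[str]:
    """Reorder chunks toward the assumed target order and insert pseudo words."""
    # sorted() is stable, so chunks sharing a role keep their source order.
    ordered = sorted(chunks, key=lambda c: TARGET_ORDER[c.role])
    output: List[str] = []
    for chunk in ordered:
        output.extend(chunk.words)
        # Append a pseudo word after chunks whose role has one defined.
        if chunk.role in PSEUDO_WORDS:
            output.append(PSEUDO_WORDS[chunk.role])
    return output


sentence = [
    Chunk(["John"], "subj"),
    Chunk(["reads"], "verb"),
    Chunk(["the", "book"], "obj"),
]
print(" ".join(transform(sentence)))  # John <SUBJ> the book <OBJ> reads
```

In this toy case the transformed source has one token per expected target-side unit and a verb-final order, which is the kind of length and order similarity the abstract describes as the goal of the transformation.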
