Scientific report: "Jointly optimizing a two-step conditional random field model for machine transliteration and its fast decoding algorithm"

Dong Yang, Paul Dixon and Sadaoki Furui
Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan
raymond dixonp furui @

Abstract

This paper presents a joint optimization method for a two-step conditional random field (CRF) model for machine transliteration, together with a fast decoding algorithm for the proposed method. Our method belongs to the category of direct orthographical mapping (DOM) between two languages, which uses no intermediate phonemic mapping. In the two-step CRF model, the first CRF segments an input word into chunks and the second converts each chunk into one unit in the target language. We propose a method to jointly optimize the two-step CRFs and a fast algorithm to realize it. Our experiments show that the proposed method outperforms the well-known joint source channel model (JSCM) and that the proposed fast algorithm decreases the decoding time significantly. Furthermore, combining the proposed method with the JSCM gives a further improvement, which outperforms state-of-the-art results in terms of top-1 accuracy.

1 Introduction

There are more than 6000 languages in the world, and 10 of them have more than 100 million native speakers.
With the information revolution and globalization, systems that support multiple language processing and spoken language translation are urgently needed. The translation of named entities from alphabetic to syllabary languages is usually performed through transliteration, which tries to preserve the pronunciation in the original language. For example, in Chinese, foreign words are written with Chinese characters; in Japanese, foreign words are usually written with special char-

Figure 1: Transliteration examples (columns: Source Name, Target Name, Note; the notes cover English-to-Chinese with Chinese Romanized writing and English-to-Japanese with Japanese Romanized writing)
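
The two-step scheme described in the abstract (segment the source word into chunks, then convert each chunk into one target unit) can be illustrated with a small sketch. This is not the authors' implementation. The score tables below are hypothetical toy values standing in for two trained CRFs, each chunk is converted independently rather than by a sequence model, and combining the two steps by summing log-probabilities is an assumption, since the text above does not state how the scores are combined. The sketch only shows why scoring both steps together can change the result compared with committing to the single best segmentation first.

```python
import math

# Hypothetical toy scores standing in for the first CRF:
# probability of each candidate segmentation of a source word.
SEGMENTATION_PROB = {
    "google": {
        ("go", "ogle"): 0.55,
        ("goo", "g", "le"): 0.45,
    },
}

# Hypothetical toy scores standing in for the second CRF:
# probability of converting one chunk into one (romanized) target unit.
CONVERSION_PROB = {
    "go": {"go": 0.6, "gu": 0.4},
    "ogle": {"guru": 0.6, "oguru": 0.4},
    "goo": {"guu": 0.9, "go": 0.1},
    "g": {"gu": 0.9, "ji": 0.1},
    "le": {"ru": 0.9, "re": 0.1},
}


def joint_decode(word, n_best=2):
    """Pick the segmentation and conversion with the highest combined score.

    Only the n_best highest-scoring segmentations are considered.
    With n_best=1 this reduces to a plain pipeline that trusts step one.
    Returns (log_score, chunks, target_units).
    """
    ranked = sorted(
        SEGMENTATION_PROB[word].items(), key=lambda kv: kv[1], reverse=True
    )[:n_best]

    best = None
    for chunks, seg_prob in ranked:
        total = math.log(seg_prob)
        units = []
        for chunk in chunks:
            # Simplification: choose the best target unit per chunk independently.
            unit, prob = max(CONVERSION_PROB[chunk].items(), key=lambda kv: kv[1])
            units.append(unit)
            total += math.log(prob)
        if best is None or total > best[0]:
            best = (total, chunks, tuple(units))
    return best


pipeline = joint_decode("google", n_best=1)
joint = joint_decode("google", n_best=2)
print("pipeline:", pipeline[1], "->", pipeline[2])
print("joint:   ", joint[1], "->", joint[2])
```

With these toy numbers, the pipeline setting keeps the top-ranked segmentation even though its chunks convert poorly, while the joint setting prefers the second-ranked segmentation because its combined score is higher.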
