Lightly Supervised Transliteration for Machine Translation

Amit Kirschenbaum
Department of Computer Science, University of Haifa, 31905 Haifa, Israel
akirsche@cs.haifa.ac.il

Shuly Wintner
Department of Computer Science, University of Haifa, 31905 Haifa, Israel
shuly@cs.haifa.ac.il

Abstract

We present a Hebrew-to-English transliteration method in the context of a machine translation system. Our method uses machine learning to determine which terms are to be transliterated rather than translated. The training corpus for this purpose includes only positive examples, acquired semi-automatically. Our classifier reduces more than 38% of the errors made by a baseline method. The identified terms are then transliterated. We present an SMT-based transliteration model trained with a parallel corpus extracted from Wikipedia using a fairly simple method which requires minimal knowledge. The correct result is produced in more than 76% of the cases, and in 92% of the instances it is one of the top-5 results. We also demonstrate a small improvement in the performance of a Hebrew-to-English MT system that uses our transliteration module.

1 Introduction

Transliteration is the process of converting terms written in one language into their approximate spelling or phonetic equivalents in another language. Transliteration is defined for a pair of languages: a source language and a target language. The two languages may differ in their script systems and phonetic inventories. This paper addresses transliteration from Hebrew to English as part of a machine translation system.

Transliteration of terms from Hebrew into English is a hard task, for the most part because of the differences in the phonological and orthographic systems of the two languages. On the one hand, there are cases where a Hebrew letter can be pronounced in multiple ways. For example, Hebrew ב can be pronounced either as b or as v. On the other hand, two different Hebrew sounds can be mapped into the same English letter. For example .
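The two kinds of ambiguity described in the introduction can be made concrete with a small sketch. It is only an illustration, not the paper's transliteration model: the table `CANDIDATES` and the function `candidate_renderings` are hypothetical names, and apart from the b/v entry, which comes from the text, the table entries are assumptions chosen for the example. The sketch ignores vowels and works on consonant letters only.

```python
from itertools import product

# Hypothetical toy table: Hebrew letter -> possible English renderings.
# Only the b/v entry is taken from the text; the others are illustrative
# assumptions and do not come from the paper.
CANDIDATES = {
    "ב": ["b", "v"],  # one letter, two possible pronunciations
    "ט": ["t"],       # two different Hebrew letters ...
    "ת": ["t"],       # ... rendered by the same English letter
    "ר": ["r"],
}


def candidate_renderings(word):
    """Enumerate every letter-by-letter rendering of `word` under CANDIDATES.

    Raises KeyError for letters missing from the toy table.
    """
    options = [CANDIDATES[ch] for ch in word]
    return ["".join(choice) for choice in product(*options)]


if __name__ == "__main__":
    # One-to-many: each occurrence of the ambiguous letter doubles the candidates.
    print(candidate_renderings("בר"))
    # Many-to-one: distinct Hebrew letters collapse to the same English output.
    print(candidate_renderings("ט"), candidate_renderings("ת"))
```

Even in this toy setting the number of candidates grows multiplicatively with each ambiguous letter, which suggests why a purely rule-based letter mapping is not enough and a ranking over candidates is needed.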