Scientific report: "Latent Class Transliteration based on Source Language Origin"
Masato Hagiwara, Satoshi Sekine
Rakuten Institute of Technology, New York
215 Park Avenue South, New York, NY

Abstract

Transliteration, a rich source of proper noun spelling variations, is usually recognized by phonetic- or spelling-based models. However, a single model cannot deal with different words from different language origins, e.g., "get" in "piaget" and "target." Li et al. (2007) propose a method which explicitly models and classifies the source language origins and switches transliteration models accordingly. This model, however, requires a training set explicitly tagged with language origins. We propose a novel method which models language origins as latent classes. The parameters are learned from a set of transliterated word pairs via the EM algorithm. Experimental results on the task of transliterating Western names into Japanese show that the proposed model achieves higher accuracy than conventional models without latent classes.

1 Introduction

Transliteration (e.g., バラクオバマ baraku obama "Barack Obama") is phonetic translation between languages with different writing systems. Words are often transliterated when imported into different languages, which is a major cause of spelling variations of proper nouns in Japanese and many other languages. Accurate transliteration is also the key to robust machine translation systems.
Phonetic-based rewriting models (Knight and Jonathan, 1998) and spelling-based supervised models (Brill and Moore, 2000) have been proposed for recognizing word-to-word transliteration correspondence. These methods usually learn a single model given a training set. However, single models cannot deal with words from multiple language origins. For example, the "get" parts in "piaget" (ピアジェ piajé, French origin) and "target" (ターゲット tagetto, English origin) may ...
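The latent-class idea stated in the abstract (language origins treated as hidden classes, with parameters learned by EM from transliterated word pairs) can be illustrated with a toy example. The following is only a minimal sketch and not the authors' model: it assumes each word pair has already been segmented into rewrite units, and it fits a plain mixture of unit distributions, P(pair) = sum over z of P(z) * product of P(unit | z). The names `TOY_PAIRS`, `m_step`, `e_step`, and `fit_latent_classes`, the hand-made segmentations, and the simplified romanizations are all hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical, hand-segmented rewrite units: (source substring, romanized target).
# The segmentations and simplified romanizations are assumptions for illustration.
TOY_PAIRS = [
    [("g", "j"), ("et", "e")],      # piaget  -> piaje
    [("qu", "k"), ("et", "e")],     # bouquet -> buke
    [("g", "g"), ("et", "etto")],   # target  -> tagetto
    [("g", "j"), ("et", "etto")],   # budget  -> bajetto
]


def m_step(pairs, resp, n_classes):
    """Re-estimate class priors and per-class unit probabilities (plain MLE)."""
    prior = [sum(r[z] for r in resp) / len(pairs) for z in range(n_classes)]
    emit = []
    for z in range(n_classes):
        counts = defaultdict(float)
        for units, r in zip(pairs, resp):
            for u in units:
                counts[u] += r[z]  # soft count weighted by P(z | pair)
        total = sum(counts.values())
        emit.append({u: c / total for u, c in counts.items()} if total > 0 else {})
    return prior, emit


def e_step(pairs, prior, emit):
    """Compute P(z | pair) for every pair, plus the total log-likelihood."""
    resp, loglik = [], 0.0
    for units in pairs:
        joint = []
        for z, p in enumerate(prior):
            for u in units:
                p *= emit[z].get(u, 0.0)
            joint.append(p)  # P(z) * prod_u P(u | z)
        total = sum(joint)
        resp.append([j / total for j in joint])
        loglik += math.log(total)
    return resp, loglik


def fit_latent_classes(pairs, n_classes=2, n_iter=30, seed=0):
    """Run EM from random soft assignments; return parameters and history."""
    rng = random.Random(seed)
    resp = []
    for _pair in pairs:
        row = [rng.random() + 0.1 for _z in range(n_classes)]
        norm = sum(row)
        resp.append([v / norm for v in row])
    history = []
    for _step in range(n_iter):
        prior, emit = m_step(pairs, resp, n_classes)
        resp, loglik = e_step(pairs, prior, emit)
        history.append(loglik)
    return prior, emit, resp, history


if __name__ == "__main__":
    prior, emit, resp, history = fit_latent_classes(TOY_PAIRS)
    print("class priors:", prior)
    for units, r in zip(TOY_PAIRS, resp):
        print(units, "->", [round(v, 3) for v in r])
```

No class labels are supplied at any point: the soft assignments in `resp` play the role of the unobserved language origin. Which class index ends up associated with which group of words depends on the random initialization, and on data this small the classes need not line up with actual language origins.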