Scientific report: "Constructing Transliteration Lexicons from Web Corpora"

Jin-Shea Kuo 1,2
1 Chung-Hwa Telecommunication Laboratories, Taiwan, R.O.C. 326
jskuo@

Abstract

This paper proposes a novel approach to automating the construction of transliterated-term lexicons. A simple syllable alignment algorithm is used to construct confusion matrices for cross-language syllable-phoneme conversion. Each row in the confusion matrix consists of a set of syllables in the source language that are (correctly or erroneously) matched phonetically and statistically to a syllable in the target language. Two conversions, using phoneme-to-phoneme and text-to-phoneme syllabification algorithms, are automatically deduced from a training corpus of paired terms and are used to calculate the degree of similarity between phonemes for transliterated-term extraction. In a large-scale experiment using this automated learning process for conversions, more than 200,000 transliterated-term pairs were successfully extracted by analyzing query results from Internet search engines. Experimental results indicate that the proposed approach shows promise in transliterated-term extraction.

1 Introduction

Machine transliteration plays an important role in machine translation. The importance of term transliteration can be seen from our analysis of the terms used in 200 qualifying sentences that were randomly selected from English-Chinese mixed news pages. Each qualifying sentence contained at least one English word.
Analysis showed that [...] of the English terms were transliterated and that most of them were content words (words that carry essential meaning, as opposed to grammatical function words such as conjunctions, prepositions, and auxiliary verbs). In general, a transliteration process starts by examining a pre-compiled lexicon, which contains many transliterated-term pairs collected manually or automatically. If a term is not found in the lexicon, the transliteration system then deals with this ...
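The confusion-matrix idea described in the abstract can be illustrated with a small Python sketch. This is not the paper's algorithm. It assumes the training pairs are already syllabified and aligned one-to-one by position, and it works on syllable strings rather than phonemes. The function names (`build_confusion_matrix`, `row_probabilities`, `similarity`) and the toy romanized syllables are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_confusion_matrix(aligned_pairs):
    """Count how often each source syllable is aligned with each target syllable.

    aligned_pairs: iterable of (source_syllables, target_syllables), two lists
    assumed to be aligned position by position (an assumption of this sketch).
    Returns {target_syllable: Counter({source_syllable: count})}, so each row
    collects the source syllables matched to one target syllable.
    """
    matrix = defaultdict(Counter)
    for source_syllables, target_syllables in aligned_pairs:
        if len(source_syllables) != len(target_syllables):
            continue  # this sketch skips pairs it cannot align one-to-one
        for src, tgt in zip(source_syllables, target_syllables):
            matrix[tgt][src] += 1
    return matrix


def row_probabilities(matrix):
    """Normalize each row so its counts sum to 1."""
    probs = {}
    for tgt, row in matrix.items():
        total = sum(row.values())
        probs[tgt] = {src: count / total for src, count in row.items()}
    return probs


def similarity(source_syllables, target_syllables, probs):
    """Average per-position match probability; 0.0 if the lengths differ."""
    if not source_syllables or len(source_syllables) != len(target_syllables):
        return 0.0
    scores = [
        probs.get(tgt, {}).get(src, 0.0)
        for src, tgt in zip(source_syllables, target_syllables)
    ]
    return sum(scores) / len(scores)


# Toy training data: invented syllable pairs, not taken from the paper.
TRAINING = [
    (["to", "ma"], ["tuo", "ma"]),
    (["to", "ni"], ["tuo", "ni"]),
    (["do", "ra"], ["tuo", "la"]),
    (["la", "ra"], ["la", "la"]),
]

matrix = build_confusion_matrix(TRAINING)
probs = row_probabilities(matrix)
```

With these toy counts, a candidate pair whose syllables were often aligned in training, such as `similarity(["to", "ra"], ["tuo", "la"], probs)`, scores higher than one containing an unseen source syllable. A real system would also need the syllabification and alignment steps that this sketch takes as given.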
