Scientific report: "Homophones and Tonal Patterns in English-Chinese Transliteration"
Oi Yee Kwong
Department of Chinese Translation and Linguistics
City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong

Abstract

The abundance of homophones in Chinese significantly increases the number of similarly acceptable candidates in English-to-Chinese transliteration (E2C). Dialectal differences also lead to different transliteration practices. We compare E2C between Mandarin Chinese and Cantonese, and report work in progress on dealing with homophones and tonal patterns despite potentially skewed distributions of individual Chinese characters in the training data.

1 Introduction

This paper addresses the problem of automatic English-Chinese forward transliteration (referred to as E2C hereafter). Only a few hundred Chinese characters are commonly used in names, but their combination is relatively free. Such flexibility, however, is not entirely ungoverned. For instance, while the Brazilian striker Ronaldo is rendered as long5-naa4-dou6 in Cantonese, other phonetically similar candidates, such as a different character sequence also read long5-naa4-dou6 or one read long4-naa4-dou1, are far less likely. Beyond linguistic and phonetic properties, many other social and cognitive factors, such as dialect, gender, domain, meaning and perception, simultaneously influence the naming process and are superimposed on the surface graphemic correspondence.

The abundance of homophones in Chinese further complicates the problem. Past studies on phoneme-based E2C have reported their adverse effects (Virga and Khudanpur, 2003). Direct orthographic mapping (Li et al., 2004), which makes use of individual Chinese graphemes, tends to overcome the problem and model the character choice directly. Meanwhile, Chinese is a typical tonal language, and tone information can help distinguish certain ...

1 Mandarin names are transcribed in Hanyu Pinyin, and Cantonese names in Jyutping, the romanization scheme published by the Linguistic Society of Hong Kong.
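The Ronaldo example in the introduction can be made concrete with a minimal sketch that splits Jyutping transcriptions into segmental syllables and tone numbers, then checks whether two candidates are fully homophonous or differ only in their tonal pattern. This is an illustration, not the method of the paper: the function names (parse_jyutping, tonal_pattern, compare) are hypothetical, and the second candidate is taken to be long4-naa4-dou1, an assumed reading of its final syllable.

```python
import re

# A toned Jyutping syllable: lowercase letters followed by a tone digit 1-6.
SYLLABLE = re.compile(r"^([a-z]+)([1-6])$")


def parse_jyutping(name):
    """Split e.g. 'long5-naa4-dou6' into [('long', 5), ('naa', 4), ('dou', 6)]."""
    parsed = []
    for syllable in name.split("-"):
        match = SYLLABLE.match(syllable)
        if match is None:
            raise ValueError(f"not a toned Jyutping syllable: {syllable!r}")
        parsed.append((match.group(1), int(match.group(2))))
    return parsed


def tonal_pattern(name):
    """Return the tone sequence of a transcription as a tuple."""
    return tuple(tone for _, tone in parse_jyutping(name))


def compare(a, b):
    """Classify two transcriptions by segmental and tonal agreement."""
    pa, pb = parse_jyutping(a), parse_jyutping(b)
    if pa == pb:
        return "homophonous"
    if [seg for seg, _ in pa] == [seg for seg, _ in pb]:
        return "same segments, different tones"
    return "different"


attested = "long5-naa4-dou6"                  # rendering given in the text
print(compare(attested, "long5-naa4-dou6"))   # candidate with other characters
print(compare(attested, "long4-naa4-dou1"))   # assumed reading, see lead-in
print(tonal_pattern(attested))
```

The first candidate is indistinguishable from the attested rendering at the level of sound, so only the choice of characters separates them; the second shares the segments but carries a different tonal pattern, which is the kind of information tone-aware modelling could exploit.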