Using Structural Information for Identifying Similar Chinese Characters

Chao-Lin Liu, Jen-Hsiang Lin
Department of Computer Science
National Chengchi University
Taipei 11605, Taiwan
chaolin g9429 @

Abstract

Chinese characters that are similar in their pronunciations or in their internal structures are useful for computer-assisted language learning and for psycholinguistic studies. Although it is possible to employ image-based methods to identify visually similar characters, the resulting computational costs can be very high. We propose methods for identifying visually similar Chinese characters by adopting and extending the basic concepts of a proven Chinese input method, Cangjie. In this paper, we present the methods, illustrate how they work, and discuss their weaknesses.

1 Introduction

A Chinese sentence consists of a sequence of characters that are not separated by spaces. The function of a Chinese character is not exactly the same as that of an English word. Normally, two or more Chinese characters form a Chinese word that carries a meaning, although some Chinese words contain only one character. For instance, the Chinese translation of "conference" is a word formed by three characters, whereas the translation of "go" is a word with only one character.
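The abstract's central idea, reusing Cangjie input codes instead of images to find visually similar characters, can be illustrated with a small sketch. Cangjie decomposes each character into a short string of shape components, so characters that share components tend to have overlapping codes. This is an assumption-laden illustration, not the method proposed in the paper: it uses plain Levenshtein distance over Cangjie codes as a crude similarity proxy. The code table `CANGJIE` and the helpers `edit_distance` and `similar_characters` are hypothetical names, and the table holds only a handful of hand-entered codes.

```python
# Hypothetical sketch, NOT the authors' method: two characters count as
# "visually similar" when their Cangjie codes differ by few edits.
# The table is a tiny hand-entered sample, not a complete Cangjie dictionary.
CANGJIE = {
    "日": "A",
    "月": "B",
    "明": "AB",    # 日 + 月
    "木": "D",
    "林": "DD",    # 木 + 木
    "森": "DDD",   # 木 + 木 + 木
    "相": "DBU",   # 木 + 目 (目 = BU)
    "想": "DBUP",  # 相 + 心 (心 = P)
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two code strings, using one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similar_characters(target: str, max_distance: int = 1) -> list[str]:
    """Return other characters whose codes lie within max_distance edits of target's."""
    code = CANGJIE[target]
    return sorted(
        ch for ch, other in CANGJIE.items()
        if ch != target and edit_distance(code, other) <= max_distance
    )


if __name__ == "__main__":
    print(similar_characters("相"))
    print(similar_characters("林"))
```

With this toy table, `similar_characters("相")` returns `["想"]`, since 想 only adds the 心 component (code P) to 相, and `similar_characters("林")` returns `["木", "森"]`. Plain edit distance ignores where a component sits inside a character, which is a limitation of this simple proxy.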
Just as there are English words that are spelled similarly, there are Chinese characters that are pronounced or written alike. For instance, the English sentence "John plays an important roll in this event." contains an incorrect word: we should replace "roll" with "role". Likewise, a Chinese sentence can contain an incorrect word, where we should replace 試場 (a place for taking examinations) with 市場 (a market). These two words have the same pronunciation, shi4 chang3 (we use Arabic digits to denote the four tones in Mandarin), and both represent locations. Another example sentence also contains an error, and we need to replace one word with another. The word being replaced is considered an incorrect word but can