Scientific report: "Using Word Support Model to Improve Chinese Input System"

Jia-Lin Tsai
Tung Nan Institute of Technology, Department of Information Management, Taipei 222, Taiwan
tsaijl@

Abstract

This paper presents a word support model (WSM). The WSM can effectively perform homophone selection and syllable-word segmentation to improve Chinese input systems. The experimental results show that: (1) the WSM achieves tonal (syllables input with four tones) and toneless (syllables input without four tones) syllable-to-word (STW) accuracies of 99% and 92%, respectively, among the converted words; and (2) when the WSM is applied as an adaptation process together with the Microsoft Input Method Editor 2003 (MSIME) and an optimized bigram model, the average tonal and toneless STW improvements are 37% and 35%, respectively.

1 Introduction

According to (Becker, 1985; Huang, 1985; Gu et al., 1991; Chung, 1993; Kuo, 1995; Fu et al., 1996; Lee et al., 1997; Hsu et al., 1999; Chen et al., 2000; Tsai and Hsu, 2002; Gao et al., 2002; Lee, 2003; Tsai, 2005), the approaches of Chinese input methods (i.e., Chinese input systems) can be classified into two types: (1) the keyboard based approach, including phonetic and pinyin based (Chang et al., 1991; Hsu et al., 1993; Hsu, 1994; Hsu et al., 1999; Kuo, 1995; Lua and Gan, 1992), arbitrary codes based (Fan et al., 1988) and structure scheme based (Huang, 1985) methods; and (2) the non-keyboard based approach, including optical character recognition (OCR) (Chung, 1993), online handwriting (Lee et al., 1997) and speech recognition (Fu et al., 1996; Chen et al., 2000).

Currently, the most popular Chinese input systems use the phonetic and pinyin based approach, because Chinese people are taught to write the phonetic and pinyin syllables of each Chinese character in primary school. In Chinese, each word can be a mono-syllabic word, such as 鼠 (mouse), a bi-syllabic word, such as 袋鼠 (kangaroo), or a multi-syllabic word, such as 米老鼠 (Mickey mouse). The corresponding phonetic and pinyin syllables of a Chinese word are called its syllable-word; for example, "dai4 shu3" is the pinyin syllable-word of 袋鼠 (kangaroo).
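To make the distinction between tonal and toneless syllable input concrete, the following is a minimal Python sketch of a syllable-to-word lookup. It is not the WSM itself, only an illustration of why homophone selection is needed. The lexicon `TOY_LEXICON` and the helpers `strip_tones` and `build_index` are hypothetical names introduced here for illustration, and the handful of entries is an assumed toy sample rather than data from the paper.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word, tonal pinyin syllable-word); not from the paper.
TOY_LEXICON = [
    ("袋鼠", "dai4 shu3"),  # kangaroo
    ("代數", "dai4 shu4"),  # algebra
    ("鼠", "shu3"),         # mouse
    ("書", "shu1"),         # book
    ("樹", "shu4"),         # tree
]


def strip_tones(syllable_word: str) -> str:
    """Drop trailing tone digits from each syllable: 'dai4 shu3' -> 'dai shu'."""
    return " ".join(s.rstrip("12345") for s in syllable_word.split())


def build_index(lexicon, tonal=True):
    """Map each syllable-word (tonal or toneless) to its candidate words."""
    index = defaultdict(list)
    for word, syllables in lexicon:
        key = syllables if tonal else strip_tones(syllables)
        index[key].append(word)
    return index


tonal_index = build_index(TOY_LEXICON, tonal=True)
toneless_index = build_index(TOY_LEXICON, tonal=False)

# Tonal input pins down a single candidate in this toy lexicon,
# while toneless input leaves several homophone candidates to choose from.
print(tonal_index["dai4 shu3"])
print(toneless_index["dai shu"])
print(toneless_index["shu"])
```

In this toy setting the toneless keys collect more candidates per syllable-word than the tonal keys, which is consistent with toneless STW conversion being the harder task in the reported results. A real input system would rank the candidates using context rather than return the whole list.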
