tailieunhanh - Scientific report: "A New Statistical Approach to Chinese Pinyin Input"

Zheng Chen
Microsoft Research China, No. 49 Zhichun Road, Haidian District, 100080, China
zhengc@

Kai-Fu Lee
Microsoft Research China, No. 49 Zhichun Road, Haidian District, 100080, China
kfl@

Abstract

Chinese input is one of the key challenges for Chinese PC users. This paper proposes a statistical approach to Pinyin-based Chinese input. The approach uses a trigram-based language model and a statistically based segmentation. To deal with real input, it also includes a typing model, which enables spelling correction in sentence-based Pinyin input, and a spelling model for English, which enables modeless Pinyin input.

1. Introduction

Chinese input is one of the most difficult problems for Chinese PC users. There are two main categories of Chinese input methods. One is shape-based input, such as wu bi zi xing; the other is Pinyin, or pronunciation-based, input, such as Chinese CStar, MSPY, etc. Because it is easy to learn and use, Pinyin is the most popular Chinese input method. Over 97% of users in China use Pinyin for input (Chen Yuan 1997). Although the Pinyin input method has many advantages, it also suffers from several problems, including Pinyin-to-character conversion errors, user typing errors, and UI problems such as the need for two separate modes when typing Chinese and English.

A Pinyin-based method automatically converts Pinyin to Chinese characters. But there are only about 406 syllables, and they correspond to over 6000 common Chinese characters. It is therefore very difficult for a system to select the correct Chinese characters automatically. Higher accuracy may be achieved using sentence-based input. A sentence-based input method chooses characters by using a language model based on context, so its accuracy is higher than that of a word-based input method. In this paper, all the technology is based on the sentence-based input method, but it can easily be adapted to word-based input.
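The idea of choosing characters from context can be illustrated with a small sketch. It is not the paper's implementation: for brevity it uses a bigram model rather than the trigram model named in the abstract, and the lexicon `CANDIDATES`, the probabilities in `BIGRAM`, the `FLOOR` constant, and the `decode` function are all hypothetical names and values made up for illustration.

```python
import math

# Hypothetical toy lexicon: each Pinyin syllable maps to several candidate characters.
CANDIDATES = {
    "ta": ["他", "她", "塔"],
    "shi": ["是", "十", "事"],
    "ren": ["人", "任"],
}

# Hypothetical bigram probabilities P(current | previous); "<s>" marks sentence start.
BIGRAM = {
    ("<s>", "他"): 0.5, ("<s>", "她"): 0.4, ("<s>", "塔"): 0.1,
    ("<s>", "是"): 0.2, ("<s>", "十"): 0.3, ("<s>", "事"): 0.1,
    ("他", "是"): 0.6, ("她", "是"): 0.6, ("塔", "是"): 0.2,
    ("是", "人"): 0.3, ("是", "任"): 0.01, ("十", "人"): 0.2,
}
FLOOR = 1e-6  # assumed probability for character pairs missing from the table


def log_p(prev, cur):
    """Log-probability of `cur` following `prev` under the toy bigram model."""
    return math.log(BIGRAM.get((prev, cur), FLOOR))


def decode(syllables):
    """Return the most probable character string for a Pinyin syllable sequence.

    Viterbi search: for each candidate character at the current position,
    keep only the best-scoring path that ends in that character.
    """
    best = {"<s>": (0.0, [])}  # last character -> (log-probability, path so far)
    for syl in syllables:
        nxt = {}
        for cur in CANDIDATES[syl]:
            score, path = max(
                ((s + log_p(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    return "".join(max(best.values(), key=lambda t: t[0])[1])


print(decode(["shi"]))               # 十
print(decode(["ta", "shi", "ren"]))  # 他是人
```

With these made-up numbers, the syllable "shi" on its own is converted to 十, but after "ta" the model prefers 是, which is the effect of context that sentence-based input relies on.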
