Scientific report: "Large Scale Collocation Data and Their Application to Japanese Word Processor Technology"

Yasuo Koyama, Masako Yasutake, Kenji Yoshimura, and Kosho Shudo
Institute for Information and Control Systems, Fukuoka University
Nanakuma, Fukuoka 814-0180, Japan
koyama@ yasutake@ yosimura@ shudo@

Abstract

Word processors and computers used in Japan employ a Japanese input method in which keyboard strokes are combined with Kana (phonetic) character to Kanji (ideographic, Chinese) character conversion technology. The key factor in Kana-to-Kanji conversion technology is how to raise conversion accuracy through homophone processing, since Japanese has so many homophonic Kanji. In this paper, we report the results of our Kana-to-Kanji conversion experiments, which embody homophone processing based on large scale collocation data. It is shown that approximately 135,000 collocations yield a rise in conversion accuracy compared with the prototype system, which has no collocation data.

1. Introduction

Word processors and computers used in Japan ordinarily employ a Japanese input method in which keyboard strokes are combined with Kana (phonetic) to Kanji (ideographic, Chinese) character conversion technology. Kana-to-Kanji conversion is performed by morphological analysis of the input Kana string, which has no spaces between words. The analysis carries out word or phrase segmentation to identify the substrings of the input that have to be converted from Kana to Kanji. A Kana-Kanji mixed string, which is the ordinary form of Japanese written text, is obtained as the final result. The major issue in this technology lies in raising the accuracy of segmentation and of homophone processing, which selects the correct Kanji among many homophonic candidates. The conventional methodology for processing homophones has used a function that gives priority to the word that was used last or to the high-frequency word.
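
The conventional heuristic described above (prefer the word used last, otherwise the high-frequency word) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `pick_homophone`, the candidate table, and the frequency counts are all hypothetical and chosen purely for the example.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: homophonic Kanji words sharing the Kana reading "こうえん".
CANDIDATES: Dict[str, List[str]] = {
    "こうえん": ["公園", "講演", "公演", "後援"],
}

# Hypothetical frequency counts (illustrative numbers, not from the paper).
FREQUENCY: Dict[str, int] = {"公園": 120, "講演": 80, "公演": 60, "後援": 30}


def pick_homophone(reading: str, last_used: Dict[str, str]) -> Optional[str]:
    """Choose a Kanji word for a Kana reading.

    Priority 1: the word selected last time for this reading.
    Priority 2: the candidate with the highest frequency count.
    Returns None when the reading has no known candidates.
    """
    candidates = CANDIDATES.get(reading, [])
    if not candidates:
        return None
    recent = last_used.get(reading)
    if recent in candidates:
        return recent
    return max(candidates, key=lambda word: FREQUENCY.get(word, 0))


history: Dict[str, str] = {}
first = pick_homophone("こうえん", history)   # no history yet: frequency decides
history["こうえん"] = "講演"                   # the user corrects the choice
second = pick_homophone("こうえん", history)  # the last-used word now wins
```

Neither rule in this sketch looks at the neighboring words in the sentence, so the same candidate is returned regardless of context.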
