Scientific report: "The State of the Art in Thai Language Processing"

Virach Sornlertlamvanich, Tanapong Potipiti, Chai Wutiwiwatchai and Pradit Mittrapiyanuruk
National Electronics and Computer Technology Center (NECTEC), National Science and Technology Development Agency, Ministry of Science, Technology and Environment
22nd Floor Gypsum Metropolitan Tower, 539/2 Sriayudhya Rd., Rajthevi, Bangkok 10400, Thailand
Email: {virach, tanapong, chai}@, pmittrap@

Abstract
This paper reviews the current state of technology and research progress in Thai language processing. It summarizes the characteristics of the Thai language and the approaches used to overcome the difficulties in each processing task.

1 Some Problematic Issues in Thai Processing
The most fundamental semantic unit in a language is the word. In languages that mark word boundaries, words are explicitly delimited. Thai has no explicit word boundaries: words are recognized implicitly, and in many cases the segmentation depends on individual judgement. This causes many difficulties in Thai language processing.

To illustrate the problem, we use a classic English example: the segmentation of GODISNOWHERE.

No.  Segmentation        Meaning
1    God is now here.    God is here.
2    God is no where.    God doesn't exist.
3    God is nowhere.     God doesn't exist.

Segmentations 1 and 2 have opposite meanings, while 2 and 3 differ only in whether "nowhere" is treated as one word or two. The difficulty is greatly aggravated when unknown words occur.

Thai is also a tonal language: the same sequence of phonemes carries a different meaning when spoken with a different tone.
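The segmentation ambiguity in the GODISNOWHERE table can be reproduced mechanically. The Python sketch below enumerates every way to split an unspaced string into words drawn from a lexicon. The function name `segmentations` and the toy `LEXICON` are hypothetical illustrations, not the authors' method; a practical Thai segmenter would use a full dictionary and would additionally have to choose among the candidates.

```python
from __future__ import annotations

from functools import lru_cache


def segmentations(text: str, lexicon: frozenset[str]) -> list[list[str]]:
    """Return every way to split `text` into a sequence of lexicon words.

    Illustrative only: `lexicon` is a hypothetical toy word list,
    not a real Thai (or English) dictionary.
    """

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple[tuple[str, ...], ...]:
        # Reaching the end of the string completes one segmentation.
        if start == len(text):
            return ((),)
        results = []
        # Try every prefix of the remaining text as the next word.
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                # Prepend this word to each segmentation of the suffix.
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seg) for seg in split_from(0)]


# Hypothetical toy lexicon covering the GODISNOWHERE example.
LEXICON = frozenset({"god", "is", "now", "here", "no", "where", "nowhere"})

for seg in segmentations("godisnowhere", LEXICON):
    print(" ".join(seg))
# Prints:
# god is no where
# god is now here
# god is nowhere
```

Running it on a string that contains a word missing from the lexicon, such as `segmentations("godisnowthere", LEXICON)`, returns an empty list, which mirrors how unknown words aggravate the problem. Caching results per start offset avoids re-solving the same suffix, although the number of segmentations itself can still grow exponentially with the length of the string.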
To handle tone, many approaches have been proposed, both for tone generation in speech synthesis and for tone recognition in speech recognition. These difficulties propagate to many areas of language processing, such as lexical acquisition, information retrieval, machine translation and speech processing. Furthermore, a similar problem also occurs at the levels of sentence
