tailieunhanh - Scientific report: "Categorial Fluidity in Chinese and its Implications for Part-of-speech Tagging"

This paper discusses the theoretical and practical concerns in part-of-speech (POS) tagging for Chinese. Unlike other languages such as English, Chinese lacks morphological marking in association with categorial alternations. We consider such categorial fluidity a continuum, and any categorial shift a transition, with special focus on the verb-noun shift. Preliminary observations are reported on this phenomenon from empirical data, and we suggest that POS tagging should not only be theoretically valid but also sufficiently capture the extent of categorial fluidity as reflected by the data.

Oi Yee Kwong, Benjamin K. Tsou
Language Information Sciences Research Centre, City University of Hong Kong, Kowloon, Hong Kong
rlolivia rlbtsou @

1 Introduction

There are currently a number of POS-tagged Chinese corpora available, based on different tagsets and theoretical frameworks. Some are more semantic-oriented in determining the syntactic category of a word. For instance, CKIP (1993) had a very fine-grained classification of Chinese verbs based on thematic structures. Others are mainly based on syntactic distribution (Yu et al., 1998; Xia, 2000), where POS tags are assigned mainly depending on the syntactic properties of the target words.
Given the many-to-many relation between grammatical function and lexical category in Chinese, it is often not straightforward how certain words should be tagged in particular sentences. For example, should the verb huai2yi2 'to suspect' be given the same tag in all cases in (1)? (The digits following the Hanyu pinyin indicate the tone.)

(1) a. wo3 huai2yi2 ta1 shi4 i 2
       'I suspect he is a thief'
    b. ta1 man3lian3 huai2yi2 biao3qing2
       'He wears a suspicious look'
    c. zhi3shi4 wo3 de5 huai2yi2
       'This is only my suspicion'

Most Chinese grammarians would suggest that Chinese words have predefined lexical categories mainly .
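
The tagging question raised by example (1) can be made concrete with a small sketch. The idea of measuring how a word's tags are spread across a corpus is an illustration only, not the authors' procedure. The function name `tag_distribution`, the tag labels (`V`, `N`, `PRON`, `PART`), and the toy data are all assumptions invented for this example; they do not come from the paper or from any existing tagset.

```python
from collections import Counter, defaultdict


def tag_distribution(tagged_tokens):
    """Map each word to the relative frequency of every tag it receives.

    `tagged_tokens` is an iterable of (word, tag) pairs. A word whose
    probability mass is split across several tags is, in this rough sense,
    more categorially fluid than a word that always gets one tag.
    """
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return {
        word: {tag: n / sum(tags.values()) for tag, n in tags.items()}
        for word, tags in counts.items()
    }


# Hypothetical toy data: the tags below are invented for illustration and
# are NOT annotations taken from the paper or from a real corpus.
toy_corpus = [
    ("wo3", "PRON"), ("huai2yi2", "V"), ("ta1", "PRON"),
    ("ta1", "PRON"), ("huai2yi2", "V"), ("biao3qing2", "N"),
    ("wo3", "PRON"), ("de5", "PART"), ("huai2yi2", "N"),
    ("huai2yi2", "V"),
]

dist = tag_distribution(toy_corpus)
print(dist["huai2yi2"])
```

Under these assumed tags, huai2yi2 would show a split between verbal and nominal uses, while a word such as wo3 would receive a single tag. A real study would of course depend on the tagset and on how tagging decisions like those in (1) are made.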
