tailieunhanh - Scientific report: "Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging"
The Chinese language is characterized by the lack of formal devices, such as morphological tense and number, that often provide important clues for syntactic processing tasks. While state-of-the-art tagging systems have achieved accuracies above 97% on English, Chinese POS tagging has proven more challenging, with reported accuracies of about 93-94% (Tseng et al., 2005b; Huang et al., 2007, 2009; Li et al., 2011).

Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

Weiwei Sun* and Hans Uszkoreit
Institute of Computer Science and Technology, Peking University
Saarbrücken Graduate School of Computer Science
Department of Computational Linguistics, Saarland University
Language Technology Lab, DFKI GmbH
ws@ uszkoreit@

Abstract

From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by constituent parsing and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated approaches yield a relative error reduction of 18% in total over a state-of-the-art baseline.

1 Introduction

In grammar, a part-of-speech (POS) is a linguistic category of words, which is generally defined by the syntactic or morphological behavior of the word in question. Automatically assigning POS tags to words plays an important role in parsing, word sense disambiguation, and many other NLP applications. Many successful tagging algorithms developed for English have been applied to many other languages as well.
In some cases the methods work well without large modifications, such as for German. But a number of augmentations and changes become necessary when dealing with highly inflected or agglutinative languages, as well as analytic languages, of which Chinese is the focus of this paper. The Chinese language is characterized by the lack of formal devices such ...

* This work was mainly completed while this author (corresponding author) was at Saarland University and DFKI.
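The abstract states that word clusters induced from unlabeled data are turned into new features for a discriminative tagger. The sketch below illustrates that general idea only; it is not the authors' feature set. The cluster map `CLUSTERS`, the function `cluster_features`, the feature names, and the bit-string cluster ids are all hypothetical, chosen for illustration.

```python
from typing import Dict, List

# Hypothetical word -> cluster-id map, standing in for the output of a
# clustering tool run on unlabeled text. The values are toy data.
CLUSTERS: Dict[str, str] = {
    "北京": "0110",
    "上海": "0110",  # shares a cluster with 北京 (assumed for illustration)
    "访问": "1011",
}


def cluster_features(words: List[str], i: int,
                     clusters: Dict[str, str]) -> List[str]:
    """Return cluster-based feature strings for the word at position i."""

    def cid(j: int) -> str:
        # Positions outside the sentence get a padding symbol;
        # words missing from the cluster map get an unknown symbol.
        if j < 0 or j >= len(words):
            return "<PAD>"
        return clusters.get(words[j], "<UNK>")

    prev_c, cur_c, next_c = cid(i - 1), cid(i), cid(i + 1)
    return [
        f"c0={cur_c}",               # cluster of the current word
        f"c-1={prev_c}",             # cluster of the previous word
        f"c+1={next_c}",             # cluster of the next word
        f"c-1|c0={prev_c}|{cur_c}",  # cluster bigram
    ]
```

Because 北京 and 上海 share a cluster id in this toy map, a tagger that has seen only one of them in labeled data still receives identical cluster features for the other. This is one way a paradigmatic relation can generalize across words; these strings would be added alongside the tagger's usual word and affix features.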