Scientific report: "An Empirical Study of Chinese Chunking"

Wenliang Chen, Yujie Zhang, Hitoshi Isahara
Computational Linguistics Group, National Institute of Information and Communications Technology
3-5 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan, 619-0289
{chenwl, yujie, isahara}@

Abstract

In this paper, we describe an empirical study of Chinese chunking on a corpus extracted from the UPENN Chinese Treebank-4 (CTB4). First, we compare the performance of state-of-the-art machine learning models. Then we propose two approaches to improve the performance of Chinese chunking. (1) We propose an approach to resolve the special problems of Chinese chunking. This approach extends the chunk tags for every problem by a tag-extension function. (2) We propose two novel voting methods based on the characteristics of the chunking task. Compared with traditional voting methods, the proposed voting methods consider long-distance information. The experimental results show that the SVM model outperforms the other models and that our proposed approaches can improve performance significantly.

1 Introduction

Chunking identifies the non-recursive cores of various types of phrases in text, possibly as a precursor to full parsing or information extraction. Steven P. Abney was the first to introduce chunks for parsing (Abney, 1991).
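To make the notion of non-recursive chunks concrete, the following minimal sketch recovers chunk spans from per-token chunk tags. It assumes the widely used IOB2 encoding (`B-X` begins a chunk of type `X`, `I-X` continues it, `O` is outside any chunk); this excerpt does not specify the paper's own tag set, and the function name `iob_to_chunks` and the English example sentence are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def iob_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags into (chunk_type, start, end) spans, end exclusive.

    Assumption: tags look like "B-NP", "I-NP", or "O". An "I-X" tag that
    does not continue an open chunk of type X is treated as starting one.
    """
    chunks: List[Tuple[str, int, int]] = []
    start, ctype = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and ctype == tag[2:]
        if not continues:
            if ctype is not None:
                chunks.append((ctype, start, i))
            start, ctype = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                start, ctype = i, tag[2:]
    return chunks


# Hypothetical example sentence with hand-assigned tags.
words = ["He", "reckons", "the", "current", "account", "deficit", "will", "narrow"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP"]

for ctype, s, e in iob_to_chunks(tags):
    print(ctype, " ".join(words[s:e]))
```

Because each token carries exactly one tag, the recovered chunks are flat and cannot nest or overlap, which is what distinguishes chunking from full parsing.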
Ramshaw and Marcus (1995) first represented base noun phrase recognition as a machine learning problem. In 2000, CoNLL-2000 introduced a shared task to tag many kinds of phrases besides noun phrases in English (Sang and Buchholz, 2000). Additionally, many machine learning approaches, such as Support Vector Machines (SVMs) (Vapnik, 1995), Conditional Random Fields (CRFs) (Lafferty et al., 2001), Memory-based Learning (MBL) (Park and Zhang, 2003), Transformation-based Learning (TBL) (Brill, 1995), and Hidden Markov Models (HMMs) (Zhou et al., 2000), have been applied to text chunking (Sang and Buchholz, 2000; Hammerton et al., 2002). Chinese chunking is a difficult task and much work has
