tailieunhanh - Scientific report: "Convolution Kernels with Feature Selection for Natural Language Processing Tasks"
Convolution Kernels with Feature Selection for Natural Language Processing Tasks
Jun Suzuki, Hideki Isozaki and Eisaku Maeda
NTT Communication Science Laboratories, NTT Corp.
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237 Japan
{jun, isozaki, maeda}@

Abstract
Convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing (NLP) tasks. Experiments have, however, shown that the over-fitting problem often arises when these kernels are used in NLP tasks. This paper discusses this issue of convolution kernels, and then proposes a new approach based on statistical feature selection that avoids this issue. To enable the proposed method to be executed efficiently, it is embedded into an original kernel calculation process by using sub-structure mining algorithms. Experiments are undertaken on real NLP tasks to confirm the problem with a conventional method and to compare its performance with that of the proposed method.

1 Introduction
Over the past few years, many machine learning methods have been successfully applied to tasks in natural language processing (NLP). In particular, state-of-the-art performance can be achieved with kernel methods such as the Support Vector Machine (Cortes and Vapnik, 1995). Examples include text categorization (Joachims, 1998), chunking (Kudo and Matsumoto, 2002), and parsing (Collins and Duffy, 2001).
Another feature of this kernel methodology is that it not only provides high accuracy but also allows us to design a kernel function suited to modeling the task at hand. Since natural language data take the form of sequences of words and are generally analyzed using discrete structures such as trees (parsed trees) and graphs (relational graphs), discrete kernels such as sequence kernels (Lodhi et al., 2002), tree kernels (Collins and Duffy, 2001), and graph kernels (Suzuki et al., 2003a) have been shown to offer excellent results. These discrete kernels are related to convolution kernels.
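
To make the idea of a discrete kernel concrete, the following is a minimal sketch of a sequence kernel that counts the contiguous word sub-sequences two sentences share. It is a deliberately simplified illustration, not the kernel or the feature selection method of this paper, and not the gap-weighted kernel of Lodhi et al. (2002); the names `ngram_counts`, `sequence_kernel`, and the parameter `max_n` are assumptions introduced only for this example.

```python
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every contiguous sub-sequence of length 1..max_n in tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sequence_kernel(s, t, max_n=2):
    """Inner product in the implicit feature space indexed by sub-sequences.

    Each shared sub-sequence contributes the product of its counts in s and t,
    so the kernel value grows with the number of common sub-structures.
    """
    counts_s = ngram_counts(s, max_n)
    counts_t = ngram_counts(t, max_n)
    return sum(c * counts_t[sub] for sub, c in counts_s.items() if sub in counts_t)


# Hypothetical example sentences, used only to exercise the sketch.
a = "the cat sat".split()
b = "the cat ran".split()
print(sequence_kernel(a, b))
```

The feature space is never built explicitly beyond the sub-sequences that actually occur, which is the practical appeal of kernels over discrete structures. Raising `max_n` adds longer and therefore sparser sub-sequences to that space.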