Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification
Zhaopeng Tu, Yifan He, Jennifer Foster, Josef van Genabith, Qun Liu, Shouxun Lin
Key Lab. of Intelligent Info. Processing, Institute of Computing Technology, CAS
Computer Science Department, New York University
School of Computing, Dublin City University

Abstract
Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of the syntactic structures used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task. In this paper, we present a systematic study investigating combinations of sequence and convolution kernels using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees, guided by a polarity lexicon, yield an absolute improvement of … points in accuracy over a bag-of-words classifier on a widely used sentiment corpus.

1 Introduction
An important subtask in sentiment analysis is sentiment classification, which involves identifying positive and negative opinions in a text segment at various levels of granularity: document, paragraph, sentence, and phrase level. This paper focuses on document-level sentiment classification.

There has been a substantial amount of work on document-level sentiment classification. In early pioneering work, Pang and Lee (2004) use a flat feature vector, namely a bag-of-words, to represent the documents. A bag-of-words approach, however, cannot capture important information obtained from structural linguistic analysis of the documents. More recently, there have been several approaches which employ features based on deep linguistic analysis with …
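The abstract contrasts a bag-of-words classifier with convolution kernels over syntactic sub-structures. Purely as an illustration, the sketch below implements two textbook kernels: a linear bag-of-words kernel (the dot product of word-count vectors) and the subset-tree kernel of Collins and Duffy, which counts the tree fragments two parse trees share, down-weighted by a decay factor `lam`. The tuple-based tree encoding, all function names, and the weighted sum `combined_kernel` with its parameter `alpha` are assumptions made for this example. This is not the authors' implementation and does not reproduce their lexicon-guided sub-structure selection.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# Hypothetical tree encoding: (label, child, child, ...); leaves are word strings.
Tree = Union[str, tuple]


def bow_kernel(doc1: str, doc2: str) -> int:
    """Linear kernel on bag-of-words count vectors (dot product of word counts)."""
    c1, c2 = Counter(doc1.lower().split()), Counter(doc2.lower().split())
    return sum(count * c2[word] for word, count in c1.items())


def _label(t: Tree) -> str:
    return t if isinstance(t, str) else t[0]


def _production(node: tuple) -> tuple:
    # A node's production: its label followed by its children's labels (or words).
    return (node[0],) + tuple(_label(child) for child in node[1:])


def _is_preterminal(node: tuple) -> bool:
    # A pre-terminal (POS) node has only words as children.
    return all(isinstance(child, str) for child in node[1:])


def _internal_nodes(t: Tree) -> List[tuple]:
    # Collect every non-leaf node; these are the possible fragment roots.
    if isinstance(t, str):
        return []
    nodes = [t]
    for child in t[1:]:
        nodes.extend(_internal_nodes(child))
    return nodes


def subset_tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Collins-Duffy style subset-tree kernel: weighted count of shared fragments."""
    memo: Dict[Tuple[tuple, tuple], float] = {}

    def delta(n1: Tree, n2: Tree) -> float:
        # Words are leaves, not fragment roots, so they contribute nothing.
        if isinstance(n1, str) or isinstance(n2, str):
            return 0.0
        key = (n1, n2)
        if key not in memo:
            if _production(n1) != _production(n2):
                memo[key] = 0.0  # different productions share no fragment here
            elif _is_preterminal(n1):
                memo[key] = lam  # identical POS -> word production
            else:
                score = lam
                for c1, c2 in zip(n1[1:], n2[1:]):
                    # Each child is either cut off (the 1) or expanded further.
                    score *= 1.0 + delta(c1, c2)
                memo[key] = score
        return memo[key]

    return sum(delta(a, b) for a in _internal_nodes(t1) for b in _internal_nodes(t2))


def combined_kernel(doc1: str, doc2: str, tree1: Tree, tree2: Tree,
                    alpha: float = 0.5, lam: float = 1.0) -> float:
    # Hypothetical combination: a weighted sum of the two kernels.
    return alpha * bow_kernel(doc1, doc2) + (1 - alpha) * subset_tree_kernel(tree1, tree2, lam)


if __name__ == "__main__":
    film = ("NP", ("DT", "the"), ("NN", "film"))
    plot = ("NP", ("DT", "the"), ("NN", "plot"))
    print(bow_kernel("the film was great", "the plot was dull"))  # 2
    print(subset_tree_kernel(film, film))  # 6.0
    print(subset_tree_kernel(film, plot))  # 3.0
```

In the usage example, the two noun phrases share three fragments: `[NP DT NN]`, `[NP [DT the] NN]`, and `[DT the]`. Setting `lam` below 1 down-weights larger fragments so that big shared subtrees do not dominate the score; in practice, kernel values are also commonly normalized before being combined.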