Hierarchical Directed Acyclic Graph Kernel: Methods for Structured Natural Language Data

Jun Suzuki, Tsutomu Hirao, Yutaka Sasaki and Eisaku Maeda
NTT Communication Science Laboratories, NTT Corp.
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237 Japan
{jun, hirao, sasaki, maeda}@cslab.kecl.ntt.co.jp

Abstract

This paper proposes the Hierarchical Directed Acyclic Graph (HDAG) Kernel for structured natural language data. The HDAG Kernel directly accepts several levels of both chunks and their relations, and then efficiently computes the weighted sum of the number of common attribute sequences of the HDAGs. We applied the proposed method to question classification and sentence alignment tasks to evaluate its performance as a similarity measure and a kernel function. The results of the experiments demonstrate that the HDAG Kernel is superior to other kernel functions and baseline methods.

1 Introduction

As it has become easy to obtain structured corpora such as annotated texts, many researchers have applied statistical and machine learning techniques to NLP tasks; as a result, the accuracies of basic NLP tools such as POS taggers, NP chunkers, named entity taggers, and dependency analyzers have improved to the point that they can support practical NLP applications.
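The full kernel operates on hierarchical DAGs, but the core idea stated in the abstract, a weighted count of the attribute sequences shared by two structures, can be illustrated with a deliberately simplified sketch. The sketch below ignores hierarchy and graph edges entirely and treats each input as a flat sequence of attribute sets; the function names and the decay parameter `lam` are our own illustrative choices, not definitions from the paper.

```python
from collections import Counter
from itertools import product

def attribute_ngrams(seq, n, lam):
    """Collect contiguous attribute sequences up to length n.

    `seq` is a list of attribute sets, one per node (e.g. a word and
    its POS tag). Each length-k attribute sequence is down-weighted
    by lam**k, so longer matches contribute less per occurrence.
    """
    grams = Counter()
    for k in range(1, n + 1):
        for i in range(len(seq) - k + 1):
            # choose one attribute from each node in the window
            for combo in product(*seq[i:i + k]):
                grams[combo] += lam ** k
    return grams

def simple_sequence_kernel(seq1, seq2, n=2, lam=0.5):
    """Weighted sum over the attribute sequences common to both inputs."""
    g1 = attribute_ngrams(seq1, n, lam)
    g2 = attribute_ngrams(seq2, n, lam)
    return sum(g1[g] * g2[g] for g in g1.keys() & g2.keys())

# Two toy "sentences": each node carries a word and a POS attribute.
s1 = [{"PRP", "he"}, {"VBZ", "runs"}]
s2 = [{"PRP", "she"}, {"VBZ", "runs"}]
print(simple_sequence_kernel(s1, s2))  # 0.875
```

The shared unigrams (PRP, VBZ, runs) and shared bigrams (PRP–VBZ, PRP–runs) each contribute a product of their decayed counts, so the two sentences score higher than an unrelated pair would. The actual HDAG Kernel extends this kind of counting to chunks and relations across several hierarchical levels.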
The motivation of this paper is to identify and use richer information within texts to improve the performance of NLP applications; this is in contrast to using feature vectors constructed by a bag-of-words (Salton et al., 1975). We focus here on methods that use numerical feature vectors to represent the features of natural language data. In this case, since the original natural language data is symbolic, researchers convert the symbolic data into numeric data. This process, feature extraction, is ad-hoc in nature and differs with each NLP task; there has been no neat formulation for generating feature vectors from the semantic and grammatical structures inside texts. Kernel methods (Vapnik, 1995; Cristianini and