tailieunhanh - Scientific report: "String Re-writing Kernel"

String Re-writing Kernel

Fan Bu1, Hang Li2 and Xiaoyan Zhu3
1,3 State Key Laboratory of Intelligent Technology and Systems
1,3 Tsinghua National Laboratory for Information Sci. and Tech.
1,3 Department of Computer Sci. and Tech., Tsinghua University
2 Microsoft Research Asia, No. 5 Danling Street, Beijing 100080, China
1bufan0000@ 2hangli@ 3zxy-dcs@

Abstract

Learning for sentence re-writing is a fundamental task in natural language processing and information retrieval. In this paper, we propose a new class of kernel functions, referred to as string re-writing kernel, to address the problem. A string re-writing kernel measures the similarity between two pairs of strings, each pair representing re-writing of a string. It can capture the lexical and structural similarity between two pairs of sentences without the need of constructing syntactic trees. We further propose an instance of string re-writing kernel which can be computed efficiently. Experimental results on benchmark datasets show that our method can achieve better results than state-of-the-art methods on two sentence re-writing learning tasks: paraphrase identification and recognizing textual entailment.

1 Introduction

Learning for sentence re-writing is a fundamental task in natural language processing and information retrieval, which includes paraphrasing, textual entailment, and transformation between query and document title in search. The key question here is how to represent the re-writing of sentences.
In previous research on sentence re-writing learning, such as paraphrase identification and recognizing textual entailment, most representations are based on the lexicons (Zhang and Patrick, 2005; Lintean and Rus, 2011; de Marneffe et al., 2006) or the syntactic trees (Das and Smith, 2009; Heilman and Smith, 2010) of the .

Figure 1: Example of re-writing. (A) is a re-writing rule ("... wrote ... ." re-written as "... was written by ... .") and (B) is a re-writing of a sentence ("Shakespeare wrote Hamlet." re-written as "Hamlet was written by Shakespeare.").
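The relationship between the rule (A) and the sentence pair (B) in Figure 1 can be made concrete with a small sketch. It is only an illustration, not the kernel proposed in the paper: the regular-expression encoding of the rule, the slot names `x` and `y`, and the helper functions `apply_rule` and `is_rewriting` are all assumptions introduced here, and the wildcard positions are inferred from sentence pair (B) because the figure's original symbols did not survive extraction.

```python
import re

# Hypothetical encoding of the rule in Figure 1 (A): two wildcard slots
# around "wrote" on the source side, swapped around "was written by"
# on the target side. The slot names x and y are illustrative only.
RULE_SOURCE = re.compile(r"(?P<x>.+) wrote (?P<y>.+)\.")
RULE_TARGET = "{y} was written by {x}."


def apply_rule(sentence):
    """Return the re-written sentence, or None if the rule does not match."""
    match = RULE_SOURCE.fullmatch(sentence)
    if match is None:
        return None
    return RULE_TARGET.format(x=match.group("x"), y=match.group("y"))


def is_rewriting(source, target):
    """Check whether the pair (source, target) is an instance of the rule."""
    return apply_rule(source) == target


if __name__ == "__main__":
    print(apply_rule("Shakespeare wrote Hamlet."))
    print(is_rewriting("Shakespeare wrote Hamlet.",
                       "Hamlet was written by Shakespeare."))
```

Under these assumptions, the pair in (B) counts as an instance of the rule in (A), while a sentence without "wrote" yields no re-writing. A hand-written rule like this covers a single pattern; the paper instead aims to measure similarity between pairs of strings with a kernel, so that such patterns do not have to be enumerated by hand.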
