Modeling Sentences in the Latent Space

Weiwei Guo
Department of Computer Science, Columbia University
weiwei@cs.columbia.edu

Mona Diab
Center for Computational Learning Systems, Columbia University
mdiab@ccls.columbia.edu

Abstract

Sentence Similarity is the process of computing a similarity score between two sentences. Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. In the process, we propose a new evaluation framework for sentence similarity: Concept Definition Retrieval. The new framework allows for large-scale tuning and testing of Sentence Similarity models. Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. Our results indicate comparable and even better performance than current state-of-the-art systems addressing the problem of sentence similarity.

1 Introduction

Identifying the degree of semantic similarity (SS) between two sentences is at the core of many NLP applications that focus on sentence-level semantics, such as Machine Translation (Kauchak and Barzilay, 2006), Summarization (Zhou et al., 2006), and Text Coherence Detection (Lapata and Barzilay, 2005).

To date, almost all Sentence Similarity (SS) approaches work in the high-dimensional word space and rely mainly on word similarity. There are two main, not unrelated, disadvantages to word similarity based approaches: (1) lexical ambiguity, as the pairwise word similarity ignores the semantic interaction between each word and its sentential context; (2) word co-occurrence information is not sufficiently exploited.

Latent variable models, such as Latent Semantic Analysis (LSA) (Landauer et al., 1998), Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999), and Latent Dirichlet Allocation (LDA) (Blei et al., 2003) ...
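To make the contrast above concrete, the following short Python sketch (not part of the original paper; the toy corpus, the example sentences, and the choice of scikit-learn's TfidfVectorizer and TruncatedSVD are illustrative assumptions) scores one sentence pair twice: once with cosine similarity over surface word vectors, and once after projecting the same vectors into a low-dimensional latent space with truncated SVD, the decomposition underlying LSA.

# Illustrative sketch only: word-space vs. LSA latent-space sentence similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy background corpus; a real LSA model would be trained on a much larger collection.
corpus = [
    "the bank approved the loan application",
    "the river bank was flooded after the storm",
    "she deposited the check at the bank",
    "heavy rain caused the river to overflow",
]
s1 = "the bank granted the loan"
s2 = "the river overflowed its bank"

# Word space: TF-IDF vectors compared directly with cosine similarity.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus + [s1, s2])
word_space_sim = cosine_similarity(X[-2], X[-1])[0, 0]

# Latent space: project the same vectors onto k latent dimensions (LSA via truncated SVD).
svd = TruncatedSVD(n_components=2, random_state=0)  # k = 2 is arbitrary for the toy data
Z = svd.fit_transform(X)
latent_space_sim = cosine_similarity(Z[-2:-1], Z[-1:])[0, 0]

print(f"word-space cosine:   {word_space_sim:.3f}")
print(f"latent-space cosine: {latent_space_sim:.3f}")

The absolute numbers on such a tiny corpus are not meaningful; the point is only that the two sentences overlap in the surface word "bank", while their latent representations are shaped by co-occurrence patterns across the corpus, the kind of information the introduction argues word-similarity approaches leave underexploited.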