Scientific report: "Paraphrase Identification as Probabilistic Quasi-Synchronous Recognition"


Paraphrase Identification as Probabilistic Quasi-Synchronous Recognition
Dipanjan Das and Noah A. Smith
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
{dipanjan,nasmith}@cs.cmu.edu

Abstract

We present a novel approach to deciding whether two sentences hold a paraphrase relationship. We employ a generative model that generates a paraphrase of a given sentence, and we use probabilistic inference to reason about whether two sentences share the paraphrase relationship. The model cleanly incorporates both syntax and lexical semantics using quasi-synchronous dependency grammars (Smith and Eisner, 2006). Furthermore, using a product of experts (Hinton, 2002), we combine the model with a complementary logistic regression model based on state-of-the-art lexical overlap features. We evaluate our models on the task of distinguishing true paraphrase pairs from false ones on a standard corpus, giving competitive state-of-the-art performance.

1 Introduction

The problem of modeling paraphrase relationships between natural language utterances (McKeown, 1979) has recently attracted interest. For computational linguists, solving this problem may shed light on how best to model the semantics of sentences. For natural language engineers, the problem bears on information management systems like abstractive summarizers that must measure semantic overlap between sentences (Barzilay and Lee, 2003), question answering modules (Marsi and Krahmer, 2005), and machine translation (Callison-Burch et al., 2006).
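The product of experts (Hinton, 2002) named in the abstract can be illustrated for the two-class case: the class probabilities from the two models are multiplied and then renormalized. The following is a minimal sketch under that assumption only. The function name, its arguments, and the handling of the degenerate case are hypothetical and are not taken from the paper.

```python
def product_of_experts(p_a: float, p_b: float) -> float:
    """Combine two experts' P(paraphrase) by a normalized product.

    p_a, p_b: each expert's probability that the sentence pair is a
    paraphrase (hypothetical inputs, each in [0, 1]).
    """
    pos = p_a * p_b                    # both experts vote "paraphrase"
    neg = (1.0 - p_a) * (1.0 - p_b)    # both experts vote "not paraphrase"
    total = pos + neg
    if total == 0.0:
        # Assumption: the experts flatly contradict each other
        # (one says 0, the other says 1), so fall back to 0.5.
        return 0.5
    return pos / total


if __name__ == "__main__":
    # An uninformative expert (0.5) leaves the other expert's opinion unchanged.
    print(product_of_experts(0.5, 0.8))
    # Two experts that agree reinforce each other.
    print(product_of_experts(0.9, 0.9))
```

One consequence of this combination rule is that a confident "no" from either expert pulls the product down sharply, so a pair is labeled a paraphrase only when neither model strongly objects.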
The paraphrase identification problem asks whether two sentences have essentially the same meaning. Although paraphrase identification is defined in semantic terms, it is usually solved using statistical classifiers based on shallow lexical, n-gram, and syntactic overlap features. Such overlap features give the best-published classification accuracy for the paraphrase identification task (Zhang and Patrick, 2005; Finch et al., 2005; Wan et al., 2006; Corley and Mihalcea, 2005, inter alia)