Validation of sub-sentential paraphrases acquired from parallel monolingual corpora

Houda Bouamor, Aurelien Max, Anne Vilnat
LIMSI-CNRS, Univ. Paris Sud, Orsay, France
firstname.lastname@limsi.fr

Abstract

The task of paraphrase acquisition from related sentences can be tackled by a variety of techniques making use of various types of knowledge. In this work, we hypothesize that their performance can be increased if candidate paraphrases are validated using information that characterizes paraphrases independently of the set of techniques that proposed them. We implement this as a binary classification problem (i.e. paraphrase vs. not paraphrase), allowing any paraphrase acquisition technique to be easily integrated into the combination system. We report experiments on two languages, English and French, with 5 individual techniques on parallel monolingual corpora obtained via multiple translation, and a large set of classification features ranging from surface to contextual similarity measures. Relative improvements in F-measure close to 18% are obtained on both languages over the best performing techniques.

1 Introduction

The fact that natural language allows messages to be conveyed in a great variety of ways constitutes an important difficulty for NLP, with applications in both text analysis and generation.
The term paraphrase is now commonly used in the NLP literature to refer to textual units of equivalent meaning at the phrasal level (including single words). For instance, the phrases six months and half a year form a paraphrase pair applicable in many different contexts, as they would appropriately denote the same concept. Although one can envisage manually building high-coverage lists of synonyms, enumerating meaning equivalences at the level of phrases is too daunting a task for humans. Because this type of knowledge can nevertheless greatly benefit many NLP applications, automatic acquisition of such paraphrases has attracted a lot of attention Androutsopoulos .
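
To make the validation idea from the abstract concrete, here is a minimal sketch in Python. It pools the candidate pairs proposed by several acquisition techniques, describes each pair with features that do not depend on any single technique, and trains a two-class classifier (paraphrase vs. not paraphrase). Everything in it is an assumption made for illustration: the technique names (tech_a, tech_b, tech_c), the two surface features, the toy labels, and the perceptron, which stands in for whatever classifier and feature set are actually used.

```python
from collections import defaultdict

TECHNIQUES = ["tech_a", "tech_b", "tech_c"]  # hypothetical technique names


def pool_candidates(outputs):
    """Union the candidate pairs of all techniques, recording who proposed each."""
    proposers = defaultdict(set)
    for technique, pairs in outputs.items():
        for pair in pairs:
            proposers[pair].add(technique)
    return dict(proposers)


def features(pair, proposed_by):
    """Technique votes plus two illustrative surface similarity features."""
    left, right = pair[0].lower().split(), pair[1].lower().split()
    union = set(left) | set(right)
    votes = [1.0 if t in proposed_by else 0.0 for t in TECHNIQUES]
    # Token overlap (Jaccard); 0.0 if both phrases are empty.
    jaccard = len(set(left) & set(right)) / len(union) if union else 0.0
    longest = max(len(left), len(right))
    length_ratio = min(len(left), len(right)) / longest if longest else 0.0
    return votes + [jaccard, length_ratio]


def score(weights, bias, x):
    return sum(w * v for w, v in zip(weights, x)) + bias


def train_perceptron(examples, epochs=20):
    """examples: list of (feature_vector, label), label True = paraphrase."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, label in examples:
            if (score(weights, bias, x) > 0) != label:
                step = 1.0 if label else -1.0
                weights = [w + step * v for w, v in zip(weights, x)]
                bias += step
    return weights, bias


def is_paraphrase(weights, bias, x):
    return score(weights, bias, x) > 0


# Toy data: made-up technique outputs and made-up gold labels.
outputs = {
    "tech_a": [("six months", "half a year"), ("six months", "a year")],
    "tech_b": [("six months", "half a year")],
}
pooled = pool_candidates(outputs)
gold = {("six months", "half a year"): True, ("six months", "a year"): False}
examples = [(features(p, by), gold[p]) for p, by in pooled.items()]
weights, bias = train_perceptron(examples)
```

Because the classifier only sees the pooled candidates and their features, adding a further acquisition technique in this sketch amounts to adding one entry to TECHNIQUES and one key to outputs.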