Extracting Paraphrases from Definition Sentences on the Web

Chikara Hashimoto, Kentaro Torisawa, Stijn De Saeger, Jun'ichi Kazama, Sadao Kurohashi
National Institute of Information and Communications Technology, Kyoto 619-0237, JAPAN
Graduate School of Informatics, Kyoto University, Kyoto 606-8501, JAPAN

Abstract

We propose an automatic method of extracting paraphrases from definition sentences, which are themselves automatically acquired from the Web. We observe that a huge number of concepts are defined in Web documents, and that sentences defining the same concept tend to convey mostly the same information using different expressions, and thus contain many paraphrases. We show that a large number of paraphrases can be automatically extracted with high precision by regarding the sentences that define the same concept as parallel corpora. Experimental results showed that our method extracted about 300,000 paraphrases from 6 × 10^8 Web documents with a precision of about 94%.

1 Introduction

Natural language allows us to express the same information in many ways, which makes natural language processing (NLP) a challenging area. Accordingly, many researchers have recognized that automatic paraphrasing is an indispensable component of intelligent NLP systems (Iordanskaja et al., 1991; McKeown et al., 2002; Lin and Pantel, 2001; Ravichandran and Hovy, 2002; Kauchak and Barzilay, 2006; Callison-Burch et al., 2006), and have tried to acquire from corpora a large amount of paraphrase knowledge, which is key to achieving robust automatic paraphrasing (Lin and Pantel, 2001; Barzilay and McKeown, 2001; Shinyama et al., 2002; Barzilay and Lee, 2003). We propose a method to extract phrasal paraphrases from pairs of sentences that define the same concept. The method is based on our observation that two sentences defining the same concept can be regarded as a parallel corpus, since they largely convey the same …
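To make the "definitions as a parallel corpus" idea concrete, here is a minimal Python sketch under loudly stated assumptions. It assumes the definition sentences have already been acquired and labeled with the concept they define (the `(concept, sentence)` tuple layout is hypothetical). It pairs every two definitions of the same concept, then reads off candidate phrase pairs with a naive token diff from the standard-library `difflib`. The names `pair_definitions` and `candidate_paraphrases` are illustrative only, whitespace tokenization is an assumption, and the diff heuristic is a swapped-in stand-in, not the extraction method the authors propose.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def pair_definitions(definitions):
    """Yield (concept, s1, s2) for every pair of sentences defining the same concept.

    `definitions` is an iterable of (concept, sentence) tuples; this layout
    is a hypothetical assumption for illustration.
    """
    by_concept = defaultdict(list)
    for concept, sentence in definitions:
        by_concept[concept].append(sentence)
    for concept, sentences in by_concept.items():
        # A concept with only one definition yields no pairs.
        for s1, s2 in combinations(sentences, 2):
            yield concept, s1, s2


def candidate_paraphrases(s1, s2):
    """Return (phrase1, phrase2) spans that differ between two sentences
    while identical token context surrounds them on both sides.

    Hypothetical naive diff heuristic, NOT the authors' extraction method.
    Tokenization is plain whitespace splitting (an assumption).
    """
    t1, t2 = s1.split(), s2.split()
    ops = SequenceMatcher(a=t1, b=t2, autojunk=False).get_opcodes()
    pairs = []
    for k, (tag, i1, i2, j1, j2) in enumerate(ops):
        # Keep a 'replace' region only if matching ('equal') spans flank it on both sides.
        bounded = (
            0 < k < len(ops) - 1
            and ops[k - 1][0] == "equal"
            and ops[k + 1][0] == "equal"
        )
        if tag == "replace" and bounded:
            pairs.append((" ".join(t1[i1:i2]), " ".join(t2[j1:j2])))
    return pairs


definitions = [
    ("glaucoma", "glaucoma is an eye disease that damages the optic nerve"),
    ("glaucoma", "glaucoma is an eye disorder that damages the optic nerve"),
    ("jazz", "jazz is a music genre that originated in New Orleans"),
]
for concept, s1, s2 in pair_definitions(definitions):
    print(concept, candidate_paraphrases(s1, s2))
    # prints: glaucoma [('disease', 'disorder')]
```

Requiring matching context on both sides keeps the sketch conservative: it ignores differences at sentence boundaries and cannot capture paraphrases that reorder words. It illustrates only the pairing idea, not how the reported precision is achieved.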

TỪ KHÓA LIÊN QUAN