tailieunhanh - Báo cáo khoa học: "Computing Lexical Chains with Graph Clustering"

This paper describes a new method for computing lexical chains. These are sequences of semantically related words that reflect a text’s cohesive structure. In contrast to previous methods, we are able to select chains based on their cohesive strength. This is achieved by analyzing the connectivity in graphs representing the lexical chains. We show that the generated chains significantly improve performance of automatic text summarization and keyphrase indexing. | Computing Lexical Chains with Graph Clustering Olena Medelyan Computer Science Department The University of Waikato New Zealand olena@ Abstract This paper describes a new method for computing lexical chains. These are sequences of semantically related words that reflect a text s cohesive structure. In contrast to previous methods we are able to select chains based on their cohesive strength. This is achieved by analyzing the connectivity in graphs representing the lexical chains. We show that the generated chains significantly improve performance of automatic text summarization and keyphrase indexing. 1 Introduction Text understanding tasks such as topic detection automatic summarization discourse analysis and question answering require deep understanding of the text s meaning. The first step in determining this meaning is the analysis of the text s concepts and their inter-relations. Lexical chains provide a framework for such an analysis. They combine semantically related words across sentences into meaningful sequences that reflect the cohesive structure of the text. Lexical chains introduced by Morris and Hirst 1991 have been studied extensively in the last decade since large lexical databases are available in digital form. Most approaches use WordNet or Roget s thesaurus for computing the chains and apply the results for text summarization. We present a new approach for computing lexical chains by treating them as graphs where nodes are document terms and edges reflect semantic relations between them. In contrast to previous methods we analyze the cohesive strength within a chain by computing the diameter of the chain graph. Weakly cohesive chains with a high graph diameter are decomposed by a graph clustering algorithm into several highly cohesive chains. We use WordNet and alternatively a domain-specific thesaurus for obtaining semantic relations between the terms. We first give an overview of existing methods for computing lexical chains and