Scientific report: "AN AUTOMATIC METHOD OF FINDING TOPIC BOUNDARIES"

AN AUTOMATIC METHOD OF FINDING TOPIC BOUNDARIES

Jeffrey C. Reynar
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, Pennsylvania, USA
jcreynar@unagi.cis.upenn.edu

Abstract

This article outlines a new method of locating discourse boundaries based on lexical cohesion and a graphical technique called dotplotting. The application of dotplotting to discourse segmentation can be performed either manually, by examining a graph, or automatically, using an optimization algorithm. The results of two experiments involving automatically locating boundaries between a series of concatenated documents are presented. Areas of application and future directions for this work are also outlined.

Introduction

In general, texts are "about" some topic. That is, the sentences which compose a document contribute information related to the topic in a coherent fashion. In all but the shortest texts, the topic will be expounded upon through the discussion of multiple subtopics. Whether the organization of the text is hierarchical in nature, as described in Grosz and Sidner (1986), or linear, as examined in Skorochod'ko (1972), boundaries between subtopics will generally exist. In some cases, these boundaries will be explicit and will correspond to paragraphs or, in longer texts, sections or chapters. They can also be implicit.
Newspaper articles often contain paragraph demarcations but less frequently contain section markings, even though lengthy articles often address the main topic by discussing subtopics in separate paragraphs or regions of the article.

Topic boundaries are useful for several different tasks. Hearst and Plaunt (1993) demonstrated their usefulness for information retrieval by showing that segmenting documents and indexing the resulting subdocuments improves accuracy on an information retrieval task. Youmans (1991) showed that his text segmentation algorithm could be used to manually find scene boundaries in works of literature. Morris and Hirst (1991) at The
