Probabilistic Text Structuring: Experiments with Sentence Ordering

Mirella Lapata
Department of Computer Science, University of Sheffield
Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK
mlap@

Abstract

Ordering information is a critical task for natural language generation applications. In this paper we propose an approach to information ordering that is particularly suited for text-to-text generation. We describe a model that learns constraints on sentence order from a corpus of domain-specific texts and an algorithm that yields the most likely order among several alternatives. We evaluate the automatically generated orderings against authored texts from our corpus and against human subjects who are asked to mimic the model's task. We also assess the appropriateness of such a model for multidocument summarization.

1 Introduction

Structuring a set of facts into a coherent text is a non-trivial task which has received much attention in the area of concept-to-text generation (see Reiter and Dale 2000 for an overview). The structured text is typically assumed to be a tree, i.e., to have a hierarchical structure whose leaves express the content being communicated and whose nodes specify how this content is grouped via rhetorical or discourse relations (e.g., contrast, sequence, elaboration).
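The abstract names two components: a model that learns sentence-order constraints from a corpus, and an algorithm that picks the most likely order among several alternatives. The sketch below is a toy stand-in for both, not the paper's model. It assumes, hypothetically, that each sentence is represented by its set of word types, that only adjacent sentences constrain each other, and that a transition is scored by the mean add-one-smoothed log P(b | a) over every pairing of a feature a of the previous sentence with a feature b of the next one. The function names (`features`, `train`, `transition_score`, `order_score`, `most_likely_order`) and the training sentences are illustrative inventions. Candidates are ranked by scoring each one, so passing all permutations is feasible only for a handful of sentences.

```python
import math
from collections import Counter
from itertools import permutations, product


def features(sentence):
    """Hypothetical feature extractor: lowercased word types, punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return {w for w in words if w}


def train(corpus):
    """Count (feature of previous sentence, feature of next sentence) pairs.

    `corpus` is a list of texts; each text is a list of sentences in authored order.
    """
    pair_counts, prev_totals, vocab = Counter(), Counter(), set()
    for text in corpus:
        for prev, nxt in zip(text, text[1:]):
            fp, fn = features(prev), features(nxt)
            vocab |= fp | fn
            for a, b in product(fp, fn):
                pair_counts[(a, b)] += 1
                prev_totals[a] += 1  # equals the sum of pair_counts[(a, b)] over b
    return pair_counts, prev_totals, max(len(vocab), 1)


def transition_score(model, prev, nxt):
    """Mean add-one-smoothed log P(b | a) over all feature pairs of two adjacent sentences."""
    pair_counts, prev_totals, vocab_size = model
    pairs = list(product(features(prev), features(nxt)))
    if not pairs:
        return 0.0
    total = sum(
        math.log((pair_counts[(a, b)] + 1) / (prev_totals[a] + vocab_size))
        for a, b in pairs
    )
    return total / len(pairs)


def order_score(model, order):
    """Score an ordering as the sum of its adjacent-transition scores."""
    return sum(transition_score(model, p, n) for p, n in zip(order, order[1:]))


def most_likely_order(model, candidates):
    """Return the highest-scoring ordering among the candidate alternatives."""
    return max(candidates, key=lambda order: order_score(model, order))


# Toy usage: two identical training texts, then rank all orders of a shuffled set.
corpus = [["The storm formed.", "The storm strengthened.", "Residents evacuated."]] * 2
model = train(corpus)
shuffled = ["Residents evacuated.", "The storm formed.", "The storm strengthened."]
best = most_likely_order(model, [list(p) for p in permutations(shuffled)])
print(best)  # ['The storm formed.', 'The storm strengthened.', 'Residents evacuated.']
```

Averaging over feature pairs, rather than summing, is a design choice of this sketch. With a plain sum, a transition between two feature-rich sentences accumulates more negative log terms, so a worse order can win simply by placing short sentences next to each other; on this toy corpus the summed score would prefer the order formed, evacuated, strengthened. The averaged score is therefore a heuristic ranking, not a normalized probability.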
For domains with large numbers of facts and rhetorical relations, there can be more than one possible tree representing the intended content. These different trees will be realized as texts with different sentence orders (or even paragraph orders) and different levels of coherence. Finding the tree that yields the best possible text is effectively a search problem. One way to address it is by narrowing down the search space either exhaustively or heuristically. Marcu (1997) argues that global coherence can be achieved if constraints on local coherence are satisfied. The latter are operationalized as weights on the ordering and adjacency of facts and are derived from a corpus