Scientific report: "Conciseness through Aggregation in Text Generation"

James Shaw
Dept. of Computer Science, Columbia University, New York, NY 10027, USA

Abstract

Aggregating different pieces of similar information is necessary to generate concise and easy-to-understand reports in technical domains. This paper presents a general algorithm that combines similar messages in order to generate one or more coherent sentences for them. The process is not as trivial as might be expected. Problems encountered are briefly described.

1 Motivation

Aggregation is any syntactic process, such as coordination or subordination, that allows the expression of concise and tightly constructed text. By using the parallelism of syntactic structure to express similar information, writers can convey the same amount of information in a shorter space. Coordination has been the object of considerable research (for an overview, see [van Oirsouw87]). In contrast to linguistic approaches, which are generally analytic, the treatment of coordination in this paper is from a synthetic point of view: text generation. It raises issues such as deciding when and how to coordinate. An algorithm for generating coordinated sentences is implemented in PLANDoc [Kukich et al.; McKeown et al.], an automated documentation system.
PLANDoc generates natural language reports based on the interaction between telephone planning engineers and LEIS-PLAN¹, a knowledge-based system. Input to PLANDoc is a series of messages, or semantic functional descriptions (FDs; see Fig. 1). Each FD is an atomic decision about telephone equipment installation chosen by a planning engineer. The domain of discourse is currently limited to 31 message types, but user interactions include many variations and combinations of these messages. Instead of generating four separate messages, as in Fig. 2, PLANDoc combines them and generates the following two sentences: This refinement activated DLC for CSAs 3122 and 3130 in the first quarter of 1994

¹ LEIS is a .
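To make the idea of combining similar messages concrete, the following is a minimal sketch in Python. It is not PLANDoc's algorithm. It assumes that each message is a flat record and that two messages may be coordinated when they agree on every attribute except one. The attribute names (`action`, `equipment`, `csa`, `date`), the helper functions, and the second pair of messages are hypothetical and chosen only for illustration. Only the first pair corresponds to the example sentence quoted above.

```python
# Hypothetical, flattened stand-ins for functional descriptions (FDs).
# The first two records mirror the example sentence in the text; the last
# two are invented values used only to show a second group forming.
messages = [
    {"action": "activated", "equipment": "DLC", "csa": "3122",
     "date": "the first quarter of 1994"},
    {"action": "activated", "equipment": "DLC", "csa": "3130",
     "date": "the first quarter of 1994"},
    {"action": "activated", "equipment": "fiber", "csa": "3140",
     "date": "the second quarter of 1994"},
    {"action": "activated", "equipment": "fiber", "csa": "3150",
     "date": "the second quarter of 1994"},
]


def join_conjuncts(items):
    """Join strings as an English coordination: 'a', 'a and b', 'a, b, and c'."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def aggregate(msgs, vary="csa"):
    """Group messages that are identical except for the `vary` attribute.

    Returns one record per group, in first-seen order, with the `vary`
    attribute replaced by the list of collected values.
    """
    groups = {}
    for msg in msgs:
        # Everything except the varying attribute forms the grouping key.
        key = tuple(sorted((k, v) for k, v in msg.items() if k != vary))
        groups.setdefault(key, []).append(msg[vary])
    return [dict(key, **{vary: values}) for key, values in groups.items()]


def realize(group):
    """Render one aggregated record as a sentence with coordinated CSAs."""
    csas = group["csa"]
    noun = "CSA" if len(csas) == 1 else "CSAs"
    return (f"This refinement {group['action']} {group['equipment']} "
            f"for {noun} {join_conjuncts(csas)} in {group['date']}.")


sentences = [realize(g) for g in aggregate(messages)]
for s in sentences:
    print(s)
```

Under these assumptions the four records collapse into two sentences, one per group. The sketch coordinates only on a single, fixed attribute and ignores the questions of when and how to coordinate that the paper raises.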