Scientific report: "Generation that Exploits Corpus-Based Statistical Knowledge"

Irene Langkilde and Kevin Knight
Information Sciences Institute, University of Southern California
Marina del Rey, CA 90292

Abstract

We describe novel aspects of a new natural language generator called Nitrogen. This generator has a highly flexible input representation that allows a spectrum of input from syntactic to semantic depth, and shifts the burden of many linguistic decisions to the statistical post-processor. The generation algorithm is compositional, making it efficient, yet it also handles non-compositional aspects of language. Nitrogen's design makes it robust and scalable, operating with lexicons and knowledge bases of one hundred thousand entities.

1 Introduction

Language generation is an important subtask of applications like machine translation, human-computer dialogue, explanation, and summarization. The recurring need for generation suggests the usefulness of a general-purpose, domain-independent natural language generator (NLG). However, "plug-in" generators available today, such as FUF/SURGE (Elhadad and Robin, 1998), MUMBLE (Meteer et al., 1987), KPML (Bateman, 1996), and CoGenTex's RealPro (Lavoie and Rambow, 1997), require inputs with a daunting amount of linguistic detail. As a result, many client applications resort instead to simpler template-based methods.
An important advantage of templates is that they sidestep linguistic decision-making and avoid the need for large, complex knowledge resources and processing. For example, the following structure could be a typical result from a database query on the type of food a venue serves:

    ((obj-type venue) (obj-name Top_of_the_Mark)
     (attribute food-type) (attrib-value American))

By using a template like

    <obj-name>'s <attribute> is <attrib-value>.

the structure could produce the sentence "Top of the Mark's food type is American." Templates avoid the need for detailed linguistic information about lexical items, part-of-speech tags, number, gender, definiteness