Scientific report: "Automatic Evaluation of Sentence-Level Fluency" (Andrew Mutton)

GLEU: Automatic Evaluation of Sentence-Level Fluency

Andrew Mutton, Mark Dras, Stephen Wan, Robert Dale

Centre for Language Technology, Macquarie University, NSW 2109 Australia
Information and Communication Technologies, CSIRO, NSW 2109 Australia
madras@

Abstract

In evaluating the output of language technology applications (MT, natural language generation, summarisation), automatic evaluation techniques generally conflate measurement of faithfulness to source content with fluency of the resulting text. In this paper we develop an automatic evaluation metric to estimate fluency alone, by examining the use of parser outputs as metrics, and show that they correlate with human judgements of generated text fluency. We then develop a machine learner based on these, and show that this performs better than the individual parser metrics, approaching a lower bound on human performance. We finally look at different language models for generating sentences, and show that while individual parser metrics can be 'fooled' depending on generation method, the machine learner provides a consistent estimator of fluency.
1 Introduction

Intrinsic evaluation of the output of many language technologies can be characterised as having at least two aspects: how well the generated text reflects the source data, whether it be text in another language for machine translation (MT), a natural language generation (NLG) input representation, a document to be summarised, and so on; and how well it conforms to normal human language usage. These two aspects are often made explicit in approaches to creating the text. For example, in statistical MT the translation model and the language model are treated separately, characterised as faithfulness and fluency respectively (as in the treatment in Jurafsky and Martin (2000)). Similarly, the ultrasummarisation model of Witbrock and Mittal (1999) consists of a content model, modelling the probability that a word in the source text will be in the summary, and a language model.
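To make the separation of faithfulness and fluency concrete, the following is a minimal sketch of a two-part score in the style of statistical MT: a translation-model term plus a language-model term. It is an illustration only, not the method developed in this paper. The function names, the add-one-smoothed bigram language model, the bag-of-words lexicon lookup, and the toy corpus and lexicon are all assumptions introduced for the example.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Collect context and bigram counts from tokenised sentences (toy language model)."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {tok for sent in sentences for tok in sent} | {"<s>", "</s>"}
    return contexts, bigrams, len(vocab)


def fluency_logprob(sentence, lm):
    """Fluency term: log-probability of the sentence under the bigram model."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from having zero probability.
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


def faithfulness_logprob(source, candidate, lexicon, floor=1e-6):
    """Faithfulness term: each source word is scored by its best-matching candidate word.

    The lexicon maps (source_word, candidate_word) pairs to hypothetical probabilities.
    Word order is ignored, so this term cannot tell fluent from scrambled output.
    """
    total = 0.0
    for s in source:
        best = max((lexicon.get((s, c), floor) for c in candidate), default=floor)
        total += math.log(best)
    return total


def combined_score(source, candidate, lm, lexicon):
    """Sum of the two log-probability terms, kept separate until this point."""
    return faithfulness_logprob(source, candidate, lexicon) + fluency_logprob(candidate, lm)


# Hypothetical toy data, invented purely for illustration.
lm = train_bigram_lm([["the", "cat", "sat"], ["the", "dog", "sat"]])
lexicon = {("le", "the"): 0.9, ("chat", "cat"): 0.8, ("assis", "sat"): 0.7}
source = ["le", "chat", "assis"]
fluent = ["the", "cat", "sat"]
scrambled = ["sat", "cat", "the"]

print("faithfulness:", faithfulness_logprob(source, fluent, lexicon),
      faithfulness_logprob(source, scrambled, lexicon))
print("fluency:     ", fluency_logprob(fluent, lm), fluency_logprob(scrambled, lm))
```

In this toy setup the two candidates contain the same words, so the faithfulness term is identical for both and only the fluency term separates them. That is the kind of distinction a fluency-only metric is meant to capture on its own.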