Using a Randomised Controlled Clinical Trial to Evaluate an NLG System
Ehud Reiter†, Roma Robertson‡, A. Scott Lennox‡, Liesl Osman
Departments of Computing Science†, General Practice, and Medicine and Therapeutics
University of Aberdeen, Aberdeen, Scotland, UK

Abstract

The STOP system, which generates personalised smoking-cessation letters, was evaluated by a randomised controlled clinical trial. We believe this is the largest and perhaps most rigorous task effectiveness evaluation ever performed on an NLG system. The detailed results of the clinical trial have been presented elsewhere, in the medical literature. In this paper we discuss the clinical trial itself: its structure and cost, what we did and did not learn from it (especially considering that the trial showed that STOP was not effective), and how it compares to other NLG evaluation techniques.

1 Introduction

There is increasing interest in techniques for evaluating Natural Language Generation (NLG) systems. However, we are not aware of any previously reported evaluations of NLG systems which have rigorously compared the task effectiveness of an NLG system to a non-NLG alternative. In this paper we discuss such an evaluation: a large-scale (2553 subjects) randomised controlled clinical trial which evaluated the effectiveness of personalised smoking-cessation letters generated by the STOP system (Reiter et al., 1999).
We believe that this is the largest, most expensive, and perhaps most rigorous evaluation ever done of an NLG system; it was also a disappointing evaluation, as it showed that STOP letters in general were no more effective than control letters. The detailed results of the STOP evaluation have been presented elsewhere, in the medical literature (Lennox et al., 2001). The purpose of this paper is to discuss the clinical trial from an NLG evaluation perspective, in order to help future researchers decide when a clinical trial or similar large-scale task effectiveness ...