Quantitative and Qualitative Evaluation of Darpa Communicator Spoken Dialogue Systems

Marilyn A. Walker, AT&T Labs - Research, 180 Park Ave, E103, Florham Park, NJ 07932, walker@
Rebecca Passonneau, AT&T Labs - Research, 180 Park Ave, D191, Florham Park, NJ 07932, becky@
Julie E. Boland, Institute of Cognitive Science, University of Louisiana at Lafayette, Lafayette, LA 70504, boland@

Abstract

This paper describes the application of the PARADISE evaluation framework to the corpus of 662 human-computer dialogues collected in the June 2000 Darpa Communicator data collection. We describe results based on the standard logfile metrics as well as results based on additional qualitative metrics derived using the DATE dialogue act tagging scheme. We show that performance models derived using the standard metrics can account for 37% of the variance in user satisfaction, and that the addition of DATE metrics improved the models by an absolute 5%.

1 Introduction

The objective of the DARPA COMMUNICATOR program is to support research on multi-modal speech-enabled dialogue systems with advanced conversational capabilities. In order to make this a reality, it is important to understand the contribution of various techniques to users' willingness and ability to use a spoken dialogue system. In June of 2000, we conducted an exploratory data collection experiment with nine participating communicator systems.
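
To make concrete what "accounting for a percentage of the variance in user satisfaction" measures, the following minimal sketch fits a multiple linear regression of a satisfaction score on z-score-normalized metrics and reports R², the fraction of variance explained. It assumes a PARADISE-style setup in which satisfaction is modeled as a weighted linear combination of normalized metrics. All function names, metric names, and data values here are hypothetical and synthetic; none come from the Communicator corpus, and this is not the authors' implementation.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize a metric to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(columns, y):
    """Least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(columns, y, weights):
    """Fraction of the variance in y explained by the fitted model."""
    preds = [
        weights[0] + sum(w * col[i] for w, col in zip(weights[1:], columns))
        for i in range(len(y))
    ]
    y_bar = mean(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic toy data (hypothetical, for illustration only).
task_success = [1, 1, 0, 1, 0, 1, 0, 1]
duration_sec = [120, 200, 340, 150, 400, 180, 310, 220]
user_sat = [22, 19, 11, 21, 9, 17, 14, 16]

features = [zscore(task_success), zscore(duration_sec)]
weights = fit_linear(features, user_sat)
r2 = r_squared(features, user_sat, weights)
print(f"weights={weights}, R^2={r2:.2f}")
```

Under this reading, an R² of 0.37 would correspond to a model explaining 37% of the variance. Adding further predictors to such a model can only keep the in-sample R² the same or raise it, so an "absolute 5%" improvement would be a rise of 0.05 in this quantity.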
All systems supported travel planning and utilized some form of mixed-initiative interaction. However, the systems varied in several critical dimensions: (1) they targeted different back-end databases for travel information; (2) system modules such as ASR, NLU, TTS, and dialogue management were typically different across systems. The Evaluation Committee, chaired by Walker (Walker, 2000), with representatives from the nine COMMUNICATOR sites and from NIST, developed the experimental design. A logfile standard was developed by MITRE along with a set of tools for .