PARADISE: A Framework for Evaluating Spoken Dialogue Agents
Marilyn A. Walker, Diane J. Litman, Candace A. Kamm, and Alicia Abella
AT&T Labs - Research
180 Park Avenue, Florham Park, NJ 07932-0971, USA
{walker,diane,cak,abella}@

Abstract

This paper presents PARADISE (PARAdigm for Dialogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.

1 Introduction

Recent advances in dialogue modeling, speech recognition, and natural language processing have made it possible to build spoken dialogue agents for a wide variety of applications. Potential benefits of such agents include remote or hands-free access, ease of use, naturalness, and greater efficiency of interaction. However, a critical obstacle to progress in this area is the lack of a general framework for evaluating and comparing the performance of different dialogue agents.

One widely used approach to evaluation is based on the notion of a reference answer (Hirschman et al., 1990). An agent's responses to a query are compared with a predefined key of minimum and maximum reference answers; performance is the proportion of responses that match the key.
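The reference-answer score described above can be sketched in a few lines. The excerpt does not spell out the matching rule, so this sketch assumes a response matches when it contains every item of the minimum answer and nothing outside the maximum answer. `ReferenceKey`, `matches_key`, `reference_answer_score`, and the flight data are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceKey:
    """Hypothetical key for one query: minimal and maximal acceptable answers."""
    minimum: frozenset[str]
    maximum: frozenset[str]


def matches_key(response: set[str], key: ReferenceKey) -> bool:
    # Assumed rule: the response must include the whole minimum answer
    # and must not contain anything outside the maximum answer.
    return key.minimum <= response <= key.maximum


def reference_answer_score(responses: list[set[str]], keys: list[ReferenceKey]) -> float:
    """Proportion of responses that match their corresponding key."""
    if len(responses) != len(keys):
        raise ValueError("expected exactly one key per response")
    if not responses:
        return 0.0
    hits = sum(matches_key(r, k) for r, k in zip(responses, keys))
    return hits / len(responses)


# Illustrative data only: two queries and the agent's answers to them.
keys = [
    ReferenceKey(minimum=frozenset({"AA 43"}), maximum=frozenset({"AA 43", "AA 51"})),
    ReferenceKey(minimum=frozenset({"Newark"}), maximum=frozenset({"Newark"})),
]
responses = [{"AA 43", "AA 51"}, {"Newark", "JFK"}]
print(reference_answer_score(responses, keys))  # 0.5
```

The second response fails because "JFK" lies outside the maximum answer. Because the key fixes in advance what counts as a correct answer to each query, the score says nothing about how the agent reached that answer.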
This approach has many widely acknowledged limitations (Hirschman and Pao, 1993; Danieli et al., 1992; Bates and Ayuso, 1993). For example, although there may be many potential dialogue strategies for carrying out a task, the key is tied to one particular dialogue strategy. In contrast, agents using different dialogue strategies can be compared with measures such as inappropriate utterance ratio, turn correction ratio, concept accuracy, implicit recovery, and transaction success (Danieli

We use the term
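Unlike a reference key, the strategy-independent measures listed above are computed from annotations of the dialogue itself. The excerpt does not define them, so the sketch below is only an assumption: each turn-level ratio is the fraction of annotated system turns carrying a flag, and transaction success is the fraction of dialogues whose task was completed. Concept accuracy and implicit recovery are omitted because their definitions are not given here. `SystemTurn`, `Dialogue`, `turn_ratio`, and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SystemTurn:
    """Hypothetical annotation of one system utterance."""
    inappropriate: bool = False  # judged inappropriate by an annotator
    correction: bool = False     # spent correcting an earlier misunderstanding


@dataclass
class Dialogue:
    turns: list[SystemTurn]
    task_completed: bool  # hypothetical flag: did the transaction succeed?


def turn_ratio(turns: list[SystemTurn], flagged: Callable[[SystemTurn], bool]) -> float:
    # Assumed definition: flagged system turns divided by all system turns.
    if not turns:
        return 0.0
    return sum(flagged(t) for t in turns) / len(turns)


def summarize(dialogues: list[Dialogue]) -> dict[str, float]:
    if not dialogues:
        raise ValueError("need at least one dialogue")
    # Pool the system turns of every dialogue before computing the ratios.
    all_turns = [t for d in dialogues for t in d.turns]
    return {
        "inappropriate_utterance_ratio": turn_ratio(all_turns, lambda t: t.inappropriate),
        "turn_correction_ratio": turn_ratio(all_turns, lambda t: t.correction),
        "transaction_success": sum(d.task_completed for d in dialogues) / len(dialogues),
    }


# Illustrative annotations only: two dialogues, six system turns in total.
dialogues = [
    Dialogue([SystemTurn(), SystemTurn(inappropriate=True), SystemTurn(correction=True), SystemTurn()], True),
    Dialogue([SystemTurn(correction=True), SystemTurn()], False),
]
print(summarize(dialogues))
```

Pooled over the six annotated turns, the example yields an inappropriate utterance ratio of 1/6, a turn correction ratio of 1/3, and a transaction success of 0.5. Nothing in these computations refers to a fixed answer key, so two agents with different dialogue strategies can be scored the same way.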