Scientific report: "Simulating the Behaviour of Older versus Younger Users when Interacting with Spoken Dialogue Systems"

Kallirroi Georgila, Maria Wolters and Johanna D. Moore
Human Communication Research Centre, University of Edinburgh
kgeorgil mwolters jmoore@

Abstract

In this paper we build user simulations of older and younger adults using a corpus of interactions with a Wizard-of-Oz appointment scheduling system. We measure the quality of these models with standard metrics proposed in the literature. Our results agree with predictions based on statistical analysis of the corpus and previous findings about the diversity of older people’s behaviour. Furthermore, our results show that these metrics can be a good predictor of the behaviour of different types of users, which provides evidence for the validity of current user simulation evaluation metrics.

1 Introduction

Using machine learning to induce dialogue management policies requires large amounts of training data, and thus it is typically not feasible to build such models solely with data from real users. Instead, data from real users is used to build simulated users (SUs), who then interact with the system as often as needed. In order to learn good policies, the behaviour of the SUs needs to cover the range of variation seen in real users (Schatzmann et al., 2005; Georgila et al., 2006).
Furthermore, SUs are critical for evaluating candidate dialogue policies. To date, several techniques for building SUs have been investigated, and metrics for evaluating their quality have been proposed (Schatzmann et al., 2005; Georgila et al., 2006). However, to our knowledge, no one has tried to build user simulations for different populations of real users and measure whether results from evaluating the quality of those simulations agree with what is known about those particular types of real users, extracted from other studies of those populations. This is presumably due to the lack of corpora for different types of users. In this paper we focus on the