Using Automatically Transcribed Dialogs to Learn User Models in a Spoken Dialog System

Umar Syed
Department of Computer Science
Princeton University
Princeton, NJ 08540, USA
usyed@

Jason D. Williams
Shannon Laboratory
AT&T Labs Research
Florham Park, NJ 07932, USA
jdw@

Abstract

We use an EM algorithm to learn user models in a spoken dialog system. Our method requires automatically transcribed (with ASR) dialog corpora, plus a model of transcription errors, but does not otherwise need any manual transcription effort. We tested our method on a voice-controlled telephone directory application, and show that our learned models better replicate the true distribution of user actions than those trained by simpler methods, and are very similar to user models estimated from manually transcribed dialogs.

1 Introduction and Background

When designing a dialog manager for a spoken dialog system, we would ideally like to try different dialog management strategies on the actual user population that will be using the system, and select the one that works best. However, users are typically unwilling to endure this kind of experimentation. The next-best approach is to build a model of user behavior. That way, we can experiment with the model as much as we like without troubling actual users. Of course, for these experiments to be useful, a high-quality user model is needed.
The usual method of building a user model is to estimate it from transcribed corpora of human-computer dialogs. However, manually transcribing dialogs is expensive, and consequently these corpora are usually small and sparse. In this work we propose a method of building user models that does not operate on manually transcribed dialogs, but instead uses dialogs that have been transcribed by an automatic speech recognition (ASR) engine. Since this process is error-prone, we cannot assume that the transcripts will accurately reflect the users' true actions and internal states. To handle this uncertainty, we
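The full estimation procedure is not shown in this excerpt, but the core idea it describes, treating each user's true action as a latent variable and using a known model of ASR transcription errors inside EM, can be sketched in a minimal form. The action inventory, confusion matrix values, and function name below are illustrative assumptions for a toy setting, not the authors' implementation:

```python
import numpy as np

# Hypothetical ASR error model for 3 user action types:
# confusion[t, o] = P(ASR transcribes action o | user's true action was t).
# In practice this would be estimated from a small held-out transcribed set.
confusion = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
])

def em_action_distribution(observed, confusion, n_iters=500):
    """Estimate P(true user action) from ASR-transcribed action labels
    via EM, with the true action behind each observation as the latent."""
    n_actions = confusion.shape[0]
    p = np.full(n_actions, 1.0 / n_actions)   # uniform initialization
    counts = np.bincount(observed, minlength=confusion.shape[1])
    for _ in range(n_iters):
        # E-step: posterior over the true action given each observed label,
        # post[t, o] proportional to p[t] * confusion[t, o].
        joint = p[:, None] * confusion        # shape (true, observed)
        post = joint / joint.sum(axis=0, keepdims=True)
        # M-step: expected counts of each true action, renormalized.
        p = post @ counts
        p /= p.sum()
    return p
```

With enough data generated through the confusion model, the EM estimate recovers the underlying action distribution more faithfully than simply counting the (noisy) ASR labels, which is the intuition behind learning user models from automatically transcribed dialogs.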