Scientific report: "User Simulations for context-sensitive speech recognition in Spoken Dialogue Systems"

Oliver Lemon, Edinburgh University, olemon@
Ioannis Konstas, University of Glasgow, konstas@

Abstract

We use a machine learner trained on a combination of acoustic and contextual features to predict the accuracy of incoming n-best automatic speech recognition (ASR) hypotheses to a spoken dialogue system (SDS). Our novel approach is to use a simple statistical User Simulation (US) for this task, which measures the likelihood that the user would say each hypothesis in the current context. Such US models are now common in machine learning approaches to SDS, are trained on real dialogue data, and are related to theories of “alignment” in psycholinguistics. We use a US to predict the user’s next dialogue move and thereby re-rank n-best hypotheses of a speech recognizer for a corpus of 2564 user utterances. The method achieved a significant relative reduction of Word Error Rate (WER) of 5% (this is 44% of the possible WER improvement on this data) and 62% of the possible semantic improvement (Dialogue Move Accuracy), compared to the baseline policy of selecting the topmost ASR hypothesis. The majority of the improvement is attributable to the User Simulation feature, as shown by Information Gain analysis.
1 Introduction

A crucial problem in the design of spoken dialogue systems (SDS) is to decide, for incoming recognition hypotheses, whether a system should accept (consider correctly recognized), reject (assume misrecognition), or ignore (classify as noise or speech not directed to the system) them. Obviously, incorrect decisions at this point can have serious negative effects on system usability and user satisfaction. On the one hand, accepting misrecognized hypotheses leads to misunderstandings and unintended system behaviors, which are usually difficult to recover from. On the other hand, users might get frustrated with a system that behaves too cautiously and rejects or ignores too many utterances.
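
The re-ranking idea from the abstract can be illustrated with a minimal sketch. It assumes a toy User Simulation that estimates the probability of the user's next dialogue move given the preceding system move, using add-one smoothed counts. All function names, dialogue move labels, counts, confidence values, and the interpolation weight are hypothetical. The weighted log-linear combination below is only a stand-in for the trained machine learner described in the paper, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def train_user_simulation(pairs):
    """Count how often each user move follows each system move (toy data)."""
    counts = defaultdict(Counter)
    for system_move, user_move in pairs:
        counts[system_move][user_move] += 1
    return counts


def user_move_probability(counts, system_move, user_move, num_moves):
    """Add-one smoothed estimate of P(user_move | system_move)."""
    context = counts[system_move]
    return (context[user_move] + 1) / (sum(context.values()) + num_moves)


def rerank(nbest, counts, system_move, num_moves, weight=0.5):
    """Return hypotheses sorted by a weighted sum of log ASR confidence
    and log User Simulation probability (an assumed combination rule)."""
    def score(hyp):
        p_us = user_move_probability(counts, system_move, hyp["move"], num_moves)
        return (1 - weight) * math.log(hyp["confidence"]) + weight * math.log(p_us)
    return sorted(nbest, key=score, reverse=True)


# Hypothetical training pairs: (previous system move, observed user move).
training_pairs = [
    ("ask_playlist", "provide_playlist"),
    ("ask_playlist", "provide_playlist"),
    ("ask_playlist", "provide_playlist"),
    ("ask_playlist", "yes_answer"),
    ("confirm", "yes_answer"),
    ("confirm", "no_answer"),
]
us_counts = train_user_simulation(training_pairs)
moves = {"provide_playlist", "yes_answer", "no_answer"}

# Hypothetical 2-best list; the baseline policy would pick the first entry.
nbest = [
    {"text": "yes", "move": "yes_answer", "confidence": 0.55},
    {"text": "jazz", "move": "provide_playlist", "confidence": 0.45},
]
reranked = rerank(nbest, us_counts, "ask_playlist", len(moves))
best = reranked[0]
print(best["text"])
```

In this toy case the topmost ASR hypothesis is "yes", but after the system has asked for a playlist the simulated user is more likely to provide one, so the context-sensitive score prefers "jazz". A full system would also need a threshold or classifier to choose between accepting, rejecting, and ignoring the selected hypothesis.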
