Scientific report: "Combining Acoustic and Pragmatic Features to Predict Recognition Performance in Spoken Dialogue Systems"

Malte Gabsdil, Department of Computational Linguistics, Saarland University, Germany, gabsdil@
Oliver Lemon, School of Informatics, Edinburgh University, Scotland, olemon@

Abstract

We use machine learners trained on a combination of acoustic confidence and pragmatic plausibility features computed from dialogue context to predict the accuracy of incoming n-best recognition hypotheses to a spoken dialogue system. Our best results show a 25% weighted f-score improvement over a baseline system that implements a “grammar-switching” approach to context-sensitive speech recognition.

1 Introduction

A crucial problem in the design of spoken dialogue systems is to decide, for incoming recognition hypotheses, whether a system should accept (consider correctly recognized), reject (assume misrecognition), or ignore (classify as noise or speech not directed to the system) them. In addition, a more sophisticated dialogue system might decide whether to clarify or confirm certain hypotheses. Obviously, incorrect decisions at this point can have serious negative effects on system usability and user satisfaction. On the one hand, accepting misrecognized hypotheses leads to misunderstandings and unintended system behaviors, which are usually difficult to recover from. On the other hand, users might get frustrated with a system that behaves too cautiously and rejects or ignores too many utterances.
Thus, an important feature in dialogue system engineering is the tradeoff between avoiding task failure due to misrecognitions and promoting overall dialogue efficiency, flow, and naturalness. In this paper, we investigate the use of machine learners trained on a combination of acoustic confidence and pragmatic plausibility features computed from dialogue context to predict the quality of incoming n-best recognition hypotheses to a spoken dialogue system. These predictions are then used .
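
The abstract states its headline result as a weighted f-score over the system's decisions. The sketch below illustrates one common reading of that metric: the per-class F1 scores averaged with weights proportional to how often each class occurs in the gold labels. This excerpt does not spell out the definition, so that reading is an assumption. The function name `weighted_f_score` and the toy label lists are hypothetical; only the class names accept, reject, and ignore come from the text.

```python
from collections import Counter


def weighted_f_score(gold, predicted):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one labelled example is required")

    support = Counter(gold)  # number of gold examples per class
    total = len(gold)
    score = 0.0
    for label, n_gold in support.items():
        # True positives: examples where both gold and prediction are `label`.
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        n_pred = sum(1 for p in predicted if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        # Weight each class by its share of the gold labels.
        score += (n_gold / total) * f1
    return score


# Hypothetical toy data, not taken from the paper.
gold = ["accept", "accept", "accept", "reject", "reject", "ignore"]
predicted = ["accept", "accept", "reject", "reject", "reject", "accept"]
print(round(weighted_f_score(gold, predicted), 3))
```

Because each class is weighted by its frequency, a rare class such as ignore contributes little to the total, which is worth keeping in mind when comparing a classifier against a baseline on skewed data.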