Scientific report: "A Component for Just-In-Time Incremental Speech Synthesis"

INPRO_iSS: A Component for Just-In-Time Incremental Speech Synthesis

Timo Baumann, University of Hamburg, Department for Informatics, Germany, baumann@
David Schlangen, University of Bielefeld, Faculty of Linguistics and Literary Studies, Germany

Abstract

We present a component for incremental speech synthesis (iSS) and a set of applications that demonstrate its capabilities. This component can be used to increase the responsivity and naturalness of spoken interactive systems. While iSS can show its full strength in systems that generate output incrementally, we also discuss how even otherwise unchanged systems may profit from its capabilities.

1 Introduction

Current state of the art in speech synthesis for spoken dialogue systems (SDSs) is for the synthesis component to expect full utterances (in textual form) as input and to deliver an audio stream verbalising this full utterance. At best, timing information is returned as well, so that a control component can determine, in case of an interruption (barge-in) by the user, where in the utterance this happened (Edlund, 2008; Matsuyama et al., 2010).

We want to argue here that providing capabilities to speech synthesis components for dealing with units smaller than full utterances can be beneficial for a whole range of interactive speech-based systems. In the easiest case, incremental synthesis simply reduces the utterance-initial delay before speech output starts, as output already starts when its beginning has been produced.
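The latency argument can be illustrated with a minimal sketch. It is not the component's actual API: `CountingSynthesizer`, `synthesize_full`, and `synthesize_incrementally` are hypothetical names, and the "audio" is just the encoded text of each chunk. The sketch only counts how much synthesis work has to finish before the first piece of output is available.

```python
from typing import Iterable, Iterator


class CountingSynthesizer:
    """Hypothetical stand-in for a synthesizer; counts chunks it has processed."""

    def __init__(self) -> None:
        self.chunks_processed = 0

    def synthesize(self, chunk: str) -> bytes:
        # Placeholder "audio": the encoded text. A real system would return samples.
        self.chunks_processed += 1
        return chunk.encode("utf-8")


def synthesize_full(synth: CountingSynthesizer, chunks: Iterable[str]) -> bytes:
    """Conventional behaviour: the whole utterance is synthesized before output."""
    return b"".join(synth.synthesize(c) for c in chunks)


def synthesize_incrementally(
    synth: CountingSynthesizer, chunks: Iterable[str]
) -> Iterator[bytes]:
    """Incremental behaviour: each chunk is synthesized only when it is requested."""
    for c in chunks:
        yield synth.synthesize(c)


chunks = ["The next train", "to Hamburg", "leaves at noon"]

# Full-utterance synthesis: all three chunks are processed before any output.
full = CountingSynthesizer()
audio = synthesize_full(full, chunks)
work_before_first_audio_full = full.chunks_processed  # 3

# Incremental synthesis: output can start after the first chunk alone.
inc = CountingSynthesizer()
stream = synthesize_incrementally(inc, chunks)
first = next(stream)
work_before_first_audio_inc = inc.chunks_processed  # 1
```

Because the generator is lazy, the consumer decides when the next chunk is produced, so later chunks could still be changed or withheld before they are synthesized.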
In an otherwise conventional dialogue system, the synthesis module could make it possible to interrupt the output speech stream (e.g., when a noise event is detected that makes it likely that the user will not be able to hear what is being said) and continue production when the interruption is over. If other SDS components are adapted more to take advantage of incremental speech synthesis, even more flexible behaviours can be realised, such as providing utterances
