INCORPORATING CONTEXTUAL PHONETICS INTO AUTOMATIC SPEECH RECOGNITION
Eric Fosler-Lussier, Steven Greenberg, and Nelson Morgan
University of California, Berkeley, USA
International Computer Science Institute, USA

ABSTRACT

This work outlines the problems encountered in modeling pronunciation for automatic speech recognition (ASR) of spontaneous (American) English speech. We detail some of the phonetic phenomena within the Switchboard corpus that make the recognition of this speaking style difficult. Phonetic transcribers found that feature spreading and cue trading made identification of phonetic segmental boundaries problematic. Including different forms of context in pronunciation models, however, may alleviate these problems in the ASR domain. The syllable appears to play an important role, as many of the phonetic phenomena seen are syllable-internal, and the increase in pronunciation variation compared to read speech is concentrated in coda consonants. In addition, we show that other forms of context – speaking rate and word predictability – help indicate increases in variability. We present a dynamic ASR pronunciation model that utilizes longer phonetic contextual windows for capturing the range of detail characteristic of naturally spoken language.

[Figure 1. ASR system error for nine recognizers on planned and spontaneous studio speech in the Broadcast News corpus. Plot title: Performance of all systems in 1998 Hub4E DARPA Broadcast News Evaluation. Vertical axis: percent word error. Horizontal axis: ASR system by site (cu-htk, ibm, limsi, dragon, bbn, philips/rwth, sprach, sri, ogi/fonix). Series: planned studio speech, spontaneous studio speech.]

1. INTRODUCTION

ASR systems typically perform more poorly on spontaneous speech than on corpora containing scripted and highly planned material. Although some of this deterioration in performance reflects the wide range of acoustic background conditions typical of natu- … relates with situations in which phonetic transcriptions of the …