Research Article: On the Utility of Syllable-Based Acoustic Models for Pronunciation Variation Modelling

Hindawi Publishing Corporation
EURASIP Journal on Audio, Speech, and Music Processing
Volume 2007, Article ID 46460, 11 pages
doi 2007 46460

Annika Hämäläinen, Lou Boves, Johan de Veth, and Louis ten Bosch

Centre for Language and Speech Technology (CLST), Faculty of Arts, Radboud University Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands

Received 6 December 2006; Accepted 18 May 2007

Recommended by Jim Glass

Recent research on the TIMIT corpus suggests that longer-length acoustic models are more appropriate for pronunciation variation modelling than the context-dependent phones used by conventional automatic speech recognisers. However, the impressive speech recognition results obtained with longer-length models on TIMIT have yet to be reproduced on other corpora. To understand the conditions under which longer-length acoustic models yield considerable improvements in recognition performance, we carry out recognition experiments on both TIMIT and the Spoken Dutch Corpus and analyse the differences between the two sets of results. We establish that the details of the procedure used to initialise the longer-length models have a substantial effect on the speech recognition results. When initialised appropriately, longer-length acoustic models that borrow their topology from a sequence of triphones cannot capture the pronunciation variation phenomena that hinder recognition performance the most.

Copyright 2007 Annika Hämäläinen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Conventional large-vocabulary continuous speech recognisers use context-dependent phone models, such as triphones, to model speech. Apart from their capability of modelling some contextual effects, the main ...
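The abstract describes longer-length models that "borrow their topology from a sequence of triphones". The Python sketch below shows one plausible reading of that idea, under stated assumptions. A syllable's phones are expanded into HTK-style `l-p+r` triphone names. The states of those triphone HMMs are then concatenated into a single syllable-length model, with their parameters copied. The class `HMM` and the functions `to_triphones` and `init_syllable_model` are hypothetical. Each state's parameters are reduced to one placeholder float instead of a Gaussian mixture. Using three emitting states per triphone is only a convention in the demo. This excerpt does not describe the paper's actual initialisation procedure, for example which contexts are used at syllable boundaries or how the models are retrained afterwards.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HMM:
    """Toy left-to-right HMM (hypothetical): one placeholder value per emitting state.

    A real recogniser would store Gaussian mixtures and a transition matrix;
    a single float per state stands in for those parameters here.
    """
    name: str
    state_means: List[float]


def to_triphones(phones: List[str],
                 left: Optional[str] = None,
                 right: Optional[str] = None) -> List[str]:
    """Expand a phone sequence into HTK-style 'l-p+r' context-dependent names.

    `left` and `right` are the phones just outside the sequence; when a
    context is missing, the name falls back to a biphone ('l-p' or 'p+r').
    """
    names = []
    for i, phone in enumerate(phones):
        lc = phones[i - 1] if i > 0 else left
        rc = phones[i + 1] if i < len(phones) - 1 else right
        name = phone
        if lc is not None:
            name = f"{lc}-{name}"
        if rc is not None:
            name = f"{name}+{rc}"
        names.append(name)
    return names


def init_syllable_model(syllable: str,
                        phones: List[str],
                        triphone_models: Dict[str, HMM],
                        left: Optional[str] = None,
                        right: Optional[str] = None) -> HMM:
    """Build a syllable-length HMM whose states are the concatenated states
    of its triphones, with parameters copied (not shared) so the syllable
    model can later be re-estimated independently."""
    means: List[float] = []
    for name in to_triphones(phones, left, right):
        # extend() copies the values into this new list; the triphone model is untouched
        means.extend(triphone_models[name].state_means)
    return HMM(name=syllable, state_means=means)


if __name__ == "__main__":
    print(to_triphones(["k", "a", "t"]))  # ['k+a', 'k-a+t', 'a-t']
    toy = {n: HMM(n, [0.0, 0.5, 1.0]) for n in ["k+a", "k-a+t", "a-t"]}
    syl = init_syllable_model("kat", ["k", "a", "t"], toy)
    print(len(syl.state_means))  # 9
```

In this sketch, the parameters are copied rather than tied. That choice lets each syllable model move away from its triphone starting point during any later re-training. If the parameters were shared, the syllable model would simply be a renamed sequence of triphones.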
