Research Article: Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages

Hindawi Publishing Corporation
EURASIP Journal on Audio, Speech, and Music Processing
Volume 2008, Article ID 573832, 7 pages
doi 2008 573832

Research Article
Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages

Arnar Thor Jensson, Koji Iwano, and Sadaoki Furui

Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan

Correspondence should be addressed to Arnar Thor Jensson, arnar@

Received 30 April 2008; Revised 25 July 2008; Accepted 29 October 2008

Recommended by Martin Bouchard

Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that uses a machine-translated text corpus to improve an LM built from a small amount of task-dependent text. Icelandic speech recognition experiments were performed using data machine-translated (MT) from English to Icelandic on a word-by-word and a sentence-by-sentence basis. Interpolating the baseline LM with an LM built from either the word-by-word or the sentence-by-sentence translated text reduced the word error rate significantly when the manually obtained utterances used as the baseline were very sparse.

Copyright © 2008 Arnar Thor Jensson et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

State-of-the-art speech recognition has advanced greatly for several languages [1]. Extensive databases, both acoustic and text, have been collected in those languages in order to develop speech recognition systems. Collecting large databases requires both time and resources for each target language. More than 6000 living languages are spoken in the world today. Developing a speech recognition
