tailieunhanh - Scientific report: "Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion"
Sittichai Jiampojamarn†, Colin Cherry‡, Grzegorz Kondrak†
†Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada (sj kondrak @)
‡Microsoft Research, One Microsoft Way, Redmond, WA 98052 (colinc@)

Abstract

We present a discriminative structure-prediction model for the letter-to-phoneme task, a crucial step in text-to-speech processing. Our method encompasses three tasks that have been previously handled separately: input segmentation, phoneme prediction, and sequence modeling. The key idea is online discriminative training, which updates parameters according to a comparison of the current system output to the desired output, allowing us to train all of our components together. By folding the three steps of a pipeline approach into a unified dynamic programming framework, we are able to achieve substantial performance gains. Our results surpass the current state-of-the-art on six publicly available data sets representing four different languages.

1 Introduction

Letter-to-phoneme (L2P) conversion is the task of predicting the pronunciation of a word, represented as a sequence of phonemes, from its orthographic form, represented as a sequence of letters. The L2P task plays a crucial role in speech synthesis systems (Schroeter et al., 2002), and is an important part of other applications, including spelling correction (Toutanova and Moore, 2001) and speech-to-speech machine translation (Engelbrecht and Schultz, 2005).
Converting a word into its phoneme representation is not a trivial task. Dictionary-based approaches cannot achieve this goal reliably, due to unseen words and proper names. Furthermore, the construction of even a modestly-sized pronunciation dictionary requires substantial human effort for each new language. Effective rule-based approaches can be designed for some languages, such as Spanish. However, Kominek and Black (2006) show that in languages with a less ...
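The online discriminative training idea stated in the abstract (decode with the current parameters, compare the output to the desired output, and update on disagreement) can be illustrated with a small sketch. Everything below is an assumption made for illustration and not the authors' implementation: it uses a plain perceptron-style update as a simple stand-in (the excerpt does not specify the update rule), it assumes each letter maps to exactly one phoneme (so it omits the input segmentation step the paper folds in), and the names `features`, `decode`, `train`, `TOY_DATA`, and `PHONEMES` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each letter is aligned to exactly one phoneme.
TOY_DATA = [("cat", ["k", "ae", "t"]), ("cot", ["k", "aa", "t"])]
PHONEMES = sorted({p for _, gold in TOY_DATA for p in gold})


def features(letters, phonemes):
    """Count emission (letter, phoneme) and transition (prev, cur) features."""
    counts = defaultdict(int)
    prev = "<s>"
    for letter, phoneme in zip(letters, phonemes):
        counts[("emit", letter, phoneme)] += 1
        counts[("trans", prev, phoneme)] += 1
        prev = phoneme
    return counts


def decode(letters, weights, phoneme_set):
    """Viterbi-style dynamic program: best-scoring phoneme sequence."""
    # best[p] holds (score, path) for the best sequence ending in phoneme p.
    best = {"<s>": (0.0, [])}
    for letter in letters:
        new_best = {}
        for cur in phoneme_set:
            emit = weights[("emit", letter, cur)]
            score, path = max(
                ((s + weights[("trans", prev, cur)], p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[cur] = (score + emit, path + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


def train(data, phoneme_set, epochs=10):
    """Online training: update only when the system output is wrong."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for letters, gold in data:
            predicted = decode(letters, weights, phoneme_set)
            if predicted != gold:
                # Reward features of the desired output,
                # penalize features of the current system output.
                for feat, count in features(letters, gold).items():
                    weights[feat] += count
                for feat, count in features(letters, predicted).items():
                    weights[feat] -= count
    return weights
```

Calling `train(TOY_DATA, PHONEMES)` and then `decode("cat", weights, PHONEMES)` exercises the loop end to end. Because the emission and transition weights are updated from the same comparison of predicted and desired sequences, the phoneme prediction and sequence modeling components are trained together rather than in separate pipeline stages, which is the point the abstract makes.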