Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach

Hao Tang, Joseph Keshet, and Karen Livescu
Toyota Technological Institute at Chicago, Chicago, IL, USA
{haotang, jkeshet, klivescu}@

Abstract

We address the problem of learning the mapping between words and their possible pronunciations in terms of sub-word units. Most previous approaches have involved generative modeling of the distribution of pronunciations, usually trained to maximize likelihood. We propose a discriminative, feature-rich approach using large-margin learning. This approach allows us to optimize an objective closely related to a discriminative task, to incorporate a large number of complex features, and still do inference efficiently. We test the approach on the task of lexical access, that is, the prediction of a word given a phonetic transcription. In experiments on a subset of the Switchboard conversational speech corpus, our models have thus far reduced the classification error rate, relative to a previously published result, to about 15%. We find that large-margin approaches outperform conditional random field learning, and that the Passive-Aggressive algorithm for large-margin learning converges faster than the Pegasos algorithm.

1 Introduction

One of the problems faced by automatic speech recognition, especially of conversational speech, is that of modeling the mapping between words and their possible pronunciations in terms of sub-word units such as phones.
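The lexical-access task from the abstract (predict a word given a phonetic transcription) can be illustrated with a linear multiclass model trained by a Passive-Aggressive (PA-I) update. This is a minimal toy sketch under stated assumptions, not the authors' implementation: the lexicon entries, the feature map (phone bigrams conjoined with the candidate word, plus the fraction of the canonical pronunciation's bigrams that appear in the surface form), and every name (`LEXICON`, `features`, `pa_update`, `train`) are hypothetical, and the paper's actual features and loss may differ. Prediction is the argmax of θ·φ(p, w) over words; each update moves θ just far enough to give the correct word a margin of 1 over the highest-scoring wrong word, with the step capped by the aggressiveness parameter C.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> canonical phone string); illustrative only.
LEXICON = {
    "probably": "p r aa b ax b l iy",
    "problem": "p r aa b l ax m",
    "pray": "p r ey",
}


def bigrams(phones):
    """Count phone bigrams, padding with boundary symbols <s> and </s>."""
    seq = ["<s>"] + phones.split() + ["</s>"]
    return Counter(zip(seq, seq[1:]))


def features(phones, word):
    """Hypothetical sparse feature map phi(p, w), returned as a Counter."""
    surface = bigrams(phones)
    canonical = bigrams(LEXICON[word])
    feats = Counter()
    for (a, b), count in surface.items():
        feats[(word, a, b)] += count  # surface bigram conjoined with the candidate word
    # Fraction of the canonical pronunciation's bigrams found in the surface form.
    overlap = sum((surface & canonical).values())
    feats["canonical_overlap"] = overlap / sum(canonical.values())
    return feats


def score(theta, feats):
    """Linear score theta . phi over sparse vectors."""
    return sum(theta.get(k, 0.0) * v for k, v in feats.items())


def predict(theta, phones):
    """Lexical access: return the highest-scoring word for a phone string."""
    return max(LEXICON, key=lambda w: score(theta, features(phones, w)))


def pa_update(theta, phones, gold, C=1.0):
    """One PA-I step on the hinge loss against the highest-scoring wrong word."""
    wrong = max((w for w in LEXICON if w != gold),
                key=lambda w: score(theta, features(phones, w)))
    diff = features(phones, gold)
    diff.subtract(features(phones, wrong))  # phi(p, gold) - phi(p, wrong)
    loss = max(0.0, 1.0 - score(theta, diff))
    sq_norm = sum(v * v for v in diff.values())
    if loss > 0.0 and sq_norm > 0.0:
        tau = min(C, loss / sq_norm)  # PA-I step size, capped by C
        for k, v in diff.items():
            theta[k] = theta.get(k, 0.0) + tau * v
    return loss


def train(examples, epochs=10, C=1.0):
    """Run PA-I over (phones, word) pairs for a fixed number of epochs."""
    theta = {}
    for _ in range(epochs):
        for phones, gold in examples:
            pa_update(theta, phones, gold, C)
    return theta


if __name__ == "__main__":
    # The four surface forms of "probably" are the ones cited in the text
    # (Greenberg et al., 1996); the other pairs are made up for illustration.
    data = [
        ("p r aa b iy", "probably"), ("p r aa l iy", "probably"),
        ("p r ay", "probably"), ("p ow ih", "probably"),
        ("p r aa b l ax m", "problem"), ("p r aa b l ih m", "problem"),
        ("p r ey", "pray"), ("p r eh", "pray"),
    ]
    theta = train(data)
    for phones, gold in data:
        print(f"{phones!r:22} gold={gold:9} predicted={predict(theta, phones)}")
```

PA chooses a per-example step size in closed form. Pegasos, the other large-margin learner named in the abstract, instead takes stochastic subgradient steps on the regularized hinge-loss objective with a decaying step size of 1/(λt); it is not shown here.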
While pronouncing dictionaries provide each word's canonical pronunciation(s) in terms of phoneme strings, running speech often includes pronunciations that differ greatly from the dictionary. For example, some pronunciations of "probably" in the Switchboard conversational speech database are [p r aa b iy], [p r aa l iy], [p r ay], and [p ow ih] (Greenberg et al., 1996). While some words (e.g., common words) are more prone to such variation than others, the effect is extremely general: in the phonetically transcribed portion of Switchboard, fewer
