Scientific report: "Unsupervised Learning of Acoustic Sub-word Units"

Balakrishnan Varadarajan and Sanjeev Khudanpur
Center for Language and Speech Processing
Johns Hopkins University
Baltimore, MD 21218
bvarada2 khudanpur @

Emmanuel Dupoux
Laboratoire de Science Cognitive et Psycholinguistique
75005 Paris, France

Abstract

Accurate unsupervised learning of phonemes of a language directly from speech is demonstrated via an algorithm for joint unsupervised learning of the topology and parameters of a hidden Markov model (HMM); states and short state-sequences through this HMM correspond to the learnt sub-word units. The algorithm, originally proposed for unsupervised learning of allophonic variations within a given phoneme set, has been adapted to learn without any knowledge of the phonemes. An evaluation methodology is also proposed, whereby the state-sequence that aligns to a test utterance is transduced in an automatic manner to a phoneme-sequence and compared to its manual transcription. Over 85% phoneme recognition accuracy is demonstrated for speaker-dependent learning from fluent, large-vocabulary speech.

1 Automatic Discovery of Phone(me)s

Statistical models learnt from data are extensively used in modern automatic speech recognition (ASR) systems. Transcribed speech is used to estimate conditional models of the acoustics given a phoneme-sequence. The phonemic pronunciation of words and the phonemes of the language, however, are derived almost entirely from linguistic knowledge.
In this paper, we investigate whether the phonemes may be learnt automatically from the speech signal. Automatic learning of phoneme-like units has significant implications for theories of language acquisition in babies, but our considerations here are somewhat more technological. We are interested in developing ASR systems for languages or dialects for which such linguistic knowledge .

This work was partially supported by National Science Foundation Grants No. IIS-0534359 and OISE-0530118.
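The abstract describes scoring a transduced phoneme-sequence against its manual transcription. The sketch below shows one common way such a comparison could be made; it is not taken from the paper. It assumes that accuracy is defined as one minus the Levenshtein (edit) distance divided by the reference length, and the function names `edit_distance` and `phoneme_accuracy` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of labels)."""
    # prev[j] holds the distance between the first i-1 reference labels
    # and the first j hypothesis labels.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference labels with an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference label has no match
                curr[j - 1] + 1,     # insertion: extra hypothesis label
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_accuracy(ref, hyp):
    """Assumed definition: 1 - (substitutions + deletions + insertions) / len(ref)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Illustrative labels only; these are not data from the paper.
reference = ["k", "ae", "t"]
hypothesis = ["k", "ah", "t"]
accuracy = phoneme_accuracy(reference, hypothesis)  # one substitution in three labels
```

Under this definition, many insertions can push the score below zero, which is why the reference length rather than the hypothesis length is used as the denominator.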