VOS: The Corpus-Based Vietnamese Text-to-Speech System

Vo Quang Dieu Ha, Nguyen Manh Tuan, Cao Xuan Nam, Phạm Minh Nhut, Vu Hai Quan
University of Science, Vietnam National University, Ho Chi Minh City
Email: vhquan@

Abstract

This paper presents a complete specification of the Vietnamese speech synthesis system named VOS (Voice of Southern Vietnam). Because the synthetic speech produced by current Vietnamese text-to-speech systems lacks naturalness, VOS is based on the unit selection approach, which aims to achieve maximum naturalness. VOS consists of three main parts: a corpus manager, a synthesizer, and a transliteration model. The corpus manager handles automated speech indexing and segmentation for the unit selection executed by the synthesizer, while the transliteration model deals with the pronunciation of foreign-language words. A comparative experimental evaluation of VnSpeech, VietVoice, and VOS is conducted using an ITU-T standard. The results show that VOS outperforms the other two TTS systems.

Key words: VOS, Vietnamese speech synthesis, text-to-speech, corpus-based, unit selection

I. INTRODUCTION

Speech synthesis is the task of generating artificial utterances similar to human speech. This field of study is also known as text-to-speech (TTS), the process of converting written text into speech. TTS systems have been studied and developed for different languages since 1968. There are four primary approaches to building a TTS system: concatenative synthesis, formant synthesis, articulatory synthesis, and statistical parametric synthesis. Various TTS systems have recently been developed for many languages, including Japanese [5], Korean [6], Chinese [3], Thai [9], etc.
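The unit selection approach named in the abstract is a form of concatenative synthesis: for each target syllable, one recorded unit is chosen from the corpus so that the whole sequence minimizes a total cost. The following is a minimal sketch of that general idea in Python, not the VOS implementation. The `Unit` fields, the pitch-based `target_cost` and `join_cost`, and the toy corpus are all assumptions made for illustration; a real system would use richer acoustic and linguistic features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    syllable: str  # label of the recorded unit
    pitch: float   # hypothetical feature: mean pitch in Hz
    source: str    # hypothetical id of the recording the unit was cut from


def target_cost(unit, wanted_pitch):
    # How far the candidate is from the requested pitch (assumed cost).
    return abs(unit.pitch - wanted_pitch)


def join_cost(prev, cur):
    # Assumed cost: units cut from the same recording join for free,
    # otherwise the pitch mismatch at the boundary is penalized.
    if prev.source == cur.source:
        return 0.0
    return abs(prev.pitch - cur.pitch)


def select_units(targets, corpus):
    """Pick one unit per (syllable, wanted_pitch) target with minimal total cost."""
    # Build the candidate lattice: all corpus units matching each target syllable.
    lattice = []
    for syllable, _ in targets:
        candidates = [u for u in corpus if u.syllable == syllable]
        if not candidates:
            raise KeyError(f"no recorded unit for {syllable!r}")
        lattice.append(candidates)

    # Dynamic programming (Viterbi search) over the lattice.
    costs = [target_cost(u, targets[0][1]) for u in lattice[0]]
    backpointers = []
    for i in range(1, len(lattice)):
        new_costs, pointers = [], []
        for cur in lattice[i]:
            best_k = min(
                range(len(lattice[i - 1])),
                key=lambda k: costs[k] + join_cost(lattice[i - 1][k], cur),
            )
            total = (
                costs[best_k]
                + join_cost(lattice[i - 1][best_k], cur)
                + target_cost(cur, targets[i][1])
            )
            new_costs.append(total)
            pointers.append(best_k)
        costs = new_costs
        backpointers.append(pointers)

    # Trace back the cheapest path from the last position to the first.
    k = min(range(len(costs)), key=costs.__getitem__)
    path = [k]
    for pointers in reversed(backpointers):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [lattice[i][k] for i, k in enumerate(path)]


# Toy corpus (invented values) and a two-syllable request.
corpus = [
    Unit("xin", 120.0, "rec1"),
    Unit("xin", 200.0, "rec2"),
    Unit("chào", 125.0, "rec1"),
    Unit("chào", 210.0, "rec3"),
]
chosen = select_units([("xin", 130.0), ("chào", 130.0)], corpus)
```

In this toy setup the search prefers the two units cut from the same recording, because they are close to the requested pitch and join without a boundary penalty. The balance between the two costs is a design choice that a real system would tune.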
In this paper, VOS (Voice of Southern Vietnam), a Vietnamese TTS system, is presented. Vietnamese is a monosyllabic tonal language. Each word unit is pronounced as a single syllable, and its meaning depends on the tone. There are about 6596 phonetically distinguishable syllables [4], which consist of legal combinations