An efficient hardware architecture for HMM-based TTS system
Science & Technology Development, Vol 18, No.T4-2015

Su Hong Kiet, Huynh Huu Thuan, Bui Trong Tu
University of Sciences, VNU-HCM
(Received on December 5th 2014, accepted on September 23rd 2015)

ABSTRACT
This work proposes a hardware architecture for an HMM-based text-to-speech synthesis system (HTS). On high-speed platforms, HTS with a software core engine can satisfy the requirement of real-time processing. On low-speed platforms, however, the software core engine takes a long time to complete the synthesis process. A co-processor was designed and integrated into HTS to accelerate the system.

Keywords: text-to-speech synthesis, HMM, HTS, SoPC, FPGA.

INTRODUCTION
An HTS consists of two parts, a training part and a synthesis part, as shown in Fig. 1. In the training part, a context-dependent HMM database is trained from a speech database. The trained context-dependent HMM database consists of models for spectrum, pitch and state duration, together with decision trees for spectrum, pitch and state duration. The synthesis part then uses this trained database to generate the speech waveform from the given text.

Fig. 1. Scheme of HTS

In the synthesis part, the given text is analyzed and converted into a label sequence. According to the label sequence, an HMM sentence is constructed by concatenating HMMs taken from the trained HMM database. Excitation and spectral parameters are then extracted from the HMM sentence.
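The synthesis steps described above (label sequence, then sentence HMM, then excitation and spectral parameters) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `State` class, the dictionary lookup `db[label]` (standing in for decision-tree model selection), the use of the rounded mean state duration, and the piecewise-constant expansion of state means (standing in for HTS's parameter generation with dynamic features) are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class State:
    """One HMM state of a hypothetical, simplified context-dependent model."""
    duration_mean: float        # mean state duration, in frames
    spectrum_mean: List[float]  # mean spectral (e.g. mel-cepstral) vector
    lf0_mean: float             # mean log F0 (excitation parameter)


# Hypothetical trained database: label -> sequence of HMM states.
# (Real HTS selects states through decision trees, not a plain dict.)
ModelDB = Dict[str, List[State]]


def build_hmm_sentence(labels: List[str], db: ModelDB) -> List[State]:
    """Concatenate the HMMs selected by the label sequence into one sentence HMM."""
    sentence: List[State] = []
    for label in labels:
        sentence.extend(db[label])  # raises KeyError if the label has no model
    return sentence


def extract_parameters(sentence: List[State]) -> Tuple[List[List[float]], List[float]]:
    """Expand each state to its duration and emit per-frame parameters.

    Simplification: the state duration is the rounded mean (at least one
    frame), and each state's mean vector is held constant for that many
    frames instead of running full parameter generation.
    """
    spectrum: List[List[float]] = []
    lf0: List[float] = []
    for state in sentence:
        n_frames = max(1, round(state.duration_mean))
        # Copy the mean vector per frame so rows are independent lists.
        spectrum.extend(list(state.spectrum_mean) for _ in range(n_frames))
        lf0.extend([state.lf0_mean] * n_frames)
    return spectrum, lf0


if __name__ == "__main__":
    db = {
        "a": [State(2.4, [0.1, 0.2], 5.0), State(1.0, [0.3, 0.1], 5.1)],
        "b": [State(3.0, [0.0, -0.1], 4.8)],
    }
    sentence = build_hmm_sentence(["a", "b"], db)
    spec, lf0 = extract_parameters(sentence)
    print(len(spec), lf0)
```

The piecewise-constant output is deliberately crude: in HTS the per-frame parameters are produced by a generation step that also uses dynamic (delta) features, which yields smooth trajectories rather than steps.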
The extracted excitation and spectral parameters are fed to a synthesis filter to synthesize the speech waveform. Depending on whether the spectral parameters are represented as mel-cepstral coefficients or mel-generalized cepstral coefficients, the synthesis filter is implemented as an MLSA (mel log spectrum approximation) filter or an MGLSA (mel-generalized log spectrum approximation) filter, respectively. In recent research, HTS has been applied to many languages, such as Japanese [1],