Improving bottleneck features for Vietnamese large vocabulary continuous speech recognition system using deep neural networks

Journal of Computer Science and Cybernetics, , (2015), 267–276 DOI:

QUOC BAO NGUYEN1, TAT THANG VU2, AND CHI MAI LUONG2

1 University of Information and Communication Technology, Thai Nguyen University; nqbao@
2 Institute of Information Technology, Vietnam Academy of Science and Technology; vtthang@, lcmai@

Abstract. In this paper, a pre-training method based on denoising auto-encoders is investigated and shown to provide a good initialization for the bottleneck networks of a Vietnamese speech recognition system, resulting in better recognition performance than the base bottleneck features reported previously. The experiments are carried out on a dataset of speech from the Voice of Vietnam (VOV) channel. The results show that deep bottleneck feature (DBNF) extraction for Vietnamese speech recognition reduces the word error rate by 14% and 39% relative to the base bottleneck features and the MFCC baseline, respectively.

Keywords. Deep bottleneck features, neural network, Vietnamese speech recognition.

1. INTRODUCTION

In automatic speech recognition systems, feature extraction is an important part of achieving good recognition performance. Previous works [1,2] have shown that artificial neural networks can be used to extract good, discriminative features that yield better recognition performance than standard feature extraction algorithms such as Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP).
One possible approach for this is to train a network with a small bottleneck layer, and then use the activations of the units in this layer to produce feature vectors (“bottleneck features”, BNF [1]) for the remaining parts of the system. Recently, deep learning has gained a lot of attention in the machine learning community. The general objective of this
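
The bottleneck idea described above can be illustrated with a minimal sketch. It is not the authors' system: the layer sizes, the sigmoid activation, and the helper names (`make_layer`, `forward_layer`, `bottleneck_features`) are assumptions chosen for illustration, and the weights are random stand-ins for parameters that a real system would obtain by training. The sketch only shows how a frame is propagated up to the narrow layer, whose activations are then used as the feature vector.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases); random values stand in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward_layer(layer, inputs):
    """Apply one fully connected sigmoid layer to a single input vector."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def bottleneck_features(layers, bottleneck_index, frame):
    """Propagate one frame up to and including the bottleneck layer."""
    activations = frame
    for layer in layers[:bottleneck_index + 1]:
        activations = forward_layer(layer, activations)
    return activations


rng = random.Random(0)

# Hypothetical sizes: 39-dim input frame, 64 hidden units, 8-unit bottleneck,
# 64 hidden units, 10 output classes. None of these come from the paper.
sizes = [39, 64, 8, 64, 10]
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
BOTTLENECK_INDEX = 1  # layers[1] maps the 64 hidden units to the 8 bottleneck units

frame = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]  # one synthetic input frame
features = bottleneck_features(layers, BOTTLENECK_INDEX, frame)
print(len(features))  # dimensionality of the bottleneck feature vector
```

The layers after the bottleneck are needed only while training the network against its targets; at feature-extraction time they are discarded, which is why the forward pass stops at `BOTTLENECK_INDEX`.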
