Improving the Naturalness of Concatenative Vietnamese Speech Synthesis under Limited Data Conditions
Journal of Computer Science and Cybernetics (2015), 1–16

Phung Trung Nghia¹, Luong Chi Mai², and Masato Akagi³
¹ Thai Nguyen University of Information and Communication Technology; ² Institute of Information Technology, Vietnam Academy of Science and Technology; ³ Japan Advanced Institute of Science and Technology.
Email: ptnghia@

Abstract. Building a large speech corpus is a costly and time-consuming task. Building high-quality speech synthesis under limited data conditions is therefore an important issue, especially for under-resourced languages such as Vietnamese. Because concatenative speech synthesis (CSS) is currently the most natural-sounding approach, it is the method studied in this research. CSS requires all possible units of a given phonetic unit set. This requirement may be easy to meet for verbal (non-tonal) languages, in which the number of units in a given set, such as the phoneme set, is relatively small. In tonal languages, however, the number of tonal phonetic units is large, and it is difficult to design a small corpus that covers all of them. Additionally, because all context-dependent phonetic units are required to ensure the naturalness of corpus-based CSS, concatenation needs a large database, with a size of up to dozens of gigabytes.
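As a rough illustration of why tonal unit sets grow quickly, the following minimal sketch pairs each toneless unit with each of the six Vietnamese tones. The base syllables are a hypothetical toy set chosen only for this example, and the function name `tonal_unit_inventory` is an assumption rather than anything from the paper. The count is an upper bound, since in practice not every syllable combines with every tone.

```python
# Six lexical tones of Vietnamese (names written without diacritics).
TONES = ["ngang", "huyen", "sac", "hoi", "nga", "nang"]


def tonal_unit_inventory(base_units, tones=TONES):
    """Return every (base unit, tone) pair.

    This is an upper bound on the tonal units a corpus would have to cover
    if each tonal variant must be recorded separately.
    """
    return [(unit, tone) for unit in base_units for tone in tones]


# Hypothetical toy set of toneless base syllables, for illustration only.
base_units = ["ma", "ba", "la", "ta"]
inventory = tonal_unit_inventory(base_units)

print(len(base_units), "toneless units ->", len(inventory), "tonal units")
```

Even in this toy case the inventory is six times the size of the toneless set, which is the growth that makes a small, fully covering corpus hard to design.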
The motivation for this work is therefore to improve the naturalness of CSS under limited data conditions by addressing both of these problems. First, the authors attempt to reduce the number of tonal units required for CSS of tonal languages by using a tone transformation method. Second, they attempt to reduce context-mismatch errors in concatenation regions, so that CSS remains usable when units with a matching context cannot be found in the database. Temporal Decomposition (TD), which is an ...