An HMM-Based Approach to Automatic Phrasing for Mandarin Text-to-Speech Synthesis
Jing Zhu
Department of Electronic Engineering, Shanghai Jiao Tong University
zhujing@sj

Jian-Hua Li
Department of Electronic Engineering, Shanghai Jiao Tong University
lijh888@sj

Abstract

Automatic phrasing is essential to Mandarin text-to-speech synthesis. We select the word format as the target linguistic feature and propose an HMM-based approach to this problem. We define four prosodic-position states for each word and employ a discrete hidden Markov model. The approach achieves a high accuracy of roughly 82%, which is very close to that of manual labeling. Our experimental results also demonstrate that this approach has advantages over part-of-speech-based approaches.

1 Introduction

Owing to the limits of vital capacity and to contextual information, breaks or pauses are always an important ingredient of human speech. They play a major role in signaling structural boundaries. Similarly, in text-to-speech (TTS) synthesis, assigning breaks is crucial to naturalness and intelligibility, particularly in long sentences. The challenge in achieving naturalness lies mainly in prosody generation. Generally speaking, prosody covers phrasing, loudness, duration, and intonation. Among these prosodic features, phrasing divides utterances into meaningful chunks of information, called hierarchic breaks.
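The abstract describes labeling each word with one of four prosodic-position states using a discrete hidden Markov model, with words as the observed symbols. As a minimal sketch of how such a model could assign breaks to a sentence, the Python below runs Viterbi decoding over word tokens. Everything specific here is an assumption for illustration, not the authors' method: the state labels B, M, E, and S (phrase-initial, phrase-medial, phrase-final, one-word phrase), the toy probabilities, the `UNSEEN` floor for unknown words, and the rule that a break follows any word labeled E or S.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical prosodic-position states (an assumption; the paper's own labels may differ):
# B = phrase-initial word, M = phrase-medial, E = phrase-final, S = one-word phrase.
STATES = ("B", "M", "E", "S")
UNSEEN = 1e-6  # assumed floor probability for words never seen in training

# Toy parameters for illustration only; a real model would estimate them
# from a corpus whose words are labeled with prosodic positions.
START = {"B": 0.7, "M": 0.0, "E": 0.0, "S": 0.3}
TRANS = {
    "B": {"B": 0.0, "M": 0.4, "E": 0.6, "S": 0.0},
    "M": {"B": 0.0, "M": 0.3, "E": 0.7, "S": 0.0},
    "E": {"B": 0.8, "M": 0.0, "E": 0.0, "S": 0.2},
    "S": {"B": 0.7, "M": 0.0, "E": 0.0, "S": 0.3},
}
EMIT = {
    "B": {"我们": 0.5, "去": 0.4, "明天": 0.1},
    "M": {"明天": 0.5, "去": 0.5},
    "E": {"明天": 0.5, "上海": 0.5},
    "S": {"我们": 0.3, "上海": 0.3, "去": 0.4},
}


def _log(p: float) -> float:
    """Log-probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(
    words: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> List[str]:
    """Most likely prosodic-position sequence for `words` under a discrete HMM."""
    if not words:
        return []

    def log_emit(state: str, word: str) -> float:
        return _log(emit[state].get(word, UNSEEN))

    # delta[t][s]: best log-probability of any state path ending in s at word t.
    delta = [{s: _log(start[s]) + log_emit(s, words[0]) for s in STATES}]
    backptr: List[Dict[str, str]] = [{}]
    for t in range(1, len(words)):
        delta.append({})
        backptr.append({})
        for s in STATES:
            # Best predecessor r for s, scored by delta[t-1][r] + log P(s | r).
            prev, score = max(
                ((r, delta[t - 1][r] + _log(trans[r][s])) for r in STATES),
                key=lambda item: item[1],
            )
            delta[t][s] = score + log_emit(s, words[t])
            backptr[t][s] = prev
    # Trace the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: delta[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return path[::-1]


def insert_breaks(words: Sequence[str], states: Sequence[str], mark: str = "|") -> str:
    """Place a break symbol after every phrase-final (E) or one-word-phrase (S) word."""
    out: List[str] = []
    for word, state in zip(words, states):
        out.append(word)
        if state in ("E", "S"):
            out.append(mark)
    return " ".join(out)


if __name__ == "__main__":
    sentence = ["我们", "明天", "去", "上海"]  # "we / tomorrow / go / Shanghai"
    print(insert_breaks(sentence, viterbi(sentence, START, TRANS, EMIT)))
    # → 我们 明天 | 去 上海 |
```

Working in log probabilities keeps long sentences from underflowing, and the transition table encodes the assumed constraint that a phrase must end (E) before a new one begins (B). With these toy numbers the decoder returns B E B E, which places one break inside the sentence and one at its end.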
However, in most cases there is no unique solution to prosodic phrasing, and different phrasings can lead a listener to perceive different meanings. Given its importance, recent TTS research has focused on automatically predicting prosodic phrases from the part-of-speech (POS) feature or from syntactic structure (Black and Taylor, 1994; Klatt, 1987; Wightman, 1992; Hirschberg, 1996; Wang, 1995; Taylor and Black, 1998). To our understanding, POS is a grammar-based structure that can be extracted from text. There is no explicit …