Scientific report: "Learning to Say It Well: Reranking Realizations by Predicted Synthesis Quality"
Crystal Nakatsu and Michael White
Department of Linguistics
The Ohio State University
Columbus, OH 43210 USA
{cnakatsu,mwhite}@ling.ohio-state.edu

Abstract

This paper presents a method for adapting a language generator to the strengths and weaknesses of a synthetic voice, thereby improving the naturalness of synthetic speech in a spoken language dialogue system. The method trains a discriminative reranker to select paraphrases that are predicted to sound natural when synthesized. The ranker is trained on realizer and synthesizer features in supervised fashion, using human judgements of synthetic voice quality on a sample of the paraphrases representative of the generator's capability. Results from a cross-validation study indicate that discriminative paraphrase reranking can achieve substantial improvements in naturalness on average, ameliorating the problem of highly variable synthesis quality typically encountered with today's unit selection synthesizers.

1 Introduction

Unit selection synthesis (see e.g. Hunt and Black, 1996; Black and Taylor, 1997; Beutnagel et al., 1999), a technique which concatenates segments of natural speech selected from a database, has been found to be capable of producing high quality synthetic speech, especially for utterances that are similar to the speech in the database in terms of style, delivery, and coverage (Black and Lenzo, 2001). In particular, in the limited domain of a spoken language dialogue system, it is possible to achieve highly natural synthesis with a purpose-built voice (Black and Lenzo, 2000). However, it can be difficult to develop a synthetic voice for a dialogue system that produces natural speech completely reliably, and thus in practice, output quality can be quite variable. Two important factors in this regard are the labeling process for the speech database and the direction of the dialogue system's further development after the voice has been built when