Personalising speech-to-speech translation in the EMIME project

Mikko Kurimo (1), William Byrne (6), John Dines (3), Philip N. Garner (3), Matthew Gibson (6), Yong Guan (5), Teemu Hirsimäki (1), Reima Karhila (1), Simon King (2), Hui Liang (3), Keiichiro Oura (4), Lakshmi Saheer (3), Matt Shannon (6), Sayaka Shiota (4), Jilei Tian (5), Keiichi Tokuda (4), Mirjam Wester (2), Yi-Jian Wu (4), Junichi Yamagishi (2)

(1) Aalto University, Finland
(2) University of Edinburgh, UK
(3) Idiap Research Institute, Switzerland
(4) Nagoya Institute of Technology, Japan
(5) Nokia Research Center Beijing, China
(6) University of Cambridge, UK

Corresponding author: Mikko.Kurimo@tkk.fi

Abstract

In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis, which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances the users' interaction across language barriers by making the output speech sound more like the original speaker's way of speaking, even if she or he could not speak the output language.
1 Introduction

A mobile real-time speech-to-speech translation (S2ST) device is one of the grand challenges in natural language processing (NLP). It involves several important NLP research areas: automatic speech recognition (ASR), statistical machine translation (SMT), and speech synthesis, also known as text-to-speech (TTS). In recent years, significant advances have also been made in relevant technological devices: the size of powerful computers has decreased to fit in a mobile phone, and fast WiFi and 3G networks have spread widely to connect them to even more powerful computation servers. Several hand-held S2ST applications and devices have already ...