Technical Program

Paper Detail

Paper:	SLP-P13.3
Session:	Speech Synthesis III
Time:	Thursday, May 18, 10:00 - 12:00
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Prosody, Emotional, and Expressive Synthesis
Title:	Pronunciation Variant Selection for Spontaneous Speech Synthesis - Listening Effort as a Quality Parameter
Authors:	Steffen Werner, Matthias Wolff, Rüdiger Hoffmann, Dresden University of Technology, Germany
Abstract:	In previous work we introduced several duration control methods for speech synthesis. The most notable approach controls the grapheme-to-phoneme conversion (and thus indirectly the speaking rate) by selecting (reduced) pronunciation variants according to a pronunciation variant sequence model. Listeners will only accept long synthesized utterances if the listening effort is nearly the same as when listening to natural speech. To evaluate the quality of the variant synthesis against the canonical synthesis (the state-of-the-art system), we performed a listening test with two different synthesis systems. The variant synthesis, which applies a pronunciation variant sequence model, achieved a significantly lower listening effort and a higher overall rating (MOS) than the canonical synthesis. We also show that listening effort can act as a quality parameter for a speech sample: the listening-effort rating is correlated with the ratings of naturalness and intelligibility of the synthesized speech samples.