Paper: | SLP-P13.3 |
Session: | Speech Synthesis III |
Time: | Thursday, May 18, 10:00 - 12:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Prosody, Emotional, and Expressive Synthesis |
Title: |
Pronunciation Variant Selection for Spontaneous Speech Synthesis - Listening Effort as a Quality Parameter |
Authors: |
Steffen Werner, Matthias Wolff, Rüdiger Hoffmann, Dresden University of Technology, Germany |
Abstract: |
In previous work we introduced different duration control methods for speech synthesis. The most notable approach is to control the grapheme-to-phoneme conversion (and thus indirectly the speaking rate) by selecting (reduced) pronunciation variants according to a pronunciation variant sequence model. Listeners will only accept long synthesized utterances if the listening effort is nearly the same as when listening to natural speech. To evaluate the quality of the variant synthesis compared to the canonical synthesis (the state-of-the-art system), we performed a listening test with two different synthesis systems. The variant synthesis applying a pronunciation variant sequence model achieved a significantly lower listening effort and a higher overall rating (MOS) than the canonical synthesis. We also show that listening effort can act as a quality parameter for a speech sample: the listening effort rating is correlated with the ratings of naturalness and intelligibility of the synthesized speech samples. |