Technical Program

Paper Detail

Paper:	SLP-P10.1
Session:	Speech Synthesis II
Time:	Wednesday, May 17, 14:00 - 16:00
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Segmental-Level and/or concatenative synthesis
Title:	LSM-Based Boundary Training for Concatenative Speech Synthesis
Authors:	Jerome Bellegarda, Apple Computer, United States
Abstract:	The level of quality that can be achieved in concatenative text-to-speech synthesis depends, among other things, on a judicious chiseling of the inventory used in unit selection. Unit boundary optimization, in particular, can make a huge difference in the users' perception of the concatenated acoustic waveform. This paper considers the iterative refinement of unit boundaries based on a data-driven feature extraction framework separately optimized for each boundary region. Such unsupervised boundary training guarantees a globally optimal cut point between any two matching units in the inventory. This optimization is objectively characterized, first in terms of convergence behavior, and then by comparing the average inter-unit discontinuity obtained before and after training. Experimental results and listening evidence both underscore the viability of this approach for unit boundary optimization.
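
As an illustration only, the iterative refinement and the before/after discontinuity comparison described in the abstract might be sketched as follows. This is not the paper's algorithm: the abstract does not give the procedure, so every function name, the SVD-based projection, the window sizes, and the random toy data are assumptions made for the sketch. It treats each unit as a sequence of feature frames, describes a boundary by the frames around a candidate cut point, and repeatedly moves each cut to the candidate lying closest to the shared centroid in a reduced space.

```python
import numpy as np

def boundary_vectors(units, cuts, half_width):
    """Stack, for each unit, the frames in a window centred on its cut point."""
    return np.stack([
        u[c - half_width:c + half_width].ravel() for u, c in zip(units, cuts)
    ])

def average_discontinuity(vectors):
    """Mean Euclidean distance over all distinct pairs of boundary vectors."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(vectors)
    return dists.sum() / (n * (n - 1))

def refine_cuts(units, cuts, half_width=2, search=2, rank=3, max_iter=20):
    """Hypothetical refinement loop; stops when no cut moves or after max_iter."""
    cuts = [int(c) for c in cuts]
    for _ in range(max_iter):
        W = boundary_vectors(units, cuts, half_width)
        mean = W.mean(axis=0)
        # Data-driven feature space: leading right singular vectors of the
        # centred boundary matrix (an assumed stand-in for the paper's framework).
        _, _, vt = np.linalg.svd(W - mean, full_matrices=False)
        basis = vt[:rank].T
        new_cuts = []
        for u, c in zip(units, cuts):
            lo = max(half_width, c - search)
            hi = min(len(u) - half_width, c + search)

            def cost(k, u=u):
                # Distance to the centroid, which sits at the origin after centring.
                v = u[k - half_width:k + half_width].ravel() - mean
                return np.linalg.norm(v @ basis)

            new_cuts.append(min(range(lo, hi + 1), key=cost))
        if new_cuts == cuts:
            break
        cuts = new_cuts
    return cuts

# Toy data: 6 units of 12 frames with 4 features each (random, for illustration).
rng = np.random.default_rng(0)
units = [rng.normal(size=(12, 4)) for _ in range(6)]
initial = [6] * 6
refined = refine_cuts(units, initial)
before = average_discontinuity(boundary_vectors(units, initial, 2))
after = average_discontinuity(boundary_vectors(units, refined, 2))
print("cuts:", refined, "discontinuity before/after:", before, after)
```

The sketch makes no claim of global optimality or guaranteed convergence; because the projection is re-estimated on every pass, the loop is simply capped at max_iter iterations.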