ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Technical Program

Paper Detail

Paper: SLP-P13.9
Session: Speech Synthesis III
Time: Thursday, May 18, 10:00 - 12:00
Presentation: Poster
Topic: Speech and Spoken Language Processing: Tools and data for speech synthesis
Title: Constructing a Phonetic-Rich Speech Corpus While Controlling Time-Dependent Voice Quality Variability for English Speech Synthesis
Authors: Jinfu Ni, Toshio Hirai, ATR Spoken Language Communication Research Laboratories, Japan; Hisashi Kawai, KDDI R&D Laboratories, Japan
Abstract: This paper presents a practical approach to constructing a large-scale speech corpus for corpus-based speech synthesis. The approach consists of (1) selecting a source text corpus that fits limited target domains; (2) analyzing the source text corpus to obtain the unit statistics; (3) automatically extracting prompt subjects (sentences) from the source text corpus to maximize the intended unit coverage with the given amount of text; and (4) recording the prompt subjects while controlling the critical factors that cause undesirable voice variability. The paper describes related computational methods, such as a greedy algorithm for prompt selection, the proximity effects found in a real recording system, and a technique for detecting time-dependent voice variations. While the approach is demonstrated in English, it is also promising for other languages.
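
The greedy prompt selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the unit inventory, the scoring function, or how the text budget is measured. The sketch assumes the budget is a sentence count and uses character bigrams as a hypothetical stand-in for phonetic units; the names greedy_select and char_bigrams are illustrative only.

```python
def char_bigrams(sentence):
    """Hypothetical unit extractor: character bigrams stand in for phonetic units."""
    text = sentence.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def greedy_select(sentences, units_of, budget):
    """Pick up to `budget` sentences, each time taking the one that adds the most uncovered units.

    Assumption: the budget is a number of sentences; a real system might
    instead budget by total text length or recording time.
    """
    covered = set()
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < budget:
        # Choose the sentence with the largest number of not-yet-covered units.
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        gain = units_of(best) - covered
        if not gain:
            break  # no remaining sentence adds new units
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    prompts, units = greedy_select(["abc", "bcd", "abcd", "xyz"], char_bigrams, budget=2)
    print(prompts, len(units))
```

Each pass re-scores every remaining sentence against the units already covered, which is the simplest form of greedy coverage maximization. Weighting units by their frequency in the source text corpus would be a natural refinement, but the abstract gives no detail on whether or how that is done.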



IEEE Signal Processing Society
