Technical Program

Paper Detail

Paper:	SLP-P13.9
Session:	Speech Synthesis III
Time:	Thursday, May 18, 10:00 - 12:00
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Tools and data for speech synthesis
Title:	Constructing a Phonetic-Rich Speech Corpus While Controlling Time-Dependent Voice Quality Variability for English Speech Synthesis
Authors:	Jinfu Ni, Toshio Hirai, ATR Spoken Language Communication Research Laboratories, Japan; Hisashi Kawai, KDDI R&D Laboratories, Japan
Abstract:	This paper presents a practical approach to constructing a large-scale speech corpus for corpus-based speech synthesis. The approach consists of (1) selecting a source text corpus that fits limited target domains; (2) analyzing the source text corpus to obtain the unit statistics; (3) automatically extracting prompt subjects (sentences) from the source text corpus to maximize the intended unit coverage with the given amount of text; and (4) recording the prompt subjects while controlling the critical factors that cause undesirable voice variability. This paper describes related computational methods, such as a greedy algorithm for prompt selection, the proximity effects found in a real recording system, and a technique for detecting time-dependent voice variations. While the approach is demonstrated in English, it is also promising for other languages.
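
Illustration:	The abstract does not give the details of the greedy prompt-selection algorithm, so the following is only a minimal sketch of the general idea behind steps (2) and (3), not the authors' method. It assumes character bigrams as a stand-in for phonetic units, a limit on the number of prompts as a stand-in for "the given amount of text", and a frequency-weighted gain; the names units_of, unit_statistics, and greedy_select are hypothetical.

```python
from collections import Counter


def units_of(sentence, n=2):
    """Hypothetical unit extractor: character n-grams stand in for phonetic units."""
    text = sentence.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def unit_statistics(sentences):
    """Count, for each unit, how many sentences in the source text contain it."""
    stats = Counter()
    for sentence in sentences:
        stats.update(units_of(sentence))
    return stats


def greedy_select(sentences, max_prompts):
    """Repeatedly pick the sentence that adds the most not-yet-covered unit weight.

    Assumption: the text budget is expressed as a number of prompts.
    """
    stats = unit_statistics(sentences)
    candidates = list(dict.fromkeys(sentences))  # drop duplicates, keep order
    unit_sets = {s: units_of(s) for s in candidates}
    covered, selected = set(), []

    def gain(sentence):
        # Weight each newly covered unit by its frequency in the source text.
        return sum(stats[u] for u in unit_sets[sentence] - covered)

    while candidates and len(selected) < max_prompts:
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # no remaining sentence adds new units
        selected.append(best)
        covered |= unit_sets[best]
        candidates.remove(best)
    return selected, covered


if __name__ == "__main__":
    prompts, covered = greedy_select(["abab", "abcd", "cd"], max_prompts=2)
    print(prompts, sorted(covered))
```

A real system would replace units_of with a grapheme-to-phoneme front end and would likely measure the budget in phones or recording time rather than in sentences.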