Paper: | SLP-P13.9 |
Session: | Speech Synthesis III |
Time: | Thursday, May 18, 10:00 - 12:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Tools and data for speech synthesis |
Title: |
Constructing a Phonetic-Rich Speech Corpus While Controlling Time-Dependent Voice Quality Variability for English Speech Synthesis |
Authors: |
Jinfu Ni, Toshio Hirai, ATR Spoken Language Communication Research Laboratories, Japan; Hisashi Kawai, KDDI R&D Laboratories, Japan |
Abstract: |
This paper presents a practical approach to constructing a large-scale speech corpus for corpus-based speech synthesis. The approach consists of (1) selecting a source text corpus that fits limited target domains; (2) analyzing the source text corpus to obtain unit statistics; (3) automatically extracting prompt subjects (sentences) from the source text corpus to maximize the intended unit coverage within a given amount of text; and (4) recording the prompt subjects while controlling the critical factors that cause undesirable voice variability. The paper describes the related computational methods, including a greedy algorithm for prompt selection, the proximity effects found in a real recording system, and a technique for detecting time-dependent voice variations. Although the approach is demonstrated for English, it is also promising for other languages. |
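The greedy prompt selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the unit inventory, the gain function, or how the text budget is measured. Here the hypothetical helper `units_of` uses adjacent letter pairs as a stand-in for phonetic units such as diphones, the gain is the number of not-yet-covered units, and the budget is counted in words; all names are assumptions made for illustration.

```python
def units_of(sentence):
    """Hypothetical unit extractor: adjacent letter pairs stand in for
    phonetic units. A real system would use a phonetic transcription."""
    letters = [c for c in sentence.lower() if c.isalpha()]
    return {a + b for a, b in zip(letters, letters[1:])}


def greedy_select(sentences, budget_words, unit_fn=units_of):
    """Repeatedly pick the sentence that adds the most uncovered units,
    stopping when no affordable sentence adds anything new.

    budget_words is an assumed way to express "the given amount of text".
    """
    # Precompute (sentence, unit set, cost in words) for each candidate.
    candidates = [(s, unit_fn(s), len(s.split())) for s in sentences]
    covered, selected, used = set(), [], 0

    while True:
        best_index, best_gain = None, 0
        for i, (_, units, cost) in enumerate(candidates):
            if used + cost > budget_words:
                continue  # this sentence no longer fits in the budget
            gain = len(units - covered)
            if gain > best_gain:  # ties keep the earliest candidate
                best_index, best_gain = i, gain
        if best_index is None:
            break  # nothing affordable adds new coverage
        sentence, units, cost = candidates.pop(best_index)
        selected.append(sentence)
        covered |= units
        used += cost

    return selected, covered


if __name__ == "__main__":
    prompts, covered = greedy_select(["abc", "abc abc", "xyz", "ab"], budget_words=3)
    print(prompts, sorted(covered))
```

A frequency-weighted gain, using the unit statistics from step (2), would be a natural variation: replace `len(units - covered)` with a sum of per-unit weights.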