Paper: | SLP-P13.11 |
Session: | Speech Synthesis III |
Time: | Thursday, May 18, 10:00 - 12:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Tools and data for speech synthesis |
Title: |
Database Pruning for Unsupervised Building of Text-to-Speech Voices |
Authors: |
Jordi Adell, Pablo Daniel Agüero, Antonio Bonafonte, Universitat Politècnica de Catalunya (UPC), Spain |
Abstract: |
Unit selection techniques represent the state of the art in speech synthesis. Building new voices requires automatic segmentation of speech databases. These databases may contain errors, and the segmentation process may introduce more. High-quality systems require significant effort to find and correct these segmentation errors. Phonetic transcription is crucial and is one of the manually supervised tasks. The ability to automatically remove incorrectly transcribed units from the inventory would make the process more automatic. Here we present a new technique, based on speech recognition confidence measures, that removes 90% of incorrectly transcribed units from a database at the cost of losing only 10% of correctly transcribed units. |
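
The trade-off stated in the abstract (share of incorrectly transcribed units removed versus share of correctly transcribed units lost) can be illustrated with a minimal sketch. The abstract does not specify how the confidence measure is computed, so the sketch assumes each unit already carries a scalar confidence score and that pruning is a simple threshold on that score. The names `Unit`, `prune`, and `pruning_rates` are hypothetical, and the `correct` flag stands in for a manually checked reference used only for evaluation.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One inventory unit (hypothetical structure, not from the paper)."""
    label: str         # phonetic label assigned by the transcription
    confidence: float  # assumed recognizer confidence score, higher is better
    correct: bool      # reference judgement, used only to evaluate pruning


def prune(units, threshold):
    """Split units into (kept, removed) by thresholding the confidence."""
    kept = [u for u in units if u.confidence >= threshold]
    removed = [u for u in units if u.confidence < threshold]
    return kept, removed


def pruning_rates(units, threshold):
    """Return (fraction of incorrect units removed,
               fraction of correct units removed)."""
    _, removed = prune(units, threshold)
    n_bad = sum(not u.correct for u in units)
    n_good = sum(u.correct for u in units)
    bad_removed = sum(not u.correct for u in removed)
    good_removed = sum(u.correct for u in removed)
    return (
        bad_removed / n_bad if n_bad else 0.0,
        good_removed / n_good if n_good else 0.0,
    )
```

Sweeping `threshold` over a labelled development set and reading off both rates at each value is one way an operating point such as the one reported in the abstract could be chosen; the paper's actual procedure may differ.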