Paper: | SLP-P13.1 |
Session: | Speech Synthesis III |
Time: | Thursday, May 18, 10:00 - 12:00 |
Presentation: |
Poster |
Topic: |
Speech and Spoken Language Processing: Segmental-Level and/or concatenative synthesis |
Title: |
SCALABLE IMPLEMENTATION OF UNIT SELECTION BASED TEXT-TO-SPEECH SYSTEM FOR EMBEDDED SOLUTIONS |
Authors: |
Nobuo Nukaga, Ryota Kamoshida, Kenji Nagamatsu, Yoshinori Kitahara, Hitachi, Ltd., Japan |
Abstract: |
In this paper, we propose two methods for implementing a unit selection-based text-to-speech engine on resource-limited embedded systems. Although unit selection-based text-to-speech engines have improved the quality of synthesized speech, there is a practical trade-off between database size and speech quality: generating highly natural-sounding voices requires a large database and expensive computation, yet the engine must still meet the specifications of the target system. To address this problem, we introduce methods that reduce the size of the speech database using a frequency-based approach. In our experiments, the step-by-step downsizing method outperformed the direct method in terms of both cumulative join cost and cumulative target cost. Furthermore, we present several implementation techniques and evaluate them on an open toolkit for embedded systems. The experimental results showed that the runtime workload for the test sentences was approximately 80 MIPS, indicating that the implemented engine is useful and scalable for mid-class embedded systems. |
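The abstract does not spell out the downsizing algorithm, so the following is only a minimal, hypothetical sketch of what "frequency-based" direct versus step-by-step database reduction could look like. It assumes: units are `(uid, phone, f0)` tuples; the target and join costs are toy absolute f0 differences; unit usage is counted by running an exact Viterbi unit selection over a training corpus; pruning keeps the most-used units while guaranteeing at least one unit per phone; and the step-by-step variant shrinks the database along a geometric size schedule, recounting usage after each step. Every function name and parameter here is invented for illustration and is not the authors' method.

```python
from collections import Counter

# Hypothetical toy costs (assumption): a unit is (uid, phone, f0), a target is (phone, f0).
def target_cost(unit, target):
    return abs(unit[2] - target[1])

def join_cost(prev, cur):
    return abs(prev[2] - cur[2])

def select_units(targets, db):
    """Exact Viterbi unit selection for one sentence; returns (units, total cost)."""
    layers = []  # layers[i][unit] = (best cumulative cost ending in unit, predecessor)
    for i, tgt in enumerate(targets):
        layer = {}
        for u in db[tgt[0]]:
            tc = target_cost(u, tgt)
            if i == 0:
                layer[u] = (tc, None)
            else:
                prev = layers[-1]
                p = min(prev, key=lambda q: prev[q][0] + join_cost(q, u))
                layer[u] = (prev[p][0] + join_cost(p, u) + tc, p)
        layers.append(layer)
    unit = min(layers[-1], key=lambda q: layers[-1][q][0])
    total = layers[-1][unit][0]
    path = []
    for layer in reversed(layers):  # backtrack through the predecessor pointers
        path.append(unit)
        unit = layer[unit][1]
    return path[::-1], total

def usage_counts(corpus, db):
    """How often each unit id is chosen when synthesizing the training corpus."""
    counts = Counter()
    for sentence in corpus:
        path, _ = select_units(sentence, db)
        counts.update(u[0] for u in path)
    return counts

def prune(db, counts, size):
    """Keep `size` units: the most-used unit of every phone, then the rest by usage."""
    rank = lambda u: (-counts[u[0]], u[0])  # higher usage first, uid breaks ties
    keep = {min(units, key=rank) for units in db.values()}
    rest = sorted((u for units in db.values() for u in units if u not in keep), key=rank)
    keep.update(rest[:max(0, size - len(keep))])
    return {ph: [u for u in units if u in keep] for ph, units in db.items()}

def direct_prune(db, corpus, size):
    """Count usage once on the full database and cut straight to `size` units."""
    return prune(db, usage_counts(corpus, db), size)

def stepwise_prune(db, corpus, size, steps):
    """Shrink geometrically toward `size`, recounting usage after every step."""
    n = sum(len(units) for units in db.values())
    for k in range(1, steps + 1):
        step_size = size if k == steps else round(n * (size / n) ** (k / steps))
        db = prune(db, usage_counts(corpus, db), step_size)
    return db

def cumulative_cost(corpus, db):
    """Sum of target + join costs of the best unit sequences over the corpus."""
    return sum(select_units(sentence, db)[1] for sentence in corpus)

# Toy database and corpus (hypothetical data, for illustration only).
db = {ph: [(f"{ph}{i}", ph, f0) for i, f0 in enumerate(f0s)]
      for ph, f0s in {"a": [100, 120, 140, 160], "b": [100, 130, 160, 190]}.items()}
corpus = [[("a", 110), ("b", 120), ("a", 130)],
          [("b", 150), ("a", 150), ("b", 180)],
          [("a", 100), ("a", 120), ("b", 140)]]

print(cumulative_cost(corpus, db),
      cumulative_cost(corpus, direct_prune(db, corpus, 4)),
      cumulative_cost(corpus, stepwise_prune(db, corpus, 4, steps=2)))
```

Because a pruned database is a subset of the full one and the search is exact, its cumulative cost can never fall below that of the full database; comparing the two pruned costs on a given corpus is the kind of measurement the abstract reports, though this toy setup says nothing about which variant wins on real data.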