Paper: | SLP-P1.8 |
Session: | Feature Extraction and Modeling |
Time: | Tuesday, May 16, 10:30 - 12:30 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Prosody, dialect, accent and other speech characteristics |
Title: |
Speech Recognition Using Syllable Duration Ratio Model |
Authors: |
Masahide Ariu, Takashi Masuko, Shinichi Tanaka, Akinori Kawamura, Toshiba Corporation, Japan |
Abstract: |
This paper describes a novel approach to modeling duration information for speech recognition. To eliminate the influence of speaking rate on the duration model, we propose a model that represents the duration ratios of two successive syllables with log-normal distributions. We refer to this model as a syllable duration ratio model (SDRM). Recognition experiments are conducted on isolated word and connected digit recognition tasks under noisy conditions. Experimental results show that the SDRM reduces errors by approximately 30% compared to the baseline system at SNRs of 15 dB or higher in the 10-digit recognition tasks. In addition, we show that the SDRM is robust to differences in speaking rate between training and test data. |
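The core idea in the abstract, modeling the ratio of two successive syllable durations with a log-normal distribution, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the maximum-likelihood parameter estimate, and the example durations are all assumptions made for illustration, and the abstract does not state how the model's parameters are conditioned (for example, on syllable identity) or how its score is combined with the acoustic score.

```python
import math


def log_duration_ratios(durations):
    """Return log(d[i+1] / d[i]) for each pair of successive syllables.

    `durations` is a list of positive syllable durations (e.g. in seconds).
    """
    return [math.log(b / a) for a, b in zip(durations, durations[1:])]


def fit_log_normal(log_ratios):
    """Estimate the mean and variance of the log ratios.

    These are the maximum-likelihood parameters of a log-normal
    distribution over the ratios themselves (assumed estimator).
    """
    n = len(log_ratios)
    mu = sum(log_ratios) / n
    var = sum((x - mu) ** 2 for x in log_ratios) / n
    return mu, var


def log_normal_logpdf(ratio, mu, var):
    """Log density of a log-normal distribution evaluated at `ratio`."""
    x = math.log(ratio)
    return -x - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


# Hypothetical syllable durations for one utterance, in seconds.
slow = [0.20, 0.30, 0.15]
# The same utterance spoken twice as fast.
fast = [d * 0.5 for d in slow]

# A uniform change in speaking rate cancels out of every ratio.
print(log_duration_ratios(slow))
print(log_duration_ratios(fast))
```

The two printed lists agree up to floating-point rounding, which is the property the abstract relies on: scaling every duration by the same factor leaves the ratios of successive syllables unchanged, so a model over those ratios is unaffected by a uniform change in speaking rate.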