Technical Program

Paper Detail

Paper:	SLP-P1.8
Session:	Feature Extraction and Modeling
Time:	Tuesday, May 16, 10:30 - 12:30
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Prosody, dialect, accent and other speech characteristics
Title:	Speech Recognition Using Syllable Duration Ratio Model
Authors:	Masahide Ariu, Takashi Masuko, Shinichi Tanaka, Akinori Kawamura, Toshiba Corporation, Japan
Abstract:	This paper describes a novel approach to modeling duration information for speech recognition. To eliminate the influence of speaking rate on the duration model, we propose modeling the duration ratios of two successive syllables with log-normal distributions. We refer to this model as the syllable duration ratio model (SDRM). Recognition experiments are conducted on isolated word and connected digit recognition tasks under noisy conditions. Experimental results show that the SDRM reduces errors by approximately 30% compared to the baseline system at SNRs of 15 dB or higher in 10-digit recognition tasks. In addition, we show that the SDRM is robust to differences in speaking rate between training and test data.
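
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single shared distribution, and the example durations are all assumptions made for illustration. A log-normal distribution over a duration ratio is equivalent to a Gaussian over the log of that ratio, and scaling every syllable duration by the same factor (a uniform change in speaking rate) leaves the ratios unchanged.

```python
import math

def log_duration_ratios(durations):
    """Log ratio of each syllable duration to the preceding one (hypothetical helper)."""
    return [math.log(cur / prev) for prev, cur in zip(durations, durations[1:])]

def fit_log_normal(log_ratios):
    """Maximum-likelihood mean and variance of the log ratios.

    Fitting a Gaussian to log ratios is the same as fitting a
    log-normal distribution to the ratios themselves.
    """
    n = len(log_ratios)
    mu = sum(log_ratios) / n
    var = sum((x - mu) ** 2 for x in log_ratios) / n
    return mu, var

def log_score(log_ratio, mu, var):
    """Gaussian log density of one log ratio (assumed scoring function)."""
    return -0.5 * math.log(2 * math.pi * var) - (log_ratio - mu) ** 2 / (2 * var)

# Made-up syllable durations in seconds, spoken slowly and then twice as fast.
slow = [0.20, 0.30, 0.15]
fast = [d * 0.5 for d in slow]

# The log ratios agree up to floating-point rounding, so a model built on
# them does not depend on a uniform change in speaking rate.
print(log_duration_ratios(slow))
print(log_duration_ratios(fast))
```

In a recognizer, a score such as `log_score` would presumably be combined with the acoustic score of each hypothesis; how the paper weights or combines the two is not stated in the abstract.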