ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Technical Program

Paper Detail

Paper: SLP-L2.1
Session: Advances in Robust Speech Recognition
Time: Tuesday, May 16, 14:00 - 14:20
Presentation: Lecture
Topic: Speech and Spoken Language Processing: Model-based robust Speech Recognition
Title: Discriminatively Trained Context-Dependent Duration-Bigram Models for Korean Digit Recognition
Authors: Daniel Willett, Franz Gerl, Raymond Brueckner, Harman/Becker Automotive Systems, Germany
Abstract: The recognition of continuously spoken Korean digits is well known to be a particularly challenging task among small-vocabulary recognition problems. In this paper, we review and evaluate our acoustic modeling efforts for efficient, high-accuracy recognition of Korean digit strings in in-car applications. The measures comprise context-dependent word models, duration-dependent distribution functions, error-weighted discriminative training, and a compressed bigram model that strongly constrains the HMM state durations. The resulting average word error rate across multiple channel and noise conditions is below 3%, which is a relative reduction of 62% over the error observed with traditional context-independent digit modeling techniques and about 36% relative error reduction compared to ML-trained context-dependent digit models of ordinary linear topology. Fast unsupervised model adaptation during decoding yields an additional 10% relative improvement.
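
The relative reductions quoted in the abstract can be related to absolute error rates with a little arithmetic. The sketch below is illustrative only: the abstract gives no baseline error rates, so it assumes the final word error rate sits exactly at the stated 3% bound and derives the baselines that bound would imply. The function names and the constant `FINAL_WER_BOUND` are hypothetical, and the derived numbers are upper bounds under that assumption, not figures reported by the authors.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative error reduction of new_wer with respect to baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, reduction: float) -> float:
    """Baseline error rate implied by a new error rate and a relative reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return new_wer / (1.0 - reduction)


# Assumption: the abstract only says "below 3%", so 3.0 is an upper bound.
FINAL_WER_BOUND = 3.0

# Bounds on the two baselines named in the abstract (in percent WER).
ci_bound = implied_baseline(FINAL_WER_BOUND, 0.62)     # context-independent models
ml_cd_bound = implied_baseline(FINAL_WER_BOUND, 0.36)  # ML-trained context-dependent models

# A further 10% relative improvement from unsupervised adaptation.
adapted_bound = FINAL_WER_BOUND * (1.0 - 0.10)

if __name__ == "__main__":
    print(f"context-independent baseline: at most {ci_bound:.2f}% WER")
    print(f"ML context-dependent baseline: at most {ml_cd_bound:.2f}% WER")
    print(f"with adaptation: at most {adapted_bound:.2f}% WER")
```

Because relative reductions compose multiplicatively, the 62% and 10% figures do not simply add up to a 72% total reduction.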



