ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Technical Program

Paper Detail

Paper: SLP-P16.5
Session: Speaker Tracking and Adaptation
Time: Thursday, May 18, 16:30 - 18:30
Presentation: Poster
Topic: Speech and Spoken Language Processing: Speaker adapted training methods
Title: Improving Rapid Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics
Authors: Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano, National Institute of Advanced Industrial Science and Technology, Japan
Abstract: Real-time speech recognition applications need a fast and reliable adaptation algorithm. We propose a method to reduce the adaptation time of unsupervised speaker adaptation based on HMM Sufficient Statistics. Using only a single arbitrary utterance without transcription, we select the N-best speakers' Sufficient Statistics, created offline, to provide adaptation data for a target speaker. Reducing N further shortens the adaptation time, but it degrades recognition performance because too little data remains to adapt the model robustly. Linear interpolation of the global HMM Sufficient Statistics offsets this negative effect and achieves a 50% reduction in adaptation time, from 10 sec to 5 sec, without degrading word accuracy. We also compare our method with Vocal Tract Length Normalization (VTLN), Maximum A Posteriori (MAP) adaptation and Maximum Likelihood Linear Regression (MLLR), and we test it in office, car, crowd and booth noise environments at SNRs of 10 dB, 15 dB, 20 dB and 25 dB.
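
The pipeline the abstract outlines (pick the N-best speakers, pool their offline statistics, interpolate linearly with the global statistics, re-estimate the model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not say how speakers are scored, which statistics are stored, or how the interpolation weight is set. The sketch uses a single Gaussian with an occupancy count and a first-order sum, and the names `Stats`, `select_n_best`, `pool`, `interpolate` and `estimate_mean` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stats:
    """Sufficient statistics for one Gaussian (assumed form):
    occupancy count and first-order sum of the feature vectors."""
    occupancy: float
    first_order: List[float]


def select_n_best(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n speaker IDs with the highest score.
    The score is assumed to measure how well each speaker matches the
    single target utterance (for example, a log-likelihood)."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def pool(stats: List[Stats]) -> Stats:
    """Sum the statistics of several speakers component by component."""
    dim = len(stats[0].first_order)
    occupancy = sum(s.occupancy for s in stats)
    first = [sum(s.first_order[d] for s in stats) for d in range(dim)]
    return Stats(occupancy, first)


def interpolate(selected: Stats, global_stats: Stats, weight: float) -> Stats:
    """Linear interpolation: weight * selected + (1 - weight) * global.
    The weight is a free parameter in this sketch."""
    occupancy = weight * selected.occupancy + (1.0 - weight) * global_stats.occupancy
    first = [
        weight * a + (1.0 - weight) * b
        for a, b in zip(selected.first_order, global_stats.first_order)
    ]
    return Stats(occupancy, first)


def estimate_mean(stats: Stats) -> List[float]:
    """Re-estimate the Gaussian mean as first-order sum / occupancy."""
    return [x / stats.occupancy for x in stats.first_order]


# Toy data: per-speaker statistics computed offline (values are made up).
speaker_stats = {
    "spk_a": Stats(2.0, [2.0, 4.0]),
    "spk_b": Stats(2.0, [6.0, 8.0]),
    "spk_c": Stats(8.0, [8.0, 4.0]),
}
# Hypothetical match scores of each speaker against the target utterance.
scores = {"spk_a": -10.0, "spk_b": -12.0, "spk_c": -30.0}

chosen = select_n_best(scores, n=2)
selected = pool([speaker_stats[s] for s in chosen])
global_stats = pool(list(speaker_stats.values()))
adapted_mean = estimate_mean(interpolate(selected, global_stats, weight=0.5))
```

A smaller N means fewer statistics to sum at adaptation time, and in this sketch the interpolation with the global statistics is what keeps the estimate from resting on too little data.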