Paper: | SLP-P16.5 |
Session: | Speaker Tracking and Adaptation |
Time: | Thursday, May 18, 16:30 - 18:30 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Speaker adapted training methods |
Title: |
Improving Rapid Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics |
Authors: |
Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano, Nara Institute of Science and Technology, Japan |
Abstract: |
In real-time speech recognition applications, speaker adaptation must be both fast and reliable. We propose a method that reduces the adaptation time of unsupervised speaker adaptation based on HMM sufficient statistics. Using only a single arbitrary utterance without transcription, we select the N-best speakers whose sufficient statistics, created offline, supply the data for adapting the model to the target speaker. Reducing N further shortens adaptation time, but it degrades recognition performance because too little data remains to adapt the model robustly. Linear interpolation with the global HMM sufficient statistics offsets this effect and halves the adaptation time, from 10 s to 5 s, without degrading word accuracy. We also compare our method with Vocal Tract Length Normalization (VTLN), Maximum A Posteriori (MAP) adaptation, and Maximum Likelihood Linear Regression (MLLR), and evaluate it in office, car, crowd, and booth noise at SNRs of 10, 15, 20, and 25 dB. |
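The abstract describes two steps: ranking stored speakers against one untranscribed utterance, then pooling the selected speakers' statistics and interpolating them with global statistics. The Python sketch below illustrates those steps under assumptions that the abstract does not state. Each stored speaker is represented by a diagonal-covariance GMM used only for selection. Each Gaussian's statistics are reduced to an occupancy count and a mean. Pooling is occupancy-weighted, and the interpolation weight `lam` is a hypothetical parameter. The helpers `GaussStats`, `select_nbest`, and `adapted_mean` are illustrative names. The sketch handles means only and is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical per-speaker selection model: list of (weight, mean, diagonal variance).
Gmm = List[Tuple[float, Vector, Vector]]


@dataclass
class GaussStats:
    """Assumed per-Gaussian sufficient statistics for one speaker (or the global pool)."""
    occ: float    # accumulated occupancy count for this Gaussian
    mean: Vector  # mean vector estimated from that occupancy


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


def utterance_loglik(frames: Sequence[Vector], gmm: Gmm) -> float:
    """Total log-likelihood of an utterance under a GMM (log-sum-exp per frame)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total


def select_nbest(frames: Sequence[Vector], speaker_gmms: Dict[str, Gmm], n: int) -> List[str]:
    """Rank stored speakers by the likelihood of one untranscribed utterance; keep the top n."""
    ranked = sorted(speaker_gmms,
                    key=lambda spk: utterance_loglik(frames, speaker_gmms[spk]),
                    reverse=True)
    return ranked[:n]


def adapted_mean(selected: List[GaussStats], global_stats: GaussStats, lam: float) -> Vector:
    """Pool the N-best statistics, then linearly interpolate with the global statistics.

    Interpolation is applied to the accumulators (occupancy and first-order sum
    occ * mean), not to the final means. lam weights the N-best pool and
    (1 - lam) weights the global statistics; this weighting is an assumption.
    """
    dim = len(global_stats.mean)
    occ = sum(s.occ for s in selected)
    first = [sum(s.occ * s.mean[d] for s in selected) for d in range(dim)]
    occ_mix = lam * occ + (1 - lam) * global_stats.occ
    first_mix = [lam * f + (1 - lam) * global_stats.occ * g
                 for f, g in zip(first, global_stats.mean)]
    return [f / occ_mix for f in first_mix]


if __name__ == "__main__":
    gmms = {
        "spk_a": [(1.0, [0.0], [1.0])],
        "spk_b": [(1.0, [5.0], [1.0])],
        "spk_c": [(1.0, [1.0], [1.0])],
    }
    print(select_nbest([[0.2], [0.3]], gmms, n=2))  # ['spk_a', 'spk_c']
    pooled = [GaussStats(2.0, [1.0]), GaussStats(2.0, [3.0])]
    print(adapted_mean(pooled, GaussStats(4.0, [0.0]), lam=0.5))  # [1.0]
```

With `lam = 1.0` the result is the plain occupancy-weighted mean of the selected speakers. Smaller values pull it toward the global statistics. This is one plausible reading of how interpolation could compensate for the thinner data left when N is reduced.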