ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Technical Program

Paper Detail

Paper: SLP-P20.7
Session: Acoustic Modeling and Adaptation
Time: Friday, May 19, 14:00 - 16:00
Presentation: Poster
Topic: Speech and Spoken Language Processing: Speaker adaptation and normalization (e.g., VTLN)
Title: MULTI-PARAMETER FREQUENCY WARPING FOR VTLN BY GRADIENT SEARCH
Authors: Sankaran Panchapagesan, Abeer Alwan, University of California, Los Angeles, United States
Abstract: The current method for estimating frequency warping (FW) functions for vocal tract length normalization (VTLN) is to maximize the ASR likelihood score by an exhaustive search over a grid of FW parameters. Exhaustive search is inefficient when estimating multi-parameter FWs, which have been shown to improve recognition accuracy over single-parameter FWs [McDonough, 2000]. Here we develop a gradient search algorithm to obtain the optimal FW parameters for MFCC features, since previous work focused on PLP cepstral features [McDonough, 2000]. The novel calculation involved is the gradient of the Mel filterbank with respect to the FW parameters. Even for a single parameter, the gradient search method was more efficient than grid search by a factor of around 1.6 on average for male child speakers tested on models trained on adult males. When used to estimate multi-parameter sine-log allpass transform (SLAPT, [McDonough, 2000]) FWs for VTLN, a reduction in word error rate of more than 50% was obtained with five-parameter SLAPT compared to single-parameter piecewise linear FW.
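
The contrast between grid search and gradient search described in the abstract can be illustrated with a minimal single-parameter sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_log_likelihood` is a hypothetical smooth stand-in for the ASR likelihood score, the grid range (0.80 to 1.20 in steps of 0.02), learning rate, and tolerance are arbitrary, and the gradient is approximated by central finite differences rather than by the analytic gradient of the Mel filterbank that the abstract refers to.

```python
def toy_log_likelihood(alpha, alpha_true=1.13, width=0.05):
    """Hypothetical stand-in for the ASR likelihood score (NOT the paper's objective).

    A concave quadratic that peaks at alpha_true.
    """
    return -((alpha - alpha_true) ** 2) / (2.0 * width ** 2)


def grid_search(score, lo=0.80, hi=1.20, step=0.02):
    """Exhaustive search: evaluate the score at every grid point, keep the best."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    best = max(grid, key=score)
    return best, n  # n = number of score evaluations


def gradient_search(score, alpha0=1.0, lr=0.002, eps=1e-4, tol=1e-4, max_iter=100):
    """Gradient ascent using a central finite-difference gradient (an assumption;
    the paper derives an analytic gradient instead)."""
    alpha = alpha0
    evals = 0
    for _ in range(max_iter):
        grad = (score(alpha + eps) - score(alpha - eps)) / (2.0 * eps)
        evals += 2
        update = lr * grad
        alpha += update
        if abs(update) < tol:  # stop once the parameter barely moves
            break
    return alpha, evals


if __name__ == "__main__":
    print("grid search:    ", grid_search(toy_log_likelihood))
    print("gradient search:", gradient_search(toy_log_likelihood))
```

The grid search cost grows exponentially with the number of warping parameters, whereas a gradient step costs roughly the same per parameter, which is the motivation the abstract gives for multi-parameter FWs. The evaluation counts printed by this toy say nothing about the efficiency figures reported in the paper.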



IEEE Signal Processing Society