Paper: | SLP-P20.7 |
Session: | Acoustic Modeling and Adaptation |
Time: | Friday, May 19, 14:00 - 16:00 |
Presentation: | Poster |
Topic: | Speech and Spoken Language Processing: Speaker adaptation and normalization (e.g., VTLN) |
Title: | MULTI-PARAMETER FREQUENCY WARPING FOR VTLN BY GRADIENT SEARCH |
Authors: | Sankaran Panchapagesan, Abeer Alwan, University of California, Los Angeles, United States |
Abstract: | The current method for estimating frequency warping (FW) functions for vocal tract length normalization (VTLN) maximizes the ASR likelihood score by exhaustive search over a grid of FW parameters. Exhaustive search is inefficient when estimating multi-parameter FWs, which have been shown to improve recognition accuracy over single-parameter FWs [McDonough, 2000]. Here we develop a gradient search algorithm to obtain the optimal FW parameters for MFCC features, since previous work focused on PLP cepstral features [McDonough, 2000]. The novel calculation involved was that of the gradient of the Mel filterbank with respect to the FW parameters. Even for a single parameter, gradient search was more efficient than grid search by a factor of around 1.6 on average for male child speakers tested on models trained on adult male speech. When used to estimate multi-parameter sine-log allpass transform (SLAPT) [McDonough, 2000] FWs for VTLN, the five-parameter SLAPT gave a reduction of more than 50% in word error rate compared to single-parameter piecewise linear FW. |
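The cost argument in the abstract (grid search grows exponentially with the number of FW parameters, while gradient search does not) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the objective `toy_log_likelihood` is a hypothetical concave stand-in for the ASR likelihood, `TRUE_PARAMS` is made up, and the grid range, step size, and iteration count are arbitrary assumptions. The actual method requires the gradient of the Mel filterbank with respect to the FW parameters, which is not reproduced here.

```python
import itertools

# Hypothetical optimum of the toy objective (made-up values, not from the paper).
TRUE_PARAMS = (0.12, -0.05, 0.03)


def toy_log_likelihood(params):
    """Concave quadratic used as a stand-in for the ASR likelihood score."""
    return -sum((p - t) ** 2 for p, t in zip(params, TRUE_PARAMS))


def toy_gradient(params):
    """Analytic gradient of the toy objective with respect to each parameter."""
    return [-2.0 * (p - t) for p, t in zip(params, TRUE_PARAMS)]


def grid_search(points_per_axis=11, low=-0.2, high=0.2):
    """Score every point on a regular grid; cost is points_per_axis ** K."""
    step = (high - low) / (points_per_axis - 1)
    axis = [low + i * step for i in range(points_per_axis)]
    best, best_score, evaluations = None, float("-inf"), 0
    for candidate in itertools.product(axis, repeat=len(TRUE_PARAMS)):
        evaluations += 1
        score = toy_log_likelihood(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, evaluations


def gradient_ascent(step_size=0.25, iterations=20):
    """Fixed-step gradient ascent from the unwarped setting (all zeros)."""
    params = [0.0] * len(TRUE_PARAMS)
    for _ in range(iterations):
        grad = toy_gradient(params)
        params = [p + step_size * g for p, g in zip(params, grad)]
    return params, iterations


if __name__ == "__main__":
    grid_best, grid_evals = grid_search()
    grad_best, grad_evals = gradient_ascent()
    print("grid search:", grid_best, "objective evaluations:", grid_evals)
    print("gradient ascent:", grad_best, "gradient evaluations:", grad_evals)
```

With three parameters and 11 grid points per axis, the grid already needs 11^3 = 1331 objective evaluations and is limited to the grid resolution, whereas the gradient loop uses a fixed number of gradient evaluations regardless of the number of parameters. In a real system each evaluation involves computing warped features and likelihoods, so the relative costs would differ from this toy comparison.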