ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Paper: SLP-L10.4
Session: Speaker Adaptation
Time: Friday, May 19, 11:00 - 11:20
Presentation: Lecture
Topic: Speech and Spoken Language Processing: Speaker adaptation and normalization (e.g., VTLN)
Title: Improving Reference Speaker Weighting Adaptation by the Use of Maximum-Likelihood Reference Speakers
Authors: Brian Mak, Tsz-Chung Lai, Hong Kong University of Science and Technology, Hong Kong SAR of China; Roger Hsiao, Carnegie Mellon University, United States
Abstract: We revisit a simple fast adaptation technique called reference speaker weighting (RSW). RSW is similar to eigenvoice (EV) adaptation: it simply requires the model of a new speaker to lie in the span of a set of reference speaker vectors. In the original RSW, the reference speakers are determined by a hierarchical speaker clustering (HSC) algorithm that uses information such as gender and speaking rate. We show in this paper that RSW adaptation can be improved if the training speakers that give the adaptation data the highest likelihoods are selected as the reference speakers; we call them the maximum-likelihood (ML) reference speakers. When RSW adaptation was evaluated on WSJ0 using 5 s of adaptation speech, using 10 ML reference speakers instead of reference speakers determined by HSC boosted the word error rate reduction from 2.54% to 9.15%. Moreover, when compared with EV, MAP, MLLR, and eKEV on fast adaptation, the algorithmically simplest technique, RSW, surprisingly gives the best performance.
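The abstract describes two steps: rank the training speakers by the likelihood they assign to the adaptation data and keep the top K as ML reference speakers, then constrain the new speaker's model to the span of those reference vectors, s = w_1 y_1 + ... + w_K y_K, with the weights chosen by maximum likelihood. The Python sketch below illustrates both steps under strong simplifying assumptions that are not taken from the paper: each speaker "model" is a single mean vector with identity covariance (a real system would use HMM/GMM mean supervectors), so ML weight estimation reduces to a least-squares fit to the average adaptation frame. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def log_likelihood(frames: List[Vector], mean: Vector) -> float:
    """Total log-likelihood of the frames under N(mean, I).

    Toy stand-in (assumption) for scoring adaptation data with a speaker's HMM/GMM.
    """
    d = len(mean)
    const = -0.5 * d * math.log(2.0 * math.pi)
    return sum(
        const - 0.5 * sum((x - m) ** 2 for x, m in zip(frame, mean))
        for frame in frames
    )


def select_ml_reference_speakers(
    frames: List[Vector], speakers: Dict[str, Vector], k: int
) -> List[str]:
    """Keep the k training speakers whose models give the adaptation data the highest likelihood."""
    ranked = sorted(
        speakers, key=lambda name: log_likelihood(frames, speakers[name]), reverse=True
    )
    return ranked[:k]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (linearly independent reference vectors).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rsw_weights(frames: List[Vector], refs: List[Vector]) -> Vector:
    """ML weights w for the adapted mean s = sum_k w_k * y_k.

    With identity covariance, sum_t ||x_t - s||^2 = const + T * ||xbar - s||^2,
    so maximizing the likelihood is the least-squares problem (Y^T Y) w = Y^T xbar.
    """
    d = len(refs[0])
    xbar = [sum(frame[i] for frame in frames) / len(frames) for i in range(d)]
    gram = [[sum(p * q for p, q in zip(yi, yj)) for yj in refs] for yi in refs]
    rhs = [sum(p * q for p, q in zip(yi, xbar)) for yi in refs]
    return solve(gram, rhs)


def rsw_adapt(
    frames: List[Vector], speakers: Dict[str, Vector], k: int
) -> Tuple[List[str], Vector, Vector]:
    """Select k ML reference speakers, estimate RSW weights, and build the adapted mean."""
    names = select_ml_reference_speakers(frames, speakers, k)
    refs = [speakers[name] for name in names]
    w = rsw_weights(frames, refs)
    mean = [sum(wk * y[i] for wk, y in zip(w, refs)) for i in range(len(refs[0]))]
    return names, w, mean


if __name__ == "__main__":
    # Hypothetical 3-dimensional "speaker vectors" and two adaptation frames.
    speakers = {
        "a": [1.0, 0.0, 0.0],
        "b": [0.0, 1.0, 0.0],
        "c": [0.0, 0.0, 1.0],
        "d": [5.0, 5.0, 5.0],
    }
    frames = [[1.2, 0.4, 0.1], [0.8, 0.6, -0.1]]
    names, w, mean = rsw_adapt(frames, speakers, k=2)
    print(names)  # ['a', 'b']
    print(w, mean)
```

Under the identity-covariance assumption, the frame scatter term is shared by every candidate, so ranking by likelihood is equivalent to ranking by distance from the average adaptation frame. In an HMM-based recognizer the statistics would instead be accumulated per Gaussian using state occupancies, which this toy omits.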

IEEE Signal Processing Society

© 2018 Conference Management Services, Inc. | email: webmaster@icassp2006.org | Last updated Friday, August 17, 2012