Paper Detail

Paper:	SLP-P16.3
Session:	Speaker Tracking and Adaptation
Time:	Thursday, May 18, 16:30 - 18:30
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Speaker adaptation and normalization (e.g., VTLN)
Title:	Unsupervised Learning of Overlap Speech Model Parameters for Multichannel Speech Activity Detection in Meetings
Authors:	Kornel Laskowski, Tanja Schultz, Carnegie Mellon University, United States
Abstract:	The study of meetings, and of multi-party conversation in general, is currently the focus of much attention, calling for more robust and more accurate speech activity detection systems. We present a novel multichannel speech activity detection algorithm that explicitly models the overlap incurred by participants taking turns at speaking. Parameters for overlapped speech states are estimated during decoding, in an unsupervised manner, by combining knowledge from other observed states in the same meeting. We demonstrate on the NIST Rich Transcription Spring 2004 data set that, within regions of overlapped speech, the new system almost halves the number of frames missed relative to a competitive algorithm. The overall speech detection error on unseen data is reduced by 36% relative.