ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Technical Program

Paper Detail

Paper: SLP-P1.12
Session: Feature Extraction and Modeling
Time: Tuesday, May 16, 10:30 - 12:30
Presentation: Poster
Topic: Speech and Spoken Language Processing: Prosody, dialect, accent and other speech characteristics
Title: SPEAKER OVERLAPS AND ASR ERRORS IN MEETINGS: EFFECTS BEFORE, DURING, AND AFTER THE OVERLAP
Authors: Ozgur Cetin, University of California, Berkeley, United States; Elizabeth Shriberg, SRI International / University of California, Berkeley, United States
Abstract: In this paper we analyze automatic speech recognition (ASR) errors made by a state-of-the-art speech recognizer in meetings, with respect to locations of overlapping speech. Our analysis focuses on recognition errors made both during an overlap and in the regions immediately preceding and following the location of overlapped speech. We devise an experimental paradigm that allows examination of the same foreground speech both with and without naturally occurring cross-talk. We then analyze ASR errors with respect to a number of parameters, including the number of speakers involved in the overlap, the severity of the cross-talk, and the distance of surrounding nonoverlapping speech from the overlap region. In addition to reporting effects on ASR errors, we discover a number of interesting phenomena. First, we find that overlaps by other speakers tend to occur in high-perplexity regions of the foreground talker's word stream, and that the background speakers jump in and leave the overlap in relatively low-perplexity regions. Second, we discover that the word error rate (WER) after the overlap is consistently higher than that before the overlap. This finding cannot be explained by the recognition process itself, and thus suggests that speakers may modify their speech after being overlapped. This modification appears to be acoustic or prosodic rather than lexical, because perplexities are actually lower after overlaps than before them. Third, we observe that the detrimental effect of overlaps on ASR performance extends multiple seconds (at least 3) beyond where they actually occur. Taken together, these observations suggest that automatic modeling of meetings could benefit from a broader view of the relationship between overlap and ASR in natural conversation.
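
Illustrative sketch (not from the paper): the before/during/after comparison in the abstract amounts to computing WER separately for words grouped by their position relative to an overlap. The Python below shows one minimal way this could be done, assuming each scored segment already carries a region label. The function names, the region labels, and the toy word sequences are all hypothetical, and the abstract does not say which scoring tool the authors used.

```python
def word_errors(ref, hyp):
    """Minimum substitutions + deletions + insertions needed to turn
    the reference word list into the hypothesis word list
    (Levenshtein distance computed over words)."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference words
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_region(segments):
    """segments: iterable of (region, ref_words, hyp_words) tuples.
    Returns {region: WER}, pooling errors and reference word counts
    within each region before dividing."""
    errors, ref_counts = {}, {}
    for region, ref, hyp in segments:
        errors[region] = errors.get(region, 0) + word_errors(ref, hyp)
        ref_counts[region] = ref_counts.get(region, 0) + len(ref)
    return {r: errors[r] / ref_counts[r] for r in errors if ref_counts[r] > 0}


# Toy, made-up data for illustration only (not results from the paper).
toy_segments = [
    ("before", "so i think we should".split(), "so i think we should".split()),
    ("during", "move the deadline".split(), "move a deadline now".split()),
    ("after", "to next week".split(), "to next we".split()),
]
print(wer_by_region(toy_segments))
```

Pooling errors and reference counts per region before dividing, rather than averaging per-segment rates, keeps short segments from dominating the result. This is a design choice of the sketch, not something stated in the abstract.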


