ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Technical Program

Paper Detail

Paper: SLP-P7.3
Session: Audio-visual and Multimodal Processing
Time: Wednesday, May 17, 10:00 - 12:00
Presentation: Poster
Topic: Speech and Spoken Language Processing: Multi-modal/multimedia processing (such as audio/visual, etc)
Title: An analysis of visual speech information applied to voice activity detection
Authors: David Sodoyer, Bertrand Rivet, Laurent Girin, Jean-Luc Schwartz, ICP / INPG, France; Christian Jutten, Laboratory of Image and Signal (LIS), France
Abstract: We present a new approach to the voice activity detection (VAD) problem for speech signals embedded in non-stationary noise. The method is based on automatic lipreading: the objective is to detect voice activity or non-activity by exploiting the coherence between the speech acoustic signal and the speaker's lip movements. From a comprehensive analysis of lip shape parameters during speech and non-speech events, we show that a single appropriate visual parameter, defined to characterize the lip movements, can be used for the detection of sections of voice activity or, more precisely, for the detection of silence sections. Detection scores obtained on spontaneous speech confirm the efficiency of the visual voice activity detector (VVAD).
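
The abstract does not define the visual parameter, so the following is only a minimal sketch of the general idea, under stated assumptions: lip width and height are available per video frame, lip movement is measured as the sum of absolute frame-to-frame changes in those two values, and a frame is labelled as silence when the movement averaged over a short trailing window falls below a threshold. The function names, the movement measure, the window length, and the threshold are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def lip_motion(widths: Sequence[float], heights: Sequence[float]) -> List[float]:
    """Frame-to-frame lip movement, |dW| + |dH| (an assumed, illustrative measure)."""
    if len(widths) != len(heights):
        raise ValueError("widths and heights must have the same length")
    if not widths:
        return []
    motion = [0.0]  # the first frame has no predecessor, so its movement is 0
    for t in range(1, len(widths)):
        d_width = abs(widths[t] - widths[t - 1])
        d_height = abs(heights[t] - heights[t - 1])
        motion.append(d_width + d_height)
    return motion


def detect_silence(
    widths: Sequence[float],
    heights: Sequence[float],
    window: int = 5,        # assumed smoothing length, in frames
    threshold: float = 0.5  # assumed decision threshold, in lip-size units
) -> List[bool]:
    """Return one flag per frame: True where smoothed lip movement is low."""
    if window < 1:
        raise ValueError("window must be at least 1")
    motion = lip_motion(widths, heights)
    silence = []
    for t in range(len(motion)):
        # Average the movement over the trailing window ending at frame t.
        recent = motion[max(0, t - window + 1): t + 1]
        smoothed = sum(recent) / len(recent)
        silence.append(smoothed < threshold)
    return silence


# Synthetic example: six frames of still lips, then four frames of movement.
widths = [10.0] * 6 + [12.0, 9.0, 13.0, 8.0]
heights = [5.0] * 6 + [7.0, 4.0, 8.0, 3.0]
flags = detect_silence(widths, heights, window=3, threshold=0.5)
```

In this synthetic example the still frames are flagged as silence and the moving frames are not. Smoothing over a trailing window is one simple way to avoid flagging a brief pause in lip movement during speech as silence; the real detector may handle this differently.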



