ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Technical Program

Paper Detail

Paper: IMDSP-P4.9
Session: Image/Video Indexing and Retrieval
Time: Tuesday, May 16, 16:30 - 18:30
Presentation: Poster
Topic: Image and Multidimensional Signal Processing: Video Indexing, Retrieval and Editing
Title: A Mid-level Scene Change Representation via Audiovisual Alignment
Authors: Jinqiao Wang, Chinese Academy of Sciences, China; Lingyu Duan, Institute for Infocomm Research, Singapore; Hanqing Lu, Chinese Academy of Sciences, China; Jesse S. Jin, University of Newcastle, Australia; Changsheng Xu, Institute for Infocomm Research, Singapore
Abstract: A scene is a series of semantically correlated video shots, and effective scene detection depends to some degree on domain knowledge. Most existing approaches try to detect scene changes directly by applying clustering or supervised learning methods to low-level audiovisual features. However, robustly detecting diverse scene changes that arise from complex semantics remains a challenging problem. In this paper we focus on the association between visual signal changes (e.g. cuts, fade-ins, fade-outs) and audio signal changes (e.g. speaker changes, background music changes) to propose a mid-level scene change representation. The representation is meant to locate candidate scene change points by characterizing the temporally uncorrelated properties of the audio and visual tracks when a scene change occurs. By incorporating domain knowledge, enhanced features can be extracted to complement this representation and help bridge the semantic gap in scene change detection. We use a camera motion estimation algorithm to detect visual signal changes, and the resulting visual change positions are selected as time-stamp points. An alignment step then searches for candidate audio signal change positions around these points by computing a multi-scale Kullback-Leibler (K-L) distance. Both a metric-based K-L distance approach and a model-based HMM are applied to determine the true audio signal changes. The associated visual and audio signal changes together form the mid-level scene change representation. This representation has been successfully applied to detecting the boundaries of individual commercials in TV broadcast streams, with an accuracy of around 95%. In particular, the systematic alignment approach can also be used in video summarization.
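The alignment step in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the audio features, the window sizes, or the search range, so the sketch assumes a one-dimensional per-frame audio feature, models each window as a univariate Gaussian, and averages a symmetric K-L distance over a few hypothetical window sizes. The function names (`symmetric_kl`, `multiscale_kl`, `align_audio_change`) and the parameters `scales` and `search` are illustrative assumptions, and the HMM stage is omitted.

```python
import math


def gaussian_stats(xs, eps=1e-6):
    """Mean and variance of a window; eps keeps the variance strictly positive."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var + eps


def symmetric_kl(left, right):
    """Symmetric K-L distance KL(p||q) + KL(q||p) between two univariate
    Gaussians fitted to the left and right windows (assumed feature model)."""
    m1, v1 = gaussian_stats(left)
    m2, v2 = gaussian_stats(right)
    d2 = (m1 - m2) ** 2
    return (v1 + d2) / (2 * v2) + (v2 + d2) / (2 * v1) - 1.0


def multiscale_kl(features, t, scales=(4, 8, 16)):
    """Average the symmetric K-L distance at frame t over several window sizes.
    Scales whose windows would run past the sequence ends are skipped."""
    scores = []
    for w in scales:
        if t - w < 0 or t + w > len(features):
            continue
        scores.append(symmetric_kl(features[t - w:t], features[t:t + w]))
    return sum(scores) / len(scores) if scores else 0.0


def align_audio_change(features, visual_t, search=5, scales=(4, 8, 16)):
    """Search near a visual change position (the time-stamp point) for the
    frame with the largest multi-scale K-L distance."""
    lo = max(0, visual_t - search)
    hi = min(len(features), visual_t + search + 1)
    best = max(range(lo, hi), key=lambda t: multiscale_kl(features, t, scales))
    return best, multiscale_kl(features, best, scales)


if __name__ == "__main__":
    # Toy feature track: the audio statistics change at frame 20,
    # while the visual change was detected slightly later, at frame 22.
    track = [0.0] * 20 + [1.0] * 20
    frame, score = align_audio_change(track, visual_t=22)
    print(frame, score)
```

In this sketch a candidate audio change would be accepted only if its score exceeds some threshold; the abstract does not state how that decision is made beyond combining the metric-based K-L distance with a model-based HMM.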



IEEE Signal Processing Society
