Paper: | SLP-P7.2 |
Session: | Audio-visual and Multimodal Processing |
Time: | Wednesday, May 17, 10:00 - 12:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Multi-modal/multimedia processing (such as audio/visual, etc) |
Title: |
A Dempster-Shafer Based Fusion Approach for Audio-Visual Speech Recognition with Application to Large Vocabulary French Speech |
Authors: |
Samuel Foucher, France Laliberté, Gilles Boulianne, Langis Gagnon, CRIM, Canada |
Abstract: |
This work explores a new way of fusing audio and visual information for audio-visual automatic speech recognition in the context of a large vocabulary application. Mouth shape information is extracted off-line and integrated into a speech recognition system using a phoneme-based Dempster-Shafer fusion approach. The fusion methodology assumes that the audio information about the phonemes is a precise Bayesian source, while the visual information is an imprecise evidential source. This ensures that the visual information does not significantly degrade the audio information in situations where the audio performs well, such as a controlled, noiseless environment. Bayesian and simple consonant belief structures are explored and compared, along with standard stack-based fusion. |
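As an illustration of the kind of fusion the abstract describes, the following is a minimal sketch of Dempster's rule of combination applied to a Bayesian source (mass on single phonemes only) and a simple support source (mass on one set of phonemes, with the remainder on the whole frame). It is not the authors' implementation: the three-phoneme frame, the mass values, the grouping of "p" and "b" into one visually similar class, and the function name `dempster_combine` are all assumptions made for the example.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of hypotheses
    (the focal element) to its mass. Masses in each dict sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal elements: product mass counts as conflict.
            conflict += wa * wb
    if not combined:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}


# Hypothetical frame of discernment: three phoneme labels.
theta = frozenset({"p", "b", "a"})

# Assumed audio source: Bayesian, so every focal element is a singleton.
audio = {
    frozenset({"p"}): 0.6,
    frozenset({"b"}): 0.3,
    frozenset({"a"}): 0.1,
}

# Assumed visual source: imprecise. It supports the set {"p", "b"}
# without separating its members, and leaves the rest on the frame.
visual = {
    frozenset({"p", "b"}): 0.7,
    theta: 0.3,
}

fused = dempster_combine(audio, visual)

# A vacuous visual source (all mass on the frame) carries no information.
vacuous = {theta: 1.0}
unchanged = dempster_combine(audio, vacuous)
```

In this sketch the fused result is again Bayesian, because intersecting a singleton with any set yields either that singleton or the empty set. Combining with the vacuous source returns the audio masses unchanged, which is one way to see why an imprecise visual source can be prevented from degrading a reliable audio source.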