Paper: SLP-P7.7
Session: Audio-visual and Multimodal Processing
Time: Wednesday, May 17, 10:00 - 12:00
Presentation: Poster
Topic: Speech and Spoken Language Processing: Multi-modal/multimedia processing (such as audio/visual, etc)
Title: Learning Edit Machines for Robust Multimodal Understanding
Authors: Michael Johnston, Srinivas Bangalore, AT&T Labs – Research, United States
Abstract: Multimodal grammars provide an expressive formalism for multimodal integration and understanding. However, hand-crafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. In previous work, we have shown how the robustness of stochastic language models can be combined with the expressiveness of multimodal grammars by adding a finite-state edit machine to the multimodal language processing cascade. In this paper, we present an approach where the edits are trained from data using a noisy channel model paradigm. We evaluate this model and compare its performance against hand-crafted edit machines from our previous work in the context of a multimodal conversational system (MATCH).
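The listing itself contains no code, but the abstract's core idea (learning edit costs from data in a noisy channel setting, rather than hand-crafting them) can be illustrated with a minimal sketch. The snippet below is not the authors' system: it approximates the idea with a plain Levenshtein alignment between observed token strings and intended, grammar-conformant strings, then turns edit-operation frequencies into negative-log costs. The example corpus, function names, and resulting costs are hypothetical assumptions; the actual paper works with finite-state edit transducers composed into a multimodal FST cascade.

```python
from collections import Counter
import math

# Hypothetical parallel corpus of (observed, intended) token sequences.
# In the paper, the intended side would be a string accepted by the
# multimodal grammar; these pairs are purely illustrative.
corpus = [
    (["show", "me", "uh", "cheap", "restaurants"],
     ["show", "cheap", "restaurants"]),
    (["show", "cheap", "restarants"],
     ["show", "cheap", "restaurants"]),
]

def align(src, tgt):
    """Return edit operations from a standard Levenshtein alignment."""
    n, m = len(src), len(tgt)
    # dp[i][j] = (cost, ops) for aligning src[:i] with tgt[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, [])
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] is None:
                continue
            cost, ops = dp[i][j]
            if i < n and j < m:  # match or substitution
                step = 0 if src[i] == tgt[j] else 1
                op = ("match" if step == 0 else "sub", src[i], tgt[j])
                cand = (cost + step, ops + [op])
                if dp[i + 1][j + 1] is None or cand[0] < dp[i + 1][j + 1][0]:
                    dp[i + 1][j + 1] = cand
            if i < n:  # deletion of src[i]
                cand = (cost + 1, ops + [("del", src[i], None)])
                if dp[i + 1][j] is None or cand[0] < dp[i + 1][j][0]:
                    dp[i + 1][j] = cand
            if j < m:  # insertion of tgt[j]
                cand = (cost + 1, ops + [("ins", None, tgt[j])])
                if dp[i][j + 1] is None or cand[0] < dp[i][j + 1][0]:
                    dp[i][j + 1] = cand
    return dp[n][m][1]

# Count edit operations over the corpus and convert relative frequencies
# into negative-log costs -- a crude stand-in for estimating the channel
# model P(observed | intended) that a trained edit machine would encode.
counts = Counter(op for src, tgt in corpus for op in align(src, tgt))
total = sum(counts.values())
costs = {op: -math.log(c / total) for op, c in counts.items()}

for op, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(op, round(cost, 2))
```

In this toy setup, frequent operations (such as deleting a filler word) receive low costs while rare substitutions stay expensive, which is the intuition behind replacing uniform hand-crafted edit penalties with costs learned from data.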