| Paper: | SLP-P7.7 | 
| Session: | Audio-visual and Multimodal Processing | 
| Time: | Wednesday, May 17, 10:00 - 12:00 | 
| Presentation: | Poster | 
| Topic: | Speech and Spoken Language Processing: Multi-modal/multimedia processing (such as audio/visual, etc.) | 
| Title: | Learning Edit Machines for Robust Multimodal Understanding | 
| Authors: | Michael Johnston, Srinivas Bangalore, AT&T Labs – Research, United States | 
| Abstract: | Multimodal grammars provide an expressive formalism for multimodal integration and understanding. However, hand-crafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. In previous work, we have shown how the robustness of stochastic language models can be combined with the expressiveness of multimodal grammars by adding a finite-state edit machine to the multimodal language processing cascade. In this paper, we present an approach in which the edits are trained from data using a noisy channel model. We evaluate this model and compare its performance against the hand-crafted edit machines from our previous work in the context of a multimodal conversational system (MATCH). |