Technical Program

Paper Detail

Paper:	SLP-P17.10
Session:	Spoken Language Modeling, Identification and Characterization
Time:	Thursday, May 18, 16:30 - 18:30
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Language modeling and Adaptation
Title:	BOOTSTRAPPING LANGUAGE MODELS FOR SPOKEN DIALOG SYSTEMS FROM THE WORLD WIDE WEB
Authors:	Dilek Hakkani-Tür, Mazin Gilbert, AT&T Labs – Research, United States
Abstract:	In this paper, we describe our approach for bootstrapping statistical language models for spoken dialog systems using in-domain web data and utterances collected from previous applications. The approach is based on the idea of stitching conversational templates with the predicates and arguments extracted from web pages using semantic role labeling, to generate conversational-style utterances. The conversational templates represent the task-independent portions of user utterances and can be built by hand or learned from utterances collected from other domain applications. Experiments show that stitching with both types of conversational templates results in significantly better ASR word accuracy. Furthermore, the new language model bootstrapping approach can be combined with unsupervised and active learning to improve word accuracy even with very little in-domain transcribed data.
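
The stitching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the templates, the predicate/argument pairs, and the function name `stitch` are all hypothetical, and the pairs are written by hand here rather than produced by a real semantic role labeler.

```python
from itertools import product

# Hypothetical conversational templates: task-independent carrier phrases
# with a slot for the task-dependent predicate/argument clause.
TEMPLATES = [
    "i would like to {clause}",
    "can you help me {clause} please",
]

# Hypothetical (predicate, argument) pairs, standing in for the output of
# a semantic role labeler run over in-domain web pages.
PRED_ARGS = [
    ("check", "my account balance"),
    ("pay", "my bill"),
]


def stitch(templates, pred_args):
    """Fill every template with every predicate/argument clause."""
    utterances = []
    for template, (pred, arg) in product(templates, pred_args):
        utterances.append(template.format(clause=f"{pred} {arg}"))
    return utterances


# Generated conversational-style utterances, usable as language model
# training text under the assumptions above.
utterances = stitch(TEMPLATES, PRED_ARGS)
```

Under these assumptions, each template is combined with each predicate/argument pair, so two templates and two pairs yield four synthetic utterances that could then be pooled as training text for a statistical language model.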