Technical Program

Paper Detail

Paper:	SLP-P17.6
Session:	Spoken Language Modeling, Identification and Characterization
Time:	Thursday, May 18, 16:30 - 18:30
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Language modeling and Adaptation
Title:	Efficient Estimation of Language Model Statistics of Spontaneous Speech via Statistical Transformation Model
Authors:	Yuya Akita, Tatsuya Kawahara, Kyoto University, Japan
Abstract:	One of the most significant problems in language modeling of spontaneous speech, such as meetings and lectures, is that only a limited amount of matched training data, i.e., faithful transcripts for the relevant task domain, is available. In this paper, we propose a novel transformation approach to estimate language model statistics of spontaneous speech from a document-style text database, which is often available on a large scale. The proposed statistical transformation model is designed to model characteristic linguistic phenomena in spontaneous speech and to estimate their occurrence probabilities. These contextual patterns and probabilities are derived from a small parallel aligned corpus of faithful transcripts and their document-style texts. To realize wide coverage and reliable estimation, a model based on part-of-speech (POS) is also prepared to provide a back-off scheme from the word-based model. The approach has been successfully applied to estimating the language model for National Congress meetings from their minutes archives, and a significant reduction in test-set perplexity is achieved.
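
The back-off idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes a single spontaneous-speech phenomenon (a filler inserted after a word), a toy aligned corpus, and hypothetical names (`ALIGNED`, `train`, `filler_prob`, `estimate_spoken_counts`). The word-based probability is used when the word was seen in the parallel corpus; otherwise the POS-based probability is used as a back-off. Expected spoken-style counts are then obtained by scaling document-style counts by the pattern probability.

```python
from collections import Counter

# Hypothetical toy parallel data (an assumption, not from the paper):
# (document-style word, POS tag, did a filler follow it in the faithful transcript?)
ALIGNED = [
    ("budget", "NOUN", True),
    ("budget", "NOUN", False),
    ("budget", "NOUN", False),
    ("budget", "NOUN", False),
    ("policy", "NOUN", True),
    ("policy", "NOUN", True),
    ("discuss", "VERB", False),
    ("discuss", "VERB", False),
]


def train(aligned):
    """Estimate filler-insertion probabilities per word and per POS tag."""
    word_total, word_hit = Counter(), Counter()
    pos_total, pos_hit = Counter(), Counter()
    for word, pos, filler_follows in aligned:
        word_total[word] += 1
        pos_total[pos] += 1
        if filler_follows:
            word_hit[word] += 1
            pos_hit[pos] += 1
    word_prob = {w: word_hit[w] / n for w, n in word_total.items()}
    pos_prob = {p: pos_hit[p] / n for p, n in pos_total.items()}
    return word_prob, pos_prob


def filler_prob(word, pos, word_prob, pos_prob):
    """Use the word-based estimate if available, else back off to the POS-based one."""
    if word in word_prob:
        return word_prob[word]
    return pos_prob.get(pos, 0.0)


def estimate_spoken_counts(doc_counts, pos_of, word_prob, pos_prob, filler="uh"):
    """Expected count of the bigram (word, filler) in spoken-style text,
    computed as document-style count times the insertion probability."""
    spoken = {}
    for word, n in doc_counts.items():
        p = filler_prob(word, pos_of[word], word_prob, pos_prob)
        spoken[(word, filler)] = n * p
    return spoken


WORD_PROB, POS_PROB = train(ALIGNED)

# Hypothetical document-style unigram counts; "treaty" is unseen in ALIGNED,
# so its probability comes from the NOUN back-off model.
DOC_COUNTS = {"budget": 1000, "treaty": 200}
POS_OF = {"budget": "NOUN", "treaty": "NOUN"}
SPOKEN = estimate_spoken_counts(DOC_COUNTS, POS_OF, WORD_PROB, POS_PROB)
```

In this toy setup, "budget" uses its own word-level estimate, while "treaty" falls back to the pooled NOUN estimate. A fuller treatment would cover further phenomena (for example deletions and substitutions) and wider contexts, which this sketch leaves out.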