Paper: SLP-P17.4
Session: Spoken Language Modeling, Identification and Characterization
Time: Thursday, May 18, 16:30 - 18:30
Presentation: Poster
Topic: Speech and Spoken Language Processing: Language Modeling and Adaptation
Title: PROFILE BASED COMPRESSION OF N-GRAM LANGUAGE MODELS
Authors: Jesper Olsen, Daniela Oria, Nokia, Finland
Abstract: A profile-based technique for encoding and compressing n-gram language models is presented. The technique is intended to be used in combination with existing size-reduction techniques for n-gram language models, such as pruning, quantisation and word-class modelling. It is evaluated here on an embedded large-vocabulary speech recognition task. When combined with quantisation, the technique reduces the memory needed for storing probabilities by a factor of 10 with little or no degradation in word accuracy. The structure of the language model is well suited to “best-first” decoding styles and is used here to guide an isolated-word recogniser. It is well suited to predicting several likely word continuations, but computationally less suitable for efficient lookup of individual n-gram probabilities.
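To illustrate the quantisation component the abstract refers to, the sketch below shows one common way of quantising n-gram log-probabilities into a small codebook, so that each probability is stored as a short index rather than a full float. This is a minimal illustration of generic uniform quantisation, not the paper's specific profile-based encoding; the function names and toy probabilities are hypothetical.

```python
# Hypothetical sketch of n-gram probability quantisation: map each
# log-probability to the nearest entry in a small uniform codebook, so a
# compact index (here 5 bits for 32 levels) is stored instead of a float.
import math

def build_codebook(log_probs, levels=32):
    """Uniformly partition the log-probability range into `levels` bins
    and return the bin centres as the codebook."""
    lo, hi = min(log_probs), max(log_probs)
    step = (hi - lo) / levels
    return [lo + (i + 0.5) * step for i in range(levels)]

def quantise(log_prob, codebook):
    """Return the index of the nearest codebook entry."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - log_prob))

# Toy n-gram log-probabilities (illustrative values only).
lps = [math.log10(p) for p in (0.2, 0.05, 0.01, 0.004, 0.0007)]
cb = build_codebook(lps, levels=32)
indices = [quantise(lp, cb) for lp in lps]
recovered = [cb[i] for i in indices]
# Nearest-centre quantisation keeps each recovered value within half a
# bin width of the original log-probability.
```

In practice the quantisation error is controlled by the number of levels, and the memory saving comes from storing the short indices (optionally bit-packed) plus one shared codebook instead of per-entry floats.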