Technical Program

Paper Detail

Paper:	SLP-P3.7
Session:	Novel LVCSR Algorithms
Time:	Tuesday, May 16, 14:00 - 16:00
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Lattices and Multi-pass strategies
Title:	Sentence-adapted Factored Language Model for Transcribing Estonian Speech
Authors:	Tanel Alumäe, Tallinn University of Technology, Estonia
Abstract:	This work presents a two-pass recognition method for highly inflected agglutinative languages, based on an Estonian large-vocabulary recognition task. In the first pass, morphemes are used as the basic recognition units in a standard trigram language model. The recognized morphemes are then reconstructed into words, using a hidden event language model for compound word detection. In the second pass, the vocabulary from the N-best sentence candidates of the first pass is used to create an adaptive, sentence-specific, word-based language model, which is applied to rescore the N-best hypotheses. The sentence-specific language model is based on the factored language model paradigm and estimates word probabilities from the preceding two words and part-of-speech tags. The method achieves a 7.3% relative word error rate improvement over the baseline system used in the first pass.
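
The second-pass idea in the abstract (collect the vocabulary of the N-best candidates, build a language model restricted to that vocabulary, then rescore the candidates) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the paper's implementation: the factored language model over words and part-of-speech tags is replaced here by a toy add-one-smoothed bigram model, and all names (`nbest_vocabulary`, `ToyBigramLM`, `rescore`, `lm_weight`) as well as the example data are hypothetical.

```python
import math
from collections import Counter


def nbest_vocabulary(nbest):
    """Collect every word that occurs in any N-best candidate."""
    return {word for words, _ in nbest for word in words}


class ToyBigramLM:
    """Add-one-smoothed bigram model restricted to a given vocabulary.

    Assumption: this stands in for the sentence-specific factored
    language model described in the abstract; it ignores POS tags.
    """

    def __init__(self, corpus, vocab):
        self.vocab = set(vocab) | {"</s>"}
        self.bigram = Counter()
        self.context = Counter()
        for sentence in corpus:
            tokens = ["<s>"] + list(sentence) + ["</s>"]
            for prev, word in zip(tokens, tokens[1:]):
                # Only count events whose predicted word is in the vocabulary.
                if word in self.vocab:
                    self.bigram[prev, word] += 1
                    self.context[prev] += 1

    def logprob(self, words):
        """Natural-log probability of a word sequence, including </s>."""
        tokens = ["<s>"] + list(words) + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigram[prev, word] + 1
            denominator = self.context[prev] + len(self.vocab)
            total += math.log(numerator / denominator)
        return total


def rescore(nbest, lm, lm_weight=1.0):
    """Return the candidate with the best combined score.

    Each N-best entry is (words, first_pass_log_score); lm_weight is a
    hypothetical interpolation weight for the second-pass model.
    """
    best = max(nbest, key=lambda h: h[1] + lm_weight * lm.logprob(h[0]))
    return best[0]


# Hypothetical toy data: a tiny training corpus and a 2-best list.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
nbest = [(["the", "cat", "sat"], -10.0), (["the", "cap", "sat"], -9.5)]

lm = ToyBigramLM(corpus, nbest_vocabulary(nbest))
best_words = rescore(nbest, lm)
```

With this toy data, the first-pass scores alone favor the second candidate, while the combined score favors the first; setting `lm_weight=0.0` recovers the first-pass ranking.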