Paper: | SLP-P3.7 |
Session: | Novel LVCSR Algorithms |
Time: | Tuesday, May 16, 14:00 - 16:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Lattices and Multi-pass strategies |
Title: |
Sentence-adapted Factored Language Model for Transcribing Estonian Speech |
Authors: |
Tanel Alumäe, Tallinn University of Technology, Estonia |
Abstract: |
This work presents a two-pass recognition method for highly inflected agglutinative languages based on an Estonian large-vocabulary recognition task. In the first pass, morphemes are used as the basic recognition units in a standard trigram language model. The recognized morphemes are then reconstructed into words using a hidden event language model for compound word detection. In the second pass, the vocabulary of the N-best sentence candidates from the first pass is used to create an adaptive, sentence-specific, word-based language model, which is applied to rescore the N-best hypotheses. The sentence-specific language model follows the factored language model paradigm and estimates word probabilities based on the preceding two words and part-of-speech tags. The method achieves a 7.3% relative word error rate improvement over the baseline system that is used in the first pass. |
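
The second-pass rescoring step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the `lm_weight` parameter, the toy scoring function, and the example hypotheses and scores are all assumptions made for illustration. The toy model conditions only on the two preceding words and leaves out the part-of-speech factors, and it is not a normalized probability distribution.

```python
import math
from typing import Callable, List, Set, Tuple

# A hypothesis is (word sequence, first-pass log score). Illustrative type only.
Hypothesis = Tuple[List[str], float]
# Hypothetical LM interface: log P(word | prev2, prev1).
LogProbFn = Callable[[str, str, str], float]


def nbest_vocabulary(nbest: List[Hypothesis]) -> Set[str]:
    """Collect the sentence-specific vocabulary from all N-best candidates."""
    return {word for words, _ in nbest for word in words}


def rescore_nbest(nbest: List[Hypothesis], lm_logprob: LogProbFn,
                  lm_weight: float = 1.0) -> List[Hypothesis]:
    """Add a weighted second-pass LM score to each first-pass score.

    Returns the hypotheses sorted by combined score, best first.
    """
    rescored = []
    for words, first_pass_score in nbest:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        lm_score = sum(
            lm_logprob(padded[i], padded[i - 2], padded[i - 1])
            for i in range(2, len(padded))
        )
        rescored.append((words, first_pass_score + lm_weight * lm_score))
    return sorted(rescored, key=lambda hyp: hyp[1], reverse=True)


def make_toy_lm(vocab: Set[str], preferred: Set[Tuple[str, str, str]]) -> LogProbFn:
    """Toy stand-in for a sentence-specific LM (assumption, not a real model).

    Preferred trigrams get log(0.5); everything else gets a uniform score
    over the N-best vocabulary plus the end-of-sentence symbol.
    """
    uniform = -math.log(len(vocab) + 1)

    def lm_logprob(word: str, prev2: str, prev1: str) -> float:
        if (prev2, prev1, word) in preferred:
            return math.log(0.5)
        return uniform

    return lm_logprob


# Made-up two-best list; the second candidate has the better first-pass score.
nbest = [
    (["ta", "läks", "kooli"], -10.0),
    (["ta", "läks", "koolis"], -9.5),
]
vocab = nbest_vocabulary(nbest)
toy_lm = make_toy_lm(vocab, preferred={("ta", "läks", "kooli")})
best = rescore_nbest(nbest, toy_lm)[0]
```

In this toy run the second-pass score outweighs the small first-pass advantage of the second candidate, so the first candidate is ranked best after rescoring. In practice the language model weight would be tuned on held-out data.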