Paper: | SLP-P3.12 |
Session: | Novel LVCSR Algorithms |
Time: | Tuesday, May 16, 14:00 - 16:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Model-based robust Speech Recognition |
Title: |
Robust Large Vocabulary Continuous Speech Recognition using Polynomial Segment Model with Unsupervised Adaptation |
Authors: |
Man-Hung Siu, Siu-Kei Au Yeung, Hong Kong University of Science and Technology, Hong Kong SAR of China |
Abstract: |
Robustness is an important issue in applying speech technologies to real applications. While Polynomial Segment Models (PSMs) have been shown to outperform HMMs in clean conditions, segmental likelihood evaluation may make the PSM distributions sharper, which could degrade their performance in mismatched conditions. In this paper, we explore the robustness of the PSM under noise and channel mismatch. In addition, unsupervised adaptation techniques have been shown to work well for environmental adaptation even with a small amount of adaptation data. It is therefore of interest to compare the performance of PSMs and HMMs after applying two types of unsupervised adaptation: Maximum Likelihood Linear Regression (MLLR) and Reference Speaker Weighting (RSW). Experiments were performed on the Aurora 4 corpus under both clean and multi-condition training. Our results show that, even under noisy and mismatched conditions, the PSMs performed well compared to the HMMs both before and after environmental adaptation. Using the best lattice, the RSW-adapted PSM gave word error rates of 26.5% and 21.3% for clean and multi-condition training, respectively, approximately 24% better than the unadapted HMM. |
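The "approximately 24% better" figure is most naturally read as a relative word error rate (WER) reduction, although the abstract does not say so explicitly. The minimal sketch below works through that arithmetic under this assumption. The function names are hypothetical, and the baseline values it derives are only what the stated numbers would imply; they are not figures reported in the abstract.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction by which system_wer improves on baseline_wer."""
    return (baseline_wer - system_wer) / baseline_wer


def implied_baseline(system_wer: float, relative_reduction: float) -> float:
    """Baseline WER that would yield system_wer after the given relative reduction."""
    return system_wer / (1.0 - relative_reduction)


# WERs quoted in the abstract for the RSW-adapted PSM (percent).
reported = {"clean training": 26.5, "multi-condition training": 21.3}

# ASSUMPTION: "approximately 24% better" means a 24% relative WER reduction.
ASSUMED_REDUCTION = 0.24

for condition, wer in reported.items():
    baseline = implied_baseline(wer, ASSUMED_REDUCTION)
    print(f"{condition}: PSM+RSW {wer:.1f}%, implied unadapted HMM WER {baseline:.1f}%")
```

If the 24% were instead an absolute difference in WER, the implied baselines would simply be the reported WERs plus 24 points, so the interpretation matters when comparing against other Aurora 4 results.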