Paper: | SLP-P19.5 |
Session: | Model-based Robust Speech Recognition |
Time: | Friday, May 19, 10:00 - 12:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Model-based Robust Speech Recognition |
Title: |
Model Adaptation for Long Convolutional Distortion by Maximum Likelihood Based State Filtering Approach |
Authors: |
Chandra Kant Raut, Takuya Nishimoto, Shigeki Sagayama, University of Tokyo, Japan |
Abstract: |
In environments with considerably long reverberation times, each frame of speech is affected by energy components from the preceding frames. Therefore, to adapt the parameters of an HMM state, it becomes necessary to consider these frames and compute their contributions to the current state. However, the speech frames preceding an HMM state are not known during model adaptation. In this paper, we propose to use the preceding states as units of the preceding speech segments, estimate their contributions to the current state in a maximum likelihood manner, and adapt the models by accounting for these contributions. When clean models were adapted by the proposed method for a speaker-dependent isolated word recognition task, the word accuracy of the system typically increased from 67.6% to 83.2% and from 44.8% to 72.5% for channel-distorted speech simulated by linear convolution of clean speech with impulse responses having reverberation times (T_60) of 310 ms and 780 ms, respectively. |
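
The abstract describes the test data as clean speech linearly convolved with impulse responses of a given T_60. The sketch below illustrates that kind of simulation only; it is not the authors' code. The impulse response is a hypothetical stand-in (exponentially decaying white noise whose amplitude envelope falls by 60 dB after T_60), the 1 kHz sampling rate and the sine-wave "speech" are assumptions chosen to keep a pure-Python demo fast, and the names `synthetic_rir`, `decay_envelope`, and `convolve` are illustrative.

```python
import math
import random


def decay_envelope(t_s, t60_s):
    """Amplitude envelope that drops by 60 dB (a factor of 1e-3) at t = t60_s."""
    return math.exp(-3.0 * math.log(10.0) * t_s / t60_s)


def synthetic_rir(t60_s, fs_hz, seed=0):
    """Hypothetical impulse response: white noise shaped by decay_envelope.

    This is an assumed stand-in for the impulse responses mentioned in the
    abstract, truncated at t60_s seconds.
    """
    rng = random.Random(seed)
    n_taps = int(round(t60_s * fs_hz))
    return [rng.gauss(0.0, 1.0) * decay_envelope(n / fs_hz, t60_s)
            for n in range(n_taps)]


def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Assumed demo settings: low sampling rate and a sine wave in place of speech.
fs = 1000
clean = [math.sin(2.0 * math.pi * 50.0 * n / fs) for n in range(200)]

for t60 in (0.310, 0.780):  # reverberation times quoted in the abstract
    reverberant = convolve(clean, synthetic_rir(t60, fs))
    print(f"T_60 = {t60 * 1000:.0f} ms: {len(clean)} clean samples "
          f"-> {len(reverberant)} reverberant samples")
```

Because the impulse response spans many analysis frames at these T_60 values, each output frame mixes energy from several preceding frames, which is the effect the abstract's adaptation method is meant to account for. For realistic signal lengths an FFT-based convolution would be the practical choice.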