Paper: | SLP-L13.6 |
Session: | Missing Data Methods in Robust Speech Recognition |
Time: | Friday, May 19, 18:10 - 18:30 |
Presentation: | Lecture |
Topic: |
Speech and Spoken Language Processing: Feature-based Robust Speech Recognition (e.g., noise, etc) |
Title: |
SPEECH RECOGNITION IN MULTISOURCE REVERBERANT ENVIRONMENTS WITH BINAURAL INPUTS |
Authors: |
Nicoleta Roman, The Ohio State University at Lima, United States; Soundararajan Srinivasan, DeLiang Wang, The Ohio State University, United States |
Abstract: |
We present a binaural solution to robust speech recognition in multi-source reverberant environments. We employ the notion of an ideal time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency (T-F) unit. Our system estimates this ideal binary mask at the output of a target cancellation module implemented using adaptive filtering. This mask is used in conjunction with a missing-data algorithm to decode the target utterance. A systematic evaluation in terms of automatic speech recognition (ASR) performance shows substantial improvements over the baseline and better results than related two-microphone approaches. |
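The ideal binary mask described in the abstract can be illustrated with a minimal Python sketch. It assumes the premixed target and interference energies are available for each T-F unit, which is what makes the mask "ideal"; the system in the paper instead estimates the mask from the output of the target cancellation module, and that step is not shown here. The function name `ideal_binary_mask` and the threshold parameter `lc_db` are hypothetical names introduced for illustration; a 0 dB threshold corresponds to the "target stronger than interference" rule stated in the abstract.

```python
import numpy as np


def ideal_binary_mask(target_energy, interference_energy, lc_db=0.0):
    """Return 1 for T-F units where the target is stronger than the interference.

    target_energy, interference_energy: non-negative arrays of the same shape
    (for example frequency channels x time frames).
    lc_db: local SNR threshold in dB (assumed parameter; 0 dB means "stronger").
    """
    target_energy = np.asarray(target_energy, dtype=float)
    interference_energy = np.asarray(interference_energy, dtype=float)
    # A tiny offset avoids division by zero and log of zero in silent units.
    eps = np.finfo(float).tiny
    local_snr_db = 10.0 * np.log10((target_energy + eps) / (interference_energy + eps))
    # Strict inequality: units with equal energy are assigned to the interference.
    return (local_snr_db > lc_db).astype(np.uint8)


# Toy 2 x 2 example (rows: frequency channels, columns: time frames).
target = np.array([[4.0, 0.5], [1.0, 2.0]])
interference = np.array([[1.0, 2.0], [1.0, 0.5]])
mask = ideal_binary_mask(target, interference)
```

In a missing-data recognizer, units where the mask is 1 would be treated as reliable evidence for the target and the remaining units as unreliable; how the decoder uses that distinction depends on the specific missing-data algorithm.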