Paper: | SAM-P6.7 |
Session: | Applications of Multichannel Signal Processing |
Time: | Friday, May 19, 16:30 - 18:30 |
Presentation: |
Poster |
Topic: |
Sensor Array and Multichannel Signal Processing: Source localization, separation, classification, and tracking |
Title: |
Spatial Separation of Speech Signals Using Continuously-Variable Masks Estimated from Comparisons of Zero Crossings |
Authors: |
Hyung-Min Park, Richard Stern, Carnegie Mellon University, United States |
Abstract: |
This paper describes an algorithm that achieves noise robustness in speech recognition by reconstructing the desired signal from a mixture of two signals using continuously-variable masks. In contrast to current methods that use binary masks, this approach estimates the relative contribution of the desired source in a mixture of sources and reconstructs the desired signal in proportion to its estimated contribution to each time-frequency segment. Estimation of the continuously-variable masks is based on the relationship between the relative intensity of each source and the interaural time difference (ITD). Estimation of the ITD is accomplished using zero-crossing-based methods. It is shown that the use of zero-crossing approaches to estimate ITDs and continuously-variable masks provides better speech recognition accuracy than cross-correlation-based approaches to ITD estimation and binary masks. |
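To illustrate the idea in the abstract, the following is a minimal sketch (not the authors' implementation) of estimating ITDs from zero crossings in one band-limited channel and turning them into a continuously-variable mask. The function names, the nearest-crossing pairing rule, the `max_itd` limit, and the linear mapping from ITD to mask weight are all assumptions made for illustration; the abstract does not specify how the paper relates ITD to relative source intensity.

```python
import numpy as np

def upward_zero_crossings(x, fs):
    """Times (s) of upward zero crossings, refined by linear interpolation."""
    x = np.asarray(x, dtype=float)
    idx = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    # Fraction of a sample between x[idx] (< 0) and x[idx + 1] (>= 0).
    frac = x[idx] / (x[idx] - x[idx + 1])
    return (idx + frac) / fs

def zero_crossing_itds(left, right, fs, max_itd=1e-3):
    """ITD estimates (right minus left) from paired zero crossings.

    Each left-channel crossing is paired with the nearest right-channel
    crossing (an assumed pairing rule); pairs farther apart than max_itd
    are discarded as implausible.
    """
    tl = upward_zero_crossings(left, fs)
    tr = upward_zero_crossings(right, fs)
    if tl.size == 0 or tr.size == 0:
        return np.empty(0)
    diff = tr[None, :] - tl[:, None]          # all pairwise time differences
    nearest = np.argmin(np.abs(diff), axis=1)
    itd = diff[np.arange(tl.size), nearest]
    return itd[np.abs(itd) <= max_itd]

def continuous_mask(itd, itd_target, itd_interferer):
    """Map ITDs to weights in [0, 1] (assumed linear interpolation).

    An ITD equal to the target's gives weight 1, one equal to the
    interferer's gives weight 0, and values in between scale linearly.
    """
    itd = np.asarray(itd, dtype=float)
    w = (itd - itd_interferer) / (itd_target - itd_interferer)
    return np.clip(w, 0.0, 1.0)

def binary_mask(itd, itd_target, itd_interferer):
    """Binary counterpart for comparison: threshold the weight at 0.5."""
    return (continuous_mask(itd, itd_target, itd_interferer) > 0.5).astype(float)
```

In a full system one would presumably apply this per frequency band after a filterbank, average the weights within each time frame, and scale each time-frequency segment by its weight before resynthesis; those steps are omitted here.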