Paper: | SLP-P8.3 |
Session: | Speaker Recognition: Features |
Time: | Wednesday, May 17, 10:00 - 12:00 |
Presentation: |
Poster |
Topic: |
Speech and Spoken Language Processing: Speaker Identification |
Title: |
Robust Speaker Recognition Using Binary Time-Frequency Masks |
Authors: |
Yang Shao, DeLiang Wang, The Ohio State University, United States |
Abstract: |
Conventional speaker recognition systems perform poorly under noisy conditions. In this paper, we evaluate binary time-frequency masks for robust speaker recognition. An ideal binary mask is defined a priori as a binary matrix where 1 indicates that the target is stronger than the interference within the corresponding time-frequency unit and 0 indicates otherwise. We perform speaker identification and verification using a missing data recognizer under cochannel and other noise conditions, and show that the ideal binary mask provides large performance gains. By employing a speech segregation system that estimates the ideal binary mask, we achieve significant improvements over alternative approaches. Our study thus demonstrates that binary masking is a promising direction for robust speaker recognition. |
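
The ideal binary mask definition in the abstract can be illustrated with a minimal sketch. It assumes the target and interference energies are already available per time-frequency unit as two equally shaped matrices (rows as frequency channels, columns as time frames); the function name `ideal_binary_mask` and the toy values are hypothetical and are not taken from the paper, which does not specify its front end here.

```python
def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 where the target is strictly stronger than the interference, else 0.

    Both arguments are assumed to be lists of rows holding per-unit energies
    (an assumption for illustration, not the authors' implementation).
    """
    if len(target_energy) != len(interference_energy):
        raise ValueError("energy matrices must have the same number of rows")
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        if len(t_row) != len(i_row):
            raise ValueError("energy matrices must have the same row lengths")
        # A unit is labeled 1 only when target energy exceeds interference energy.
        mask.append([1 if t > i else 0 for t, i in zip(t_row, i_row)])
    return mask


# Toy energies: 2 frequency channels x 3 time frames (made-up values).
target = [[4.0, 1.0, 0.5],
          [2.0, 3.0, 0.1]]
interference = [[1.0, 2.0, 0.5],
                [0.5, 3.5, 0.0]]

mask = ideal_binary_mask(target, interference)
print(mask)
```

In a missing data recognizer, units labeled 1 would typically be treated as reliable and units labeled 0 as unreliable; how the recognizer uses the mask is outside this sketch.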