Paper: | SLP-P4.10 |
Session: | Speech Enhancement in Adverse Environments |
Time: | Tuesday, May 16, 14:00 - 16:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Speech Enhancement (for Impaired Situations) |
Title: |
SPEECH BANDWIDTH ENHANCEMENT USING STATE SPACE SPEECH DYNAMICS |
Authors: |
Sheng Yao, Cheung-Fat Chan, City University of Hong Kong, Hong Kong SAR of China |
Abstract: |
Extending narrowband speech (0-4 kHz) to wideband speech (0-8 kHz) has applications in telephone systems and in speech recognition systems for which wideband training data may not be available. Several methods have been proposed to retrieve the missing high-band information (4-8 kHz) from narrowband speech. Memoryless systems are likely to produce large hissing artifacts because the mutual information between the low-band (0-4 kHz) and high-band (4-8 kHz) spectra is quite low. In general, bandwidth extension cannot recover the original high-band information, but good approximations with less over-estimation of the high-band energy, which is usually perceived as a hissing artifact, can be obtained by taking neighboring speech frames into account. In this paper, we propose a new bandwidth extension system with memory that uses a state-space model to capture the long-term speech dynamics. The model parameters can be trained in the maximum likelihood (ML) sense, and the enhancement is obtained via wideband state vector estimation and Kalman filtering. The performance in terms of spectral distortion is shown to be much better than that of memoryless systems and comparable to that of an earlier Continuous Density Hidden Markov Model (CDHMM) system with memory. The new state-space method is inherently sequential and has the advantages of lower processing delay and robustness against block detection errors. |
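
The abstract does not give the model equations, so the following is only a minimal sketch of the general idea of frame-by-frame Kalman filtering under a linear state-space model, not the authors' system. It assumes a hypothetical scalar model x[t] = a*x[t-1] + w[t], y[t] = c*x[t] + v[t], where x could stand for a single spectral parameter tracked across frames; the function name `kalman_filter_1d` and all parameter values are illustrative assumptions.

```python
def kalman_filter_1d(observations, a=0.95, c=1.0, q=0.01, r=0.1, x0=0.0, p0=1.0):
    """Scalar Kalman filter (illustrative, not the paper's model).

    Assumed model:
        x[t] = a * x[t-1] + w[t],  w ~ N(0, q)   (state dynamics)
        y[t] = c * x[t]   + v[t],  v ~ N(0, r)   (observation)
    Returns the filtered state estimate for each observation.
    """
    x, p = x0, p0  # current state estimate and its variance
    estimates = []
    for y in observations:
        # Predict: propagate the previous estimate through the dynamics.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update: correct the prediction with the new observation.
        gain = p_pred * c / (c * p_pred * c + r)
        x = x_pred + gain * (y - c * x_pred)
        p = (1.0 - gain * c) * p_pred
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Constant observations with static dynamics: the estimate moves
    # from the prior mean (0.0) toward the observed value (1.0).
    est = kalman_filter_1d([1.0] * 50, a=1.0, q=0.0, r=0.1)
    print(est[0], est[-1])
```

Each estimate depends only on the previous estimate and the current observation, which illustrates why a state-space approach can run sequentially with low delay, in line with the advantage claimed in the abstract.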