Paper: | SLP-P11.11 |
Session: | Front-end For Robust Speech Recognition |
Time: | Wednesday, May 17, 16:30 - 18:30 |
Presentation: | Poster |
Topic: | Speech and Spoken Language Processing: End-point detection and barge-in methods |
Title: | Robust Endpoint Detection for Speech Recognition based on Discriminative Feature Extraction |
Authors: | Koichi Yamamoto, Toshiba Corporation, Japan; Firas Jabloun, Klaus Reinhard, Toshiba Research Europe Ltd., United Kingdom; Akinori Kawamura, Toshiba Corporation, Japan |
Abstract: | This paper proposes a robust endpointer for automatic speech recognition (ASR). The endpointer is based on voice activity detection (VAD) with energy and likelihood ratio criteria, where the likelihood ratio is computed from speech and non-speech Gaussian mixture models (GMMs). To improve speech/non-speech classification, the parameters required to compute the likelihood ratio are trained by discriminative feature extraction (DFE). Experimental results show that the proposed endpointer outperforms an energy-based endpointer in both start-of-speech and end-of-speech detection. The improved endpointing in turn improves ASR performance. |
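The abstract does not give the decision rule, so the following is only a minimal sketch of a frame-level VAD that combines an energy criterion with a GMM log-likelihood ratio. Several things here are assumptions rather than the authors' method: the diagonal-covariance GMMs, the logical AND between the two criteria, the threshold parameters, and the function names `gmm_log_likelihood`, `vad_decisions`, and `find_endpoints`. The DFE training step is not shown; the GMM parameters are simply taken as given.

```python
import numpy as np


def gmm_log_likelihood(x, weights, means, variances):
    """Per-frame log-likelihood of x (T, D) under a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    """
    x = np.asarray(x, dtype=float)[:, None, :]                      # (T, 1, D)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Log of each component's Gaussian normalisation constant.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    # Log of each component's exponential term, summed over dimensions.
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=2)      # (T, K)
    log_comp = np.log(weights) + log_norm + log_exp                    # (T, K)
    # Log-sum-exp over components for numerical stability.
    peak = log_comp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_comp - peak).sum(axis=1, keepdims=True)))[:, 0]


def vad_decisions(features, log_energy, speech_gmm, nonspeech_gmm,
                  lr_threshold=0.0, energy_threshold=0.0):
    """Mark a frame as speech when both criteria pass (assumed AND rule)."""
    llr = (gmm_log_likelihood(features, *speech_gmm)
           - gmm_log_likelihood(features, *nonspeech_gmm))
    return (llr > lr_threshold) & (np.asarray(log_energy) > energy_threshold)


def find_endpoints(is_speech):
    """Return (start_frame, end_frame) of the speech region, or None."""
    idx = np.flatnonzero(is_speech)
    if idx.size == 0:
        return None
    return int(idx[0]), int(idx[-1])
```

Each GMM is passed as a `(weights, means, variances)` tuple. A practical endpointer would typically also smooth the frame decisions (for example with hangover frames) before reporting start-of-speech and end-of-speech; that step is omitted here.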