ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Recent advances in large vocabulary continuous speech recognition: An HTK perspective

Date: Monday Afternoon, May 15
14:00 - 17:00

Presented by

M. J. F. Gales and P. C. Woodland

Abstract

In recent years the performance of large vocabulary speech recognition systems has greatly improved. As the error rate of such systems has decreased, the complexity and range of tasks to which speech recognition can be successfully applied have broadened. For application domains such as the transcription of Conversational Telephone Speech (CTS) and Broadcast News (BN), systems are required to handle an open vocabulary with a wide range of speaker characteristics, noise conditions, speaking styles and channel effects. These issues have led to the development of new techniques for handling acoustic variations, as well as new general modelling techniques. Databases with thousands of hours of training data are now also available, sometimes with only approximate transcriptions that contain errors, such as closed captions. New methods have been developed to handle these large amounts of data. This tutorial describes the structure, key techniques, design decisions and performance of such systems, with an emphasis on recent improvements. A “generic” state-of-the-art large vocabulary continuous speech recognition (LVCSR) architecture will be discussed in which these techniques may be effectively and efficiently implemented. Applications of this framework to the transcription of both CTS and BN from American English and Mandarin sources will be presented. These systems are based on work in the Speech Group at Cambridge University over the past 10 years using the Hidden Markov Model Toolkit (HTK) and have consistently shown state-of-the-art performance.


IEEE Signal Processing Society
