Technical Program

Paper Detail

Paper:	SLP-L6.1
Session:	Advances in LVCSR Algorithms
Time:	Wednesday, May 17, 16:30 - 16:50
Presentation:	Lecture
Topic:	Speech and Spoken Language Processing: Decoding algorithms and implementation
Title:	MODELING POLYPHONE CONTEXT WITH WEIGHTED FINITE-STATE TRANSDUCERS
Authors:	Emilian Stoimenov, John McDonough, Institut fuer Theoretische Informatik, Germany
Abstract:	As coarticulation effects are prevalent in all speech, a phone must be modeled in its context to achieve optimal performance in large vocabulary continuous speech recognition systems. Schuster and Hori proposed a technique for modeling polyphone context with weighted finite-state transducers whereby all valid three-state sequences of Gaussian mixture models are enumerated, and thereafter the possible connections between these three-state sequences are determined. Hence, the explicit modeling of all possible polyphones is avoided. Rather, Schuster and Hori derive a transducer that translates from sequences of Gaussian mixture models directly to phone sequences. The resulting network is much smaller than the conventional network proposed by Mohri et al. While Schuster and Hori's approach to modeling polyphone context is quite interesting, it is incorrect for contexts larger than triphones. In this work, we correct the errors of Schuster and Hori. Thereafter we discuss how the intermediate size of the network can be held in check. We also present the results of a set of experiments comparing network size and speech recognition performance for networks obtained with Schuster and Hori's technique and with the correct technique.