Paper: | SLP-L6.1 |
Session: | Advances in LVCSR Algorithms |
Time: | Wednesday, May 17, 16:30 - 16:50 |
Presentation: | Lecture |
Topic: |
Speech and Spoken Language Processing: Decoding algorithms and implementation |
Title: |
MODELING POLYPHONE CONTEXT WITH WEIGHTED FINITE-STATE TRANSDUCERS |
Authors: |
Emilian Stoimenov, John McDonough, Institut fuer Theoretische Informatik, Germany |
Abstract: |
As coarticulation effects are prevalent in all speech, a phone must be modeled in its context to achieve optimal performance in large vocabulary continuous speech recognition systems. Schuster and Hori proposed a technique for modeling polyphone context with weighted finite-state transducers whereby all valid three-state sequences of Gaussian mixture models are enumerated, and thereafter the possible connections between these three-state sequences are determined. Hence, the explicit modeling of all possible polyphones is avoided. Rather, Schuster and Hori derive a transducer that translates from sequences of Gaussian mixture models directly to phone sequences. The resulting network is much smaller than the conventional network proposed by Mohri et al. While Schuster and Hori's approach to modeling polyphone context is quite interesting, it is incorrect for contexts larger than triphones. In this work, we correct the errors of Schuster and Hori. Thereafter we discuss how the intermediate size of the network can be held in check. We also present the results of a set of experiments comparing network size and speech recognition performance for networks obtained with Schuster and Hori's technique and with the correct technique. |
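The enumeration step described in the abstract can be illustrated with a small sketch, restricted to the triphone case. It is not the authors' implementation and does not build a transducer. The three-phone inventory, the `state_sequence` tying rule (a stand-in for a real decision tree), and all function names are assumptions made for illustration only. The sketch lists the distinct three-state sequences of Gaussian mixture model names and then determines which sequences may follow one another.

```python
from itertools import product

# Hypothetical toy phone inventory (not from the paper).
PHONES = ["a", "b", "c"]


def state_sequence(left, center, right):
    """Hypothetical state tying: map a triphone to three GMM names.

    The begin state depends only on whether the left context is "a",
    the end state only on whether the right context is "a".
    A real system would use decision trees here.
    """
    begin = f"{center}-b({'V' if left == 'a' else 'C'})"
    middle = f"{center}-m"
    end = f"{center}-e({'V' if right == 'a' else 'C'})"
    return (begin, middle, end)


def enumerate_sequences(phones):
    """Map each valid three-state sequence to the triphones that yield it."""
    seqs = {}
    for left, center, right in product(phones, repeat=3):
        key = state_sequence(left, center, right)
        seqs.setdefault(key, set()).add((left, center, right))
    return seqs


def connections(seqs):
    """Find the allowed successor pairs of three-state sequences.

    Sequence s may be followed by sequence t if some triphone (l, c, r)
    of s and some triphone (c, r, x) of t agree on the shared phones.
    """
    arcs = set()
    for s, tri_s in seqs.items():
        for t, tri_t in seqs.items():
            if any(c == l2 and r == c2
                   for (_, c, r) in tri_s
                   for (l2, c2, _) in tri_t):
                arcs.add((s, t))
    return arcs


if __name__ == "__main__":
    seqs = enumerate_sequences(PHONES)
    arcs = connections(seqs)
    print(len(seqs), "state sequences,", len(arcs), "connections")
```

In this toy setting the 27 triphones collapse into far fewer state sequences, and only the connections are computed rather than every polyphone, which is the source of the size reduction the abstract refers to.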