Technical Program

Paper:	SLP-P3.4
Session:	Novel LVCSR Algorithms
Time:	Tuesday, May 16, 14:00 - 16:00
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Miscellaneous Topics
Title:	Flexible Multi-Stream Framework for Speech Recognition Using Multi-Tape Finite-State Transducers
Authors:	I. Lee Hetherington, Han Shu, James R. Glass, Massachusetts Institute of Technology, United States
Abstract:	We present an approach to general multi-stream recognition utilizing multi-tape finite-state transducers (FSTs). The approach is novel in that each of the multiple "streams" of features can represent either a sequence (e.g., fixed- or variable-rate frames) or a directed acyclic graph (e.g., containing hypothesized phonetic segmentations). Each transition of the multi-tape FST specifies the models to be applied to each stream and the degree of feature stream asynchrony to allow. We show how this framework can easily represent the 2-stream variable-rate landmark and segment modeling utilized by our baseline SUMMIT speech recognizer. We present experiments merging standard hidden Markov models (HMMs) with landmark models on the Wall Street Journal speech recognition task, and find that some degree of asynchrony can be critical when combining different types of models. We also present experiments performing audio-visual speech recognition on the AV-TIMIT task.
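The abstract does not give data structures or a decoding algorithm. The sketch below is a rough illustration of what "each transition specifies the models to be applied to each stream and the degree of feature stream asynchrony to allow" could look like in practice. It makes several assumptions that are not from the paper. There are only two streams. Both are treated as sequences, not DAGs, with unit boundary times in milliseconds. The multi-tape FST is reduced to a linear chain of arcs. Asynchrony is checked only at arc end points. `MultiTapeArc`, `align`, and the scorer signature are hypothetical names, not the authors' SUMMIT implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class MultiTapeArc:
    """Hypothetical arc of a linear 2-tape FST: one model label per stream tape."""
    labels: Tuple[str, str]  # model applied on stream 0 and on stream 1
    max_async: float         # allowed |end_time_0 - end_time_1| at this arc's end


# Hypothetical scorer: (stream index, model label, start boundary, end boundary) -> log score
Scorer = Callable[[int, str, int, int], float]


def align(arcs: Sequence[MultiTapeArc],
          times: Tuple[Sequence[float], Sequence[float]],
          score: Scorer) -> float:
    """Best total log score for aligning a chain of arcs to two feature streams.

    times[s] holds the boundary times of stream s (n_s + 1 entries for n_s units),
    so fixed-rate frames and variable-rate segments are handled the same way.
    Each arc consumes at least one unit on every stream. At the end of each arc,
    the two streams' boundary times may differ by at most arc.max_async.
    Returns -inf if no alignment satisfies the constraints.
    """
    t0, t1 = times
    n0, n1 = len(t0) - 1, len(t1) - 1
    # best[(b0, b1)]: best score after the arcs processed so far, ending at boundaries b0, b1
    best = {(0, 0): 0.0}
    for arc in arcs:
        nxt = {}
        for (b0, b1), acc in best.items():
            for e0 in range(b0 + 1, n0 + 1):
                for e1 in range(b1 + 1, n1 + 1):
                    # Enforce this arc's asynchrony tolerance at its end point.
                    if abs(t0[e0] - t1[e1]) > arc.max_async:
                        continue
                    s = (acc
                         + score(0, arc.labels[0], b0, e0)
                         + score(1, arc.labels[1], b1, e1))
                    if s > nxt.get((e0, e1), -math.inf):
                        nxt[(e0, e1)] = s
        best = nxt
    # Both streams must be fully consumed.
    return best.get((n0, n1), -math.inf)


# Toy usage: stream 0 is 10 ms frames; stream 1 is variable-rate segments.
times = ([0, 10, 20, 30, 40, 50, 60], [0, 25, 60])
strict = [MultiTapeArc(("hmm_a", "lm_a"), 0), MultiTapeArc(("hmm_b", "lm_b"), 0)]
loose = [MultiTapeArc(("hmm_a", "lm_a"), 5), MultiTapeArc(("hmm_b", "lm_b"), 5)]
length_penalty = lambda s, label, b, e: -float(e - b)
print(align(strict, times, length_penalty), align(loose, times, length_penalty))
```

In this toy input, the internal segment boundary at 25 ms does not coincide with any 10 ms frame boundary. Requiring exact synchrony (`max_async=0`) therefore leaves no valid alignment. Allowing 5 ms lets the streams be aligned. This is only a constructed example of why a per-arc asynchrony tolerance can matter; it does not reproduce the paper's experiments. The exhaustive loops are deliberate for clarity. A practical decoder would search a general FST with pruning rather than enumerate every boundary pair.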