Paper: | SLP-P2.3 |
Session: | Speech Production, Analysis and Modeling |
Time: | Tuesday, May 16, 10:30 - 12:30 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Speech Analysis |
Title: |
A DATABASE OF VOCAL TRACT RESONANCE TRAJECTORIES FOR RESEARCH IN SPEECH PROCESSING |
Authors: |
Li Deng, Microsoft Research, United States; Xiaodong Cui, University of California, Los Angeles, United States; Robert Pruvenok, Georgia Institute of Technology, United States; Jonathan Huang, Safiyy Momen, Carnegie Mellon University, United States; Yanyi Chen, Cornell University, United States; Abeer Alwan, University of California, Los Angeles, United States |
Abstract: |
While vocal tract resonances (VTRs, or formants defined as such resonances) are known to play a critical role in human speech perception and in computer speech processing, there has been a conspicuous lack of standard databases needed for quantitative evaluation of automatic VTR extraction techniques. We report in this paper our recent effort to create a database of F1, F2, and F3 VTR frequency trajectories. The database contains a subset of the TIMIT corpus, 538 sentences in total, that is representative with respect to the diversity of speaker, gender, dialect, and phonetic context. A MATLAB-based labeling tool was developed that displays high-resolution wideband spectrograms to assist in visual identification of VTR frequency values, which are then recorded via mouse clicks and local spline interpolation. Special attention is paid to VTR values during consonant-to-vowel (CV) and vowel-to-consonant (VC) transitions, and to speech segments involving vocal tract anti-resonances, where the VTR and the spectral prominence deviate from each other. Using this database, we quantitatively assessed two common automatic VTR tracking techniques in terms of their average tracking errors, analyzed within each of the six major broad phonetic classes as well as during fine CV and VC transitions. The potential use of the VTR database for research in several areas of speech processing is discussed. |
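
The per-class evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the error measure is the mean absolute difference in Hz between hand-labeled and automatically tracked values, computed per frame and per formant; the abstract does not state the exact measure. The function name, the data layout, and the class labels "vowel" and "nasal" are hypothetical and chosen only for illustration; they are not taken from the database.

```python
from collections import defaultdict


def mean_abs_error_by_class(reference, estimate, frame_classes):
    """Average |estimate - reference| per formant within each class label.

    reference, estimate: lists of (F1, F2, F3) tuples in Hz, one per frame.
    frame_classes: list of class labels, one per frame (labels are
    illustrative assumptions, not the database's own label set).
    Returns {label: (err_F1, err_F2, err_F3)} in Hz.
    """
    if not (len(reference) == len(estimate) == len(frame_classes)):
        raise ValueError("all inputs must have one entry per frame")

    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # running error sums per class
    counts = defaultdict(int)                    # frame counts per class

    for ref, est, label in zip(reference, estimate, frame_classes):
        for k in range(3):
            sums[label][k] += abs(est[k] - ref[k])
        counts[label] += 1

    return {
        label: tuple(s / counts[label] for s in sums[label])
        for label in sums
    }


# Made-up frames: two labeled "vowel", one labeled "nasal".
reference = [(500, 1500, 2500), (520, 1480, 2520), (300, 1200, 2400)]
estimate = [(510, 1450, 2500), (500, 1500, 2600), (330, 1100, 2400)]
classes = ["vowel", "vowel", "nasal"]

errors = mean_abs_error_by_class(reference, estimate, classes)
print(errors["vowel"])  # (15.0, 35.0, 40.0)
print(errors["nasal"])  # (30.0, 100.0, 0.0)
```

Restricting the frames to those near CV or VC boundaries before calling the function would give the corresponding transition-only averages under the same assumptions.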