Paper: | SLP-P10.9 |
Session: | Speech Synthesis II |
Time: | Wednesday, May 17, 14:00 - 16:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Prosody, Emotional, and Expressive Synthesis |
Title: |
A HIERARCHICAL APPROACH TO AUTOMATIC STRESS DETECTION IN ENGLISH SENTENCES |
Authors: |
Min Lai, University of Science and Technology of China, China; Yining Chen, Min Chu, Yong Zhao, Microsoft Research Asia, China; Fangyu Hu, University of Science and Technology of China, China |
Abstract: |
This paper proposes a hierarchical framework of three classifier layers for automatic stress detection in English speech utterances. The top two layers are a linguistic classifier, which labels all content words as stressed and all function words as unstressed, and an acoustic classifier, which assigns stressed and unstressed labels with HMM-based models using only acoustic features such as MFCC, energy, and f0. When no manual stress labels are available, only the top two layers are activated; the best performance we achieved in this setting is 92.9%. The third layer is an AdaBoost classifier that improves accuracy by using more features and manual labels. The best result we obtained with it is 94.1%, which approaches the self-agreement ratio of the same annotator (97.4%), taken as the upper bound on performance. |
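The linguistic layer described in the abstract can be illustrated with a minimal sketch. It covers only the content-word versus function-word rule; the acoustic (HMM) and AdaBoost layers, and the way the layers are combined, are not specified in the abstract and are not modeled here. The function-word list `FUNCTION_WORDS` and the helper name `linguistic_stress_labels` are hypothetical, chosen for illustration, and are not taken from the paper.

```python
# Hypothetical, deliberately small function-word list (assumption: the
# paper's actual inventory of function words is not given in the abstract).
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "of", "to", "in", "on", "at", "for", "by", "from",
    "with", "as", "and", "or", "but", "is", "are", "was", "were", "be",
    "i", "you", "he", "she", "it", "we", "they", "this", "that",
})


def linguistic_stress_labels(words):
    """Label each word 'unstressed' if it is a function word, else 'stressed'.

    This mirrors only the rule stated in the abstract: content words are
    stressed, function words are unstressed. No acoustic evidence is used.
    """
    labels = []
    for word in words:
        # Normalize case and strip surrounding punctuation before lookup.
        key = word.lower().strip(".,;:!?\"'")
        label = "unstressed" if key in FUNCTION_WORDS else "stressed"
        labels.append((word, label))
    return labels


labels = linguistic_stress_labels("The cat sat on the mat".split())
for word, label in labels:
    print(f"{word}\t{label}")
```

A rule of this kind needs no training data, which is consistent with the abstract's statement that the top layers can run when no manual stress labels are available.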