ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Paper: SLP-P18.4
Session: LVCSR Systems
Time: Friday, May 19, 10:00 - 12:00
Presentation: Poster
Topic: Speech and Spoken Language Processing: Miscellaneous Topics
Title: Morphological Decomposition for Arabic Broadcast News Transcription
Authors: Bing Xiang, BBN Technologies, United States; Kham Nguyen, Northeastern University, United States; Long Nguyen, Richard Schwartz, John Makhoul, BBN Technologies, United States
Abstract: In this paper, we present a novel approach to morphological decomposition for large-vocabulary Arabic speech recognition. In a state-of-the-art Arabic broadcast news transcription system, the approach achieved a low out-of-vocabulary (OOV) rate as well as high recognition accuracy. Compound words are decomposed into roots and affixes in both the language model training data and the acoustic training data, and the decomposed words in the recognition output are rejoined before scoring. Four algorithms are evaluated and compared in this work. The best system achieved a 1.7% absolute (8.7% relative) reduction in word error rate (WER) compared with the 64K-word baseline, and its recognition performance is comparable to that of a 200K-word system trained on normal (undecomposed) words. At the same time, the decomposed system is much faster and needs less memory than systems with vocabularies larger than 64K words.
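The decompose-then-rejoin pipeline described in the abstract can be illustrated with a small sketch. Everything below is an assumption for illustration, not one of the paper's four algorithms: the affix lists (`PREFIXES`, `SUFFIXES`, in Buckwalter-style transliteration), the greedy one-affix-per-side rule, the `MIN_STEM` length guard, and the `+` boundary-marker convention are all hypothetical. The point it shows is that marking affix boundaries lets recognizer output be rejoined into whole words, so WER can still be scored on the normal word level.

```python
# Minimal sketch of morphological decomposition with rejoining before scoring.
# HYPOTHETICAL affix inventories; a real system would use linguistically
# motivated lists and constraints (e.g. a lexicon) to avoid spurious splits.
PREFIXES = ["wAl", "bAl", "Al", "w", "b"]  # checked longest first
SUFFIXES = ["hm", "hA", "h"]
MIN_STEM = 2  # hypothetical: never strip an affix if the stem gets shorter than this


def decompose(word: str) -> list[str]:
    """Split a word into 'prefix+', stem, '+suffix' tokens (at most one affix per side)."""
    tokens: list[str] = []
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            tokens.append(p + "+")  # trailing '+' marks a prefix
            word = word[len(p):]
            break
    suffix = None
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            suffix = "+" + s  # leading '+' marks a suffix
            word = word[: -len(s)]
            break
    tokens.append(word)
    if suffix is not None:
        tokens.append(suffix)
    return tokens


def rejoin(tokens: list[str]) -> list[str]:
    """Glue marked affix tokens back onto their neighbours to recover whole words."""
    words: list[str] = []
    pending_prefix = ""
    for tok in tokens:
        if tok.endswith("+") and len(tok) > 1:
            pending_prefix += tok[:-1]  # hold the prefix until a stem arrives
        elif tok.startswith("+") and len(tok) > 1:
            if words and not pending_prefix:
                words[-1] += tok[1:]  # attach the suffix to the previous word
            else:
                words.append(pending_prefix + tok[1:])  # orphan suffix kept as a word
                pending_prefix = ""
        else:
            words.append(pending_prefix + tok)  # stem, possibly with held prefix
            pending_prefix = ""
    if pending_prefix:
        words.append(pending_prefix)  # dangling prefix kept as a word
    return words


if __name__ == "__main__":
    sentence = ["wAlktAb", "ktAbhm", "bh"]
    units = [t for w in sentence for t in decompose(w)]
    print(units)          # ['wAl+', 'ktAb', 'ktAb', '+hm', 'bh']
    print(rejoin(units))  # ['wAlktAb', 'ktAbhm', 'bh']
```

In this sketch the training text would be replaced by the decomposed `units` for language model and acoustic training, and the recognizer's output would pass through `rejoin` before being aligned against the reference transcript. Because the markers record every boundary, rejoining is lossless for correctly ordered output, while a shorter unit inventory is what would lower the OOV rate.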



IEEE Signal Processing Society
