ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Technical Program

Paper Detail

Paper: SLP-P15.9
Session: Spoken Document Search, Navigation and Summarization
Time: Thursday, May 18, 14:00 - 16:00
Presentation: Poster
Topic: Speech and Spoken Language Processing: Speech data mining and document retrieval
Title: An Extremely Large Vocabulary Approach to Named Entity Extraction from Speech
Authors: Takaaki Hori, Atsushi Nakamura, NTT Corporation, Japan
Abstract: This paper describes an approach to Named Entity (NE) extraction from speech data, in which an extremely large vocabulary lexicon including all NEs occurring in a large text corpus is used for Automatic Speech Recognition (ASR). Accordingly, NEs appear in the recognition results just as they are. Our approach is implemented by the following steps: (1) run an NE-tagger for a whole text corpus and make an NE-tagged corpus in which each NE is padded with its category, (2) construct a lexicon and a language model for ASR using the tagged corpus where each NE is considered as a regular word, and (3) run the speech recognizer in one pass. Although a very large vocabulary is necessary to ensure a high coverage of NEs, that is no longer a big problem since we recently achieved real-time extremely large vocabulary ASR using WFSTs. In experiments on NE extraction from spoken queries for an open-domain question-answering system, our approach yielded higher F-measure values than a conventional approach.
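Steps (1) and (2) of the abstract, and the way NEs can then be read directly off the one-pass recognition result in step (3), can be illustrated with a minimal sketch. The abstract does not specify the token format, so everything concrete here is an assumption made for illustration: the `surface/CATEGORY` notation, the underscore joining of multi-word NEs, the category name `LOCATION`, and the function names. The NE-tagger and the speech recognizer themselves are not modeled; their outputs are supplied as plain Python lists.

```python
from collections import Counter

# Assumed (hypothetical) convention: an NE becomes one lexicon entry that
# carries its category, e.g. "new_york/LOCATION".
NE_SEP = "/"


def merge_named_entities(tokens, spans):
    """Collapse each tagged NE span into a single token surface/CATEGORY.

    tokens: list of words in one sentence.
    spans:  list of (start, end, category) with end exclusive and
            no overlaps, as an NE-tagger might produce.
    """
    span_at = {start: (end, cat) for start, end, cat in spans}
    merged = []
    i = 0
    while i < len(tokens):
        if i in span_at:
            end, cat = span_at[i]
            surface = "_".join(tokens[i:end])
            merged.append(f"{surface}{NE_SEP}{cat}")
            i = end
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def build_lexicon(tagged_corpus):
    """Count every token, treating each merged NE as a regular word."""
    return Counter(tok for sentence in tagged_corpus for tok in sentence)


def extract_named_entities(hypothesis):
    """Read NEs off a recognition result that uses the same token format."""
    entities = []
    for tok in hypothesis:
        surface, sep, cat = tok.rpartition(NE_SEP)
        # Only tokens of the form surface/CATEGORY count as NEs.
        if sep and surface and cat.isupper():
            entities.append((surface.replace("_", " "), cat))
    return entities


# Toy data standing in for NE-tagger output on one corpus sentence.
tokens = ["who", "is", "the", "mayor", "of", "new", "york"]
spans = [(5, 7, "LOCATION")]

merged = merge_named_entities(tokens, spans)
lexicon = build_lexicon([merged])

# Here the merged sentence stands in for a recognizer hypothesis.
entities = extract_named_entities(merged)
```

Because each NE is an ordinary lexicon entry in this sketch, extraction needs no separate tagging pass over the recognizer output; the cost is a vocabulary that grows with every distinct NE in the corpus, which is the trade-off the abstract addresses with WFST-based recognition.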


