| Paper: | SLP-P15.9 | 
| Session: | Spoken Document Search, Navigation and Summarization | 
| Time: | Thursday, May 18, 14:00 - 16:00 | 
| Presentation: | Poster | 
| Topic: | Speech and Spoken Language Processing: Speech data mining and document retrieval | 
| Title: | An Extremely Large Vocabulary Approach to Named Entity Extraction from Speech | 
| Authors: | Takaaki Hori, Atsushi Nakamura, NTT Corporation, Japan | 
| Abstract: | This paper describes an approach to Named Entity (NE) extraction from speech data in which Automatic Speech Recognition (ASR) uses an extremely large vocabulary lexicon that includes all NEs occurring in a large text corpus. NEs therefore appear directly in the recognition results. Our approach is implemented in the following steps: (1) run an NE tagger over the whole text corpus to produce an NE-tagged corpus in which each NE is marked with its category, (2) construct a lexicon and a language model for ASR from the tagged corpus, treating each NE as a regular word, and (3) run the speech recognizer in one pass. Although a very large vocabulary is necessary to ensure high coverage of NEs, this is no longer a serious obstacle, since we recently achieved real-time extremely large vocabulary ASR using weighted finite-state transducers (WFSTs). In experiments on NE extraction from spoken queries for an open-domain question-answering system, our approach yielded higher F-measure values than a conventional approach. |
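
The three steps in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the tagged-span input format, the `surface/CATEGORY` token convention, the category set, and the function names `to_ne_tokens`, `build_lexicon`, and `extract_nes` are all assumptions made for illustration. The NE tagger and the WFST-based recognizer are out of scope here, so the sketch only shows how each NE might become a single lexicon entry and how NEs could then be read straight off the recognizer output.

```python
from collections import Counter

# Assumed category set; the abstract does not list the categories used.
NE_CATEGORIES = {"PERSON", "LOCATION", "ORGANIZATION"}


def to_ne_tokens(spans):
    """Turn NE-tagged spans into word tokens, one token per NE.

    `spans` is a hypothetical tagger output: a list of (text, category)
    pairs, where category is None for text outside any NE.
    """
    tokens = []
    for text, category in spans:
        if category is None:
            tokens.extend(text.split())
        else:
            # A multiword NE becomes one "regular word" carrying its category.
            tokens.append("_".join(text.split()) + "/" + category)
    return tokens


def build_lexicon(corpus):
    """Count every token in the tagged corpus; the keys form the ASR lexicon."""
    counts = Counter()
    for spans in corpus:
        counts.update(to_ne_tokens(spans))
    return counts


def extract_nes(recognized_tokens):
    """Read NEs directly from recognizer output tokens."""
    entities = []
    for token in recognized_tokens:
        surface, sep, category = token.rpartition("/")
        if sep and category in NE_CATEGORIES:
            entities.append((surface.replace("_", " "), category))
    return entities


corpus = [
    [("who founded", None), ("NTT Corporation", "ORGANIZATION"),
     ("in", None), ("Japan", "LOCATION")],
]
lexicon = build_lexicon(corpus)
hypothesis = ["who", "founded", "NTT_Corporation/ORGANIZATION"]
entities = extract_nes(hypothesis)
```

Because each NE is a single lexicon entry, extraction needs no separate tagging pass over the recognition result; the cost is a much larger vocabulary, which is the trade-off the abstract addresses.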