Paper: | SLP-P15.6 |
Session: | Spoken Document Search, Navigation and Summarization |
Time: | Thursday, May 18, 14:00 - 16:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Speech data mining and document retrieval |
Title: |
Improved Spoken Document Retrieval with Dynamic Key Term Lexicon and Probabilistic Latent Semantic Analysis (PLSA) |
Authors: |
Ya-chao Hsieh, Yu-tsun Huang, Chien-chih Wang, Lin-shan Lee, National Taiwan University, Taiwan |
Abstract: |
Spoken document retrieval will be very important in the future network era. In this paper, we propose using a "dynamic key term lexicon", automatically extracted from the ever-changing document archives, as an extra feature set in the retrieval task. This lexicon is much more compact than the full lexicon yet semantically rich, so it can retrieve relevant documents more efficiently. The key terms include named entities and other terms selected by a new metric, referred to here as term entropy, which is derived from probabilistic latent semantic analysis (PLSA). Various configurations of retrieval models were tested on a broadcast news archive in Mandarin Chinese, and significant performance improvements were obtained, especially with the new version of PLSA models based on the key term lexicon rather than the full lexicon. |
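
The abstract does not define term entropy, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes that each term comes with a PLSA-style distribution over latent topics, that term entropy is the Shannon entropy of that distribution, and that lower entropy marks a more topic-specific term and therefore a better key term candidate. The function names, the `max_terms` cutoff, and the toy probabilities are all hypothetical.

```python
import math


def term_entropy(topic_probs):
    """Shannon entropy (in nats) of one term's distribution over latent topics.

    Assumption: topic_probs holds non-negative weights for P(topic | term),
    e.g. as estimated by a trained PLSA model. They are renormalized here.
    """
    total = sum(topic_probs)
    if total <= 0:
        raise ValueError("topic weights must sum to a positive value")
    probs = [p / total for p in topic_probs]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_key_terms(term_topic_probs, max_terms):
    """Return up to max_terms terms, ordered from lowest to highest entropy.

    Assumption: a low-entropy term is concentrated on few topics and is
    therefore treated as a better key term candidate.
    """
    ranked = sorted(term_topic_probs, key=lambda t: term_entropy(term_topic_probs[t]))
    return ranked[:max_terms]


# Hypothetical toy data: P(topic | term) over three latent topics.
toy = {
    "election": [0.90, 0.05, 0.05],
    "typhoon": [0.02, 0.96, 0.02],
    "today": [0.34, 0.33, 0.33],
}

print(select_key_terms(toy, 2))  # → ['typhoon', 'election']
```

In this toy data, "today" is spread almost evenly over the topics, so it has the highest entropy and is left out of the lexicon. Named entities, which the abstract lists as a separate source of key terms, are not handled by this sketch.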