Paper: | SLP-P15.1 |
Session: | Spoken Document Search, Navigation and Summarization |
Time: | Thursday, May 18, 14:00 - 16:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Speech data mining and document retrieval |
Title: |
Improved Spoken Document Summarization Using Probabilistic Latent Semantic Analysis (PLSA) |
Authors: |
Sheng-Yi Kong, Lin-shan Lee, National Taiwan University, Taiwan |
Abstract: |
In this paper, we propose a set of new methods that explore the topical information embedded in spoken documents and use this information for automatic summarization. By introducing a set of latent topic variables, Probabilistic Latent Semantic Analysis (PLSA) can uncover the underlying probabilistic relationships between documents and terms. Based on PLSA modeling, we propose two measures, referred to in this paper as topic significance and term entropy, to identify the terms, and hence the sentences, that are important to a document; these sentences are then used to construct the summary. Preliminary experiments on broadcast news stories in Mandarin Chinese showed improved performance compared with some existing approaches. |
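The abstract names PLSA's latent topic variables and a term-entropy measure but gives no formulas. The Python sketch below fits PLSA with EM on a toy term-count corpus and then computes one plausible term entropy. It assumes that P(T_k | t) can be estimated by weighting each document's topic distribution P(T_k | d) by the term count n(d, t). The helper names (`plsa`, `term_entropy`, `rank_sentences`) and the sentence-scoring rule are hypothetical illustrations, not the authors' method. Topic significance is not sketched because its definition is not given here.

```python
import math
import random


def _normalize(xs):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(xs)
    return [x / total for x in xs]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM on a toy corpus.

    counts[d][t] is n(d, t), the count of term t in document d (every document
    must contain at least one term). Returns (p_t_z, p_z_d) with
    p_t_z[k][t] = P(t | T_k) and p_z_d[d][k] = P(T_k | d).
    """
    rng = random.Random(seed)
    vocab = sorted({t for doc in counts for t in doc})
    # Random, strictly positive initialization of both distributions.
    p_t_z = [dict(zip(vocab, _normalize([rng.random() + 1e-3 for _ in vocab])))
             for _ in range(n_topics)]
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in counts]
    for _ in range(n_iter):
        exp_t_z = [{t: 0.0 for t in vocab} for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in counts]
        for d, doc in enumerate(counts):
            for t, n in doc.items():
                # E-step: P(T_k | d, t) is proportional to P(t | T_k) * P(T_k | d).
                post = [p_t_z[k][t] * p_z_d[d][k] for k in range(n_topics)]
                s = sum(post)
                for k in range(n_topics):
                    r = n * post[k] / s  # expected count of (d, t) assigned to T_k
                    exp_t_z[k][t] += r
                    exp_z_d[d][k] += r
        # M-step: renormalize the expected counts into probabilities.
        p_t_z = []
        for row in exp_t_z:
            total = sum(row.values())
            p_t_z.append({t: v / total for t, v in row.items()})
        p_z_d = [_normalize(row) for row in exp_z_d]
    return p_t_z, p_z_d


def term_entropy(counts, p_z_d):
    """Entropy of P(T_k | t) per term; lower means more topic-specific.

    ASSUMPTION: P(T_k | t) is taken as sum_d n(d, t) * P(T_k | d), normalized over k.
    """
    n_topics = len(p_z_d[0])
    mass = {}
    for d, doc in enumerate(counts):
        for t, n in doc.items():
            acc = mass.setdefault(t, [0.0] * n_topics)
            for k in range(n_topics):
                acc[k] += n * p_z_d[d][k]
    ent = {}
    for t, acc in mass.items():
        s = sum(acc)
        ent[t] = -sum((a / s) * math.log(a / s) for a in acc if a > 0)
    return ent


def rank_sentences(sentences, ent, n_topics):
    """HYPOTHETICAL scoring: mean of (log K - entropy) over a sentence's known terms.

    sentences is a list of term lists; returns sentence indices, best first.
    """
    max_ent = math.log(n_topics)

    def score(sent):
        vals = [max_ent - ent[t] for t in sent if t in ent]
        return sum(vals) / len(vals) if vals else 0.0

    return sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
```

Under this reading, a term whose weight is spread evenly across K topics gets an entropy near log K and adds almost nothing to a sentence's score. A term concentrated in a single topic gets an entropy near 0 and adds close to log K. The top-ranked sentences would then be extracted to form the summary.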