Technical Program

Paper Detail

Paper:	SLP-P15.11
Session:	Spoken Document Search, Navigation and Summarization
Time:	Thursday, May 18, 14:00 - 16:00
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Speech data mining and document retrieval
Title:	Automatic Sentence Segmentation of Speech for Automatic Summarization
Authors:	Joanna Mrozinski, Edward W. D. Whittaker, Pierre Chatain, Sadaoki Furui, Tokyo Institute of Technology, Japan
Abstract:	This paper presents an automatic sentence segmentation method for an automatic speech summarization system. The method combines word-based and class-based statistical language models to predict sentence and non-sentence boundaries. We study both the performance of the segmentation system itself and the effect of the segmentation on summarization accuracy. Sentence segmentation is performed by modelling the probability of a sentence boundary given the preceding word history, using language models trained on transcriptions and texts from several sources. The segmented output is then used as input to an existing automatic summarization system to determine its effect on the summarization process. All experiments are conducted on two types of data: broadcast news and lecture transcriptions. Automatic summaries are created with different segmentations and different summarization ratios, and are evaluated by comparing them to human-made summaries. We show that proper segmentation is essential for achieving good performance with an automatic summarization system.
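The abstract describes segmentation as scoring a sentence boundary against the word history with combined word- and class-based language models. The sketch below illustrates that general idea under assumptions that do not come from the paper: bigram models with add-alpha smoothing, linear interpolation with a fixed weight (lam=0.7), a hand-made word-to-class map, a boundary token "<b>", and a greedy decision at each word gap instead of a full search. All function names, the class map and the toy training data are hypothetical.

```python
from collections import Counter

BOUNDARY = "<b>"  # hypothetical token marking a sentence boundary


def to_stream(sentences):
    """Join tokenized sentences into one token stream with boundary tokens."""
    stream = [BOUNDARY]
    for sent in sentences:
        stream.extend(sent)
        stream.append(BOUNDARY)
    return stream


def bigram_model(stream, alpha=0.1):
    """Add-alpha smoothed bigram estimate of P(nxt | prev)."""
    context = Counter(stream[:-1])
    pairs = Counter(zip(stream, stream[1:]))
    vocab_size = len(set(stream))

    def prob(prev, nxt):
        return (pairs[(prev, nxt)] + alpha) / (context[prev] + alpha * vocab_size)

    return prob


def class_model(stream, word_class, alpha=0.1):
    """Class bigram: P(nxt | prev) = P(c(nxt) | c(prev)) * P(nxt | c(nxt))."""
    def cls(w):
        return BOUNDARY if w == BOUNDARY else word_class.get(w, "UNK")

    class_prob = bigram_model([cls(w) for w in stream], alpha)
    word_counts = Counter(stream)
    class_counts = Counter(cls(w) for w in stream)
    members = Counter(cls(w) for w in set(stream))  # distinct words per class

    def prob(prev, nxt):
        c = cls(nxt)
        # Smoothed membership probability; reserves mass for one unseen word.
        emit = (word_counts[nxt] + alpha) / (class_counts[c] + alpha * (members[c] + 1))
        return class_prob(cls(prev), c) * emit

    return prob


def interpolate(p_word, p_class, lam=0.7):
    """Linear interpolation of the word- and class-based models."""
    return lambda prev, nxt: lam * p_word(prev, nxt) + (1 - lam) * p_class(prev, nxt)


def segment(words, prob):
    """Greedy decision at each word gap: boundary vs. no boundary."""
    if not words:
        return []
    sentences, current = [], [words[0]]
    for prev, nxt in zip(words, words[1:]):
        # Score of ending the sentence after prev and starting a new one with nxt.
        p_boundary = prob(prev, BOUNDARY) * prob(BOUNDARY, nxt)
        # Score of nxt simply continuing the current sentence.
        p_continue = prob(prev, nxt)
        if p_boundary > p_continue:
            sentences.append(current)
            current = []
        current.append(nxt)
    sentences.append(current)
    return sentences


# Toy training data and class map (hypothetical, for illustration only).
train = [
    "the market closed higher".split(),
    "stocks rose today".split(),
    "the market fell".split(),
    "stocks closed lower today".split(),
]
word_class = {
    "the": "DET", "market": "N", "stocks": "N",
    "closed": "V", "rose": "V", "fell": "V",
    "higher": "ADV", "lower": "ADV", "today": "ADV",
}

stream = to_stream(train)
model = interpolate(bigram_model(stream), class_model(stream, word_class))
result = segment("stocks rose today the market fell".split(), model)
print(result)
```

A boundary is placed between two words when the probability of closing the sentence and opening a new one with the next word exceeds the probability of simply continuing. The class model helps when a word pair is rare but its class pair (for example, an adverb followed by a determiner) is well attested. A realistic system would more likely use higher-order n-grams trained on much larger corpora and search jointly over all boundary sequences (for instance with Viterbi decoding) rather than decide each gap independently.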