ICASSP 2006 - May 15-19, 2006 - Toulouse, France

Paper Detail

Paper: SLP-P17.11
Session: Spoken Language Modeling, Identification and Characterization
Time: Thursday, May 18, 16:30 - 18:30
Presentation: Poster
Topic: Speech and Spoken Language Processing: Language modeling and Adaptation
Title: Strategies for Language Model Web-data Collection
Authors: Vincent Wan, Thomas Hain, University of Sheffield, United Kingdom
Abstract: This paper presents an analysis of the use of text collected from the internet via a search engine for building domain-specific language models. A framework is developed to analyse how search query formulation affects the performance of the resulting web-data language model in an evaluation. The framework gives rise to improved methods of selecting n-gram search engine queries, which return documents that make better domain-specific language models.


