Technical Program

Paper Detail

Paper:	SLP-P7.5
Session:	Audio-visual and Multimodal Processing
Time:	Wednesday, May 17, 10:00 - 12:00
Presentation:	Poster
Topic:	Speech and Spoken Language Processing: Multi-modal/multimedia processing (such as audio/visual, etc)
Title:	IMPROVED CHINESE CHARACTER INPUT BY MERGING SPEECH AND HANDWRITING RECOGNITION HYPOTHESES
Authors:	Xi Zhou, University of Science and Technology of China, China; Ye Tian, Microsoft Corporation, United States; Jian-Lai Zhou, Frank K. Soong, Microsoft Research Asia, China; Bei-qian Dai, University of Science and Technology of China, China
Abstract:	In this paper we propose merging speech and handwriting recognition hypotheses to improve the performance of Chinese character input. Handwriting recognition can be reliable when characters are written rather squarely. However, writing more legibly and squarely tends to slow down the input (stroke writing) speed. Speech input, on the other hand, is fairly efficient, but the large number of homonyms and the vulnerability of speech to adverse environments prevent it from being used as a robust Chinese character input method. The handwriting stroke information and the acoustic speech information are, in many cases, complementary. In this study we use independent, statistically trained HMMs to recognize each input mode individually, and then merge the recognition hypotheses from the two recognizers. Generalized posterior probabilities are used to synchronize, compare and merge the hypotheses appropriately. Experimental results show that a significant input speedup can be obtained while maintaining the same recognition performance.
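
The abstract does not state the exact rule used to merge the two hypothesis lists, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes each recognizer has already produced, for a single character position, a mapping from candidate characters to posterior probabilities, and it combines them with a weighted log-linear rule. The function name `merge_hypotheses`, the parameters `speech_weight` and `floor`, and the example probabilities are all hypothetical and chosen for illustration.

```python
import math


def merge_hypotheses(speech, handwriting, speech_weight=0.5, floor=1e-6):
    """Merge two candidate->posterior mappings for one character position.

    Assumption (not from the abstract): scores are combined log-linearly,
    and a candidate missing from one recognizer's list gets a small floor
    probability instead of zero so it is penalized but not eliminated.
    Returns a list of (candidate, merged_posterior), best first.
    """
    candidates = set(speech) | set(handwriting)
    if not candidates:
        return []

    log_scores = {}
    for ch in candidates:
        p_speech = max(speech.get(ch, 0.0), floor)
        p_hand = max(handwriting.get(ch, 0.0), floor)
        log_scores[ch] = (speech_weight * math.log(p_speech)
                          + (1.0 - speech_weight) * math.log(p_hand))

    # Renormalize so the merged scores sum to one (subtract the peak
    # before exponentiating for numerical stability).
    peak = max(log_scores.values())
    unnormalized = {ch: math.exp(s - peak) for ch, s in log_scores.items()}
    total = sum(unnormalized.values())
    merged = {ch: v / total for ch, v in unnormalized.items()}
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


# Illustrative, made-up posteriors: the speech recognizer is unsure among
# homophones, while the handwriting recognizer is unsure among
# visually similar shapes.
speech_nbest = {"是": 0.40, "事": 0.35, "市": 0.25}
handwriting_nbest = {"事": 0.50, "争": 0.40, "是": 0.10}

ranked = merge_hypotheses(speech_nbest, handwriting_nbest)
best_char = ranked[0][0]
```

In this toy case neither recognizer is confident alone, but only one candidate is plausible under both, which is the kind of complementarity the abstract describes. The synchronization of the two input streams mentioned in the abstract is outside the scope of this sketch.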