Paper: | ITT-L1.1 |
Session: | Speech Processing Applications |
Time: | Thursday, May 18, 16:30 - 16:50 |
Presentation: | Lecture |
Topic: | Industry Technology Track: Speech Recognition |
Title: | UNSUPERVISED TRAINING ON LARGE AMOUNTS OF BROADCAST NEWS DATA |
Authors: | Jeff Ma, Spyros Matsoukas, Owen Kimball, Richard Schwartz, BBN Technologies, United States |
Abstract: | This paper presents our recent effort to improve our Arabic Broadcast News (BN) recognition system by using thousands of hours of untranscribed Arabic audio for unsupervised training. Unsupervised training is first carried out on the 1,900-hour English Topic Detection and Tracking (TDT) data and compared with the lightly supervised training method that we have used for the DARPA EARS evaluations. The comparison shows that unsupervised training produces a 21.7% relative reduction in word error rate (WER), which is comparable to the gain obtained with light supervision. The same unsupervised training strategy, applied to a similar amount of Arabic BN data, produces an 11.6% relative gain. This gain, though considerable, is substantially smaller than that observed on the English data. Our initial work towards understanding the reasons for this difference is also described. |