Paper: | SLP-L9.6 |
Session: | Spoken Language Identification |
Time: | Thursday, May 18, 18:10 - 18:30 |
Presentation: |
Lecture |
Topic: |
Speech and Spoken Language Processing: Language Identification |
Title: |
Discriminative Classifiers for Language Recognition |
Authors: |
Christopher White, Izhak Shafran, CLSP / The Johns Hopkins University, United States; Jean-Luc Gauvain, LIMSI-CNRS, France |
Abstract: |
Most language recognition systems consist of a cascade of three stages: (1) tokenizers that produce parallel phone streams, (2) phonotactic models that score the match between each phone stream and the phonotactic constraints in the target languages, and (3) a final stage that combines the scores from the parallel streams appropriately. This paper reports a series of contrastive experiments to assess the impact of replacing the second and third stages with large-margin discriminative classifiers. In addition, we demonstrate how sounds that are not represented in the tokenizers of the first stage can be approximated with composite units that utilize cross-stream dependencies obtained via multi-string alignments. This leads to a unified discriminative framework which can potentially incorporate a richer set of features such as prosodic and lexical cues. Experiments are reported on the NIST LRE 1996 and 2003 tasks and the results show that the new techniques give substantial gains over a competitive PPRLM baseline. |
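
The three-stage cascade described in the abstract can be illustrated with a minimal sketch of a conventional PPRLM-style baseline. This is not the authors' discriminative system: it assumes the tokenizer output (stage 1) is already available as toy symbol sequences, uses add-alpha smoothed bigram models for stage 2, and simply sums the per-stream scores for stage 3. All function names, the smoothing choice, and the data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def train_bigram(seqs, vocab, alpha=1.0):
    """Stage 2 (assumed form): add-alpha smoothed bigram log-probabilities."""
    pair, ctx = Counter(), Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            pair[(a, b)] += 1
            ctx[a] += 1
    v = len(vocab)
    return {(a, b): math.log((pair[(a, b)] + alpha) / (ctx[a] + alpha * v))
            for a in vocab for b in vocab}

def score(model, seq):
    """Average per-bigram log-likelihood of one phone stream under one model."""
    pairs = list(zip(seq, seq[1:]))
    return sum(model[p] for p in pairs) / max(len(pairs), 1)

def identify(streams, models):
    """Stage 3 (assumed form): sum the stream scores per language, pick the best."""
    totals = {lang: sum(score(m, s) for m, s in zip(per_stream, streams))
              for lang, per_stream in models.items()}
    return max(totals, key=totals.get), totals

# Toy data: two hypothetical tokenizers over a two-symbol phone inventory.
vocab = ["a", "b"]
models = {
    # One phonotactic model per (language, tokenizer stream) pair.
    "X": [train_bigram([list("ababab")], vocab) for _ in range(2)],
    "Y": [train_bigram([list("aaabbb")], vocab) for _ in range(2)],
}
streams = [list("abab"), list("baba")]  # stage 1 output for one test utterance
best, totals = identify(streams, models)
print(best, totals)
```

In the framework the abstract proposes, stages 2 and 3 of such a baseline would be replaced by large-margin discriminative classifiers; the sketch above only shows the generative cascade that serves as the point of comparison.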