Paper: | SLP-P16.2 |
Session: | Speaker Tracking and Adaptation |
Time: | Thursday, May 18, 16:30 - 18:30 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Clustering and novel modeling algorithms |
Title: |
FAST AND ROBUST SPEAKER CLUSTERING USING THE EARTH MOVER'S DISTANCE AND MIXMAX MODELS |
Authors: |
Thilo Stadelmann, Bernd Freisleben, University of Marburg, Germany |
Abstract: |
Speaker clustering is the task of assigning the same unique label to all speech segments in a video that were uttered by the same speaker. There are two key challenges: processing speed and robustness in the presence of noise. In this paper, we present an approach that significantly improves the processing speed of a hierarchical speaker clustering algorithm by using the earth mover's distance (EMD) as the distance measure. Noise robustness is achieved by extending the well-known MIXMAX speaker model so that the EMD can be applied to it. Experimental results show that the proposed EMD approach reduces runtime by more than a factor of 120 compared to a likelihood-ratio-based distance measure, while clustering performance remains nearly the same. |
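To illustrate what "using the EMD as the distance measure" involves, here is a minimal sketch of the earth mover's distance between two signatures (sets of weighted centres), solved as a transportation problem with `scipy.optimize.linprog`. The abstract does not say how MIXMAX models are turned into signatures or which ground distance is used, so treating each segment model as its mixture means plus mixture weights, and using Euclidean ground distance, are assumptions made here for illustration only. The helper names `emd` and `pairwise_emd` are hypothetical. The resulting symmetric distance matrix is the kind of input a hierarchical (agglomerative) clustering step consumes.

```python
import numpy as np
from scipy.optimize import linprog


def emd(weights_p, means_p, weights_q, means_q):
    """Earth mover's distance between two signatures (hypothetical helper).

    Assumption: each signature is a set of centres (e.g. mixture means)
    whose non-negative weights each sum to 1, and the ground distance
    between centres is Euclidean. This is not the paper's exact setup.
    """
    wp = np.asarray(weights_p, dtype=float)
    wq = np.asarray(weights_q, dtype=float)
    m, n = len(wp), len(wq)
    # One row per centre; works for 1-D or multi-dimensional features.
    mp = np.asarray(means_p, dtype=float).reshape(m, -1)
    mq = np.asarray(means_q, dtype=float).reshape(n, -1)

    # Ground-distance matrix: d[i, j] = ||centre_i(P) - centre_j(Q)||.
    d = np.linalg.norm(mp[:, None, :] - mq[None, :, :], axis=-1)

    # Flow variables f[i, j] are flattened row-major (index i * n + j).
    a_eq = np.zeros((m + n, m * n))
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0  # total flow out of P centre i = wp[i]
    for j in range(n):
        a_eq[m + j, j::n] = 1.0  # total flow into Q centre j = wq[j]
    b_eq = np.concatenate([wp, wq])

    res = linprog(d.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Total flow is 1, so the minimal transport cost equals the EMD.
    return res.fun


def pairwise_emd(signatures):
    """Symmetric EMD matrix over a list of (weights, means) signatures."""
    k = len(signatures)
    dist = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            dist[a, b] = dist[b, a] = emd(*signatures[a], *signatures[b])
    return dist


if __name__ == "__main__":
    # Half the mass stays at (0, 0); the other half moves distance 1.
    print(round(emd([0.5, 0.5], [[0.0, 0.0], [1.0, 0.0]],
                    [1.0], [[0.0, 0.0]]), 3))  # 0.5
```

In an agglomerative scheme, the closest pair in such a matrix would be merged and the distances recomputed; how the authors update merged MIXMAX models is not described in the abstract and is not modeled here.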