Paper: | SLP-P14.4 |
Session: | Speaker Recognition: Models and Methods |
Time: | Thursday, May 18, 14:00 - 16:00 |
Presentation: | Poster |
Topic: | Speech and Spoken Language Processing: Speaker Identification |
Title: | ON MAXIMIZING THE WITHIN-CLUSTER HOMOGENEITY OF SPEAKER VOICE CHARACTERISTICS FOR SPEECH UTTERANCE CLUSTERING |
Authors: | Wei-Ho Tsai, Hsin-Min Wang, Academia Sinica, Taiwan |
Abstract: | This paper investigates the problem of how to partition unknown speech utterances into clusters, such that the overall within-cluster homogeneity of speakers' voice characteristics can be maximized. The within-cluster homogeneity is characterized by the likelihood probability that a cluster model, trained using all the utterances within a cluster, matches each of the within-cluster utterances. Such probability is then maximized by using a genetic algorithm, which determines the best cluster where each utterance should be located. For greater computational efficiency, also proposed is an alternative solution that approximates the likelihood probability with a divergence-based model similarity. The method is further designed to estimate the optimal number of clusters automatically. |
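
The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It rests on several assumptions that do not come from the abstract: each utterance is a short list of one-dimensional feature values, each cluster model is a single Gaussian with a floored variance (a stand-in for whatever speaker model the paper uses), the number of clusters is fixed in advance, and the genetic operators (tournament selection, uniform crossover, per-gene mutation, elitism) are generic choices. All names, such as `homogeneity` and `ga_cluster`, are hypothetical. The divergence-based approximation and the automatic estimation of the number of clusters are not covered.

```python
import math
import random

VAR_FLOOR = 1e-3  # assumed floor so that very small clusters do not degenerate


def fit_gaussian(frames):
    """Maximum-likelihood mean and floored variance of 1-D frames."""
    mean = sum(frames) / len(frames)
    var = sum((x - mean) ** 2 for x in frames) / len(frames)
    return mean, max(var, VAR_FLOOR)


def log_likelihood(frames, model):
    """Log-likelihood of the frames under a Gaussian (mean, variance) model."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )


def homogeneity(utterances, assignment, num_clusters):
    """Sum, over clusters, of how well each cluster's model (trained on all
    of its utterances) matches each utterance assigned to that cluster."""
    total = 0.0
    for k in range(num_clusters):
        members = [u for u, a in zip(utterances, assignment) if a == k]
        if not members:
            continue  # an empty cluster contributes nothing
        model = fit_gaussian([x for u in members for x in u])
        total += sum(log_likelihood(u, model) for u in members)
    return total


def ga_cluster(utterances, num_clusters, pop_size=40, generations=50,
               p_mut=0.1, seed=0):
    """Search for the assignment vector with the highest homogeneity."""
    rng = random.Random(seed)
    n = len(utterances)

    def score(assignment):
        return homogeneity(utterances, assignment, num_clusters)

    # Each chromosome holds one cluster index per utterance.
    population = [[rng.randrange(num_clusters) for _ in range(n)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        next_gen = [population[0][:]]  # elitism: keep the current best
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # Uniform crossover, then per-gene mutation.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [rng.randrange(num_clusters) if rng.random() < p_mut else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    history.append(score(best))
    return best, history


# Toy data: three utterances from each of two well-separated "speakers".
utterances = [
    [0.1, -0.4, 0.6, 0.0], [-0.3, 0.5, 0.2, -0.6], [0.4, -0.1, -0.5, 0.3],
    [5.2, 4.7, 5.5, 4.9], [4.6, 5.3, 5.0, 5.4], [5.1, 4.8, 4.5, 5.6],
]
best, history = ga_cluster(utterances, num_clusters=2)
print(best, history[-1])
```

With the number of clusters fixed, the objective rewards grouping utterances that a shared model explains well. Left unconstrained, a pure likelihood objective tends to favor more and smaller clusters, which is one reason a separate criterion is needed to choose the number of clusters.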