Paper: | SLP-P10.3 |
Session: | Speech Synthesis II |
Time: | Wednesday, May 17, 14:00 - 16:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Text-to-phoneme conversion |
Title: |
IDENTIFYING LANGUAGE ORIGIN OF PERSON NAMES WITH N-GRAMS OF DIFFERENT UNITS |
Authors: |
Yining Chen, Microsoft Research Asia, China; Jiali You, Chinese Academy of Sciences, China; Min Chu, Yong Zhao, Microsoft Research Asia, China; Jinlin Wang, Chinese Academy of Sciences, China |
Abstract: |
Identifying the language origin of a name that appears in English is important for generating the correct pronunciation of the name. In this paper, N-grams of syllable-based letter clusters are proposed for the task. The performance of an N-gram model over a set of frequently used letter clusters (corresponding to syllables) is compared with that of a letter N-gram model on a four-language task (English, German, French and Portuguese). On average, the letter-cluster N-gram, with a 26% error rate, is slightly better than the letter N-gram, with a 27.2% error rate. Furthermore, the error distributions of the two N-grams are found to differ substantially. Therefore, AdaBoost is used to combine the results from N-grams of different units. After the combination, the error rate is reduced to 22.5%, a relative error reduction of 17.5%. |
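To make the letter N-gram baseline mentioned in the abstract concrete, the following is a minimal sketch of one way such a classifier could look: one letter bigram model per language, with the name assigned to the language whose model gives it the highest log-likelihood. It is not the authors' implementation. The training names, the add-one smoothing, the fixed vocabulary size, and all function and class names are assumptions made for illustration; the syllable-based letter clusters and the AdaBoost combination are not covered here.

```python
import math
from collections import Counter

# Assumption: 26 lowercase letters plus one end-of-name marker.
VOCAB_SIZE = 27

def letter_ngrams(name, n=2):
    """Split a name into letter n-grams, padded with start (^) and end ($) marks."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class LetterNgramModel:
    """Letter n-gram model for one language, with add-one smoothing (an assumption)."""

    def __init__(self, names, n=2):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        for name in names:
            for gram in letter_ngrams(name, n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def log_prob(self, name):
        """Sum of smoothed log P(letter | preceding n-1 letters) over the name."""
        total = 0.0
        for gram in letter_ngrams(name, self.n):
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + VOCAB_SIZE
            total += math.log(numerator / denominator)
        return total

def identify_origin(name, models):
    """Return the language whose model scores the name highest."""
    return max(models, key=lambda lang: models[lang].log_prob(name))

# Hypothetical toy training data, far smaller than any realistic name list.
TRAIN = {
    "English": ["smith", "johnson", "williams", "brown", "taylor"],
    "German": ["schmidt", "schneider", "fischer", "weber", "schwarz"],
    "French": ["dubois", "moreau", "lefebvre", "rousseau", "beaulieu"],
    "Portuguese": ["silva", "santos", "oliveira", "pereira", "ferreira"],
}

models = {lang: LetterNgramModel(names) for lang, names in TRAIN.items()}

# Classify a name that is not in the toy training data.
print(identify_origin("schroeder", models))
```

A letter-cluster variant would replace `letter_ngrams` with a segmenter that emits syllable-like clusters instead of single letters; how those clusters are chosen is specific to the paper and is not reproduced here.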