Evaluation of phoneme recognition through TDNN-OPGRU on Mandarin speech
J. van der Tang (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S. Feng – Mentor (TU Delft - Multimedia Computing)
O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)
C.M. Jonker – Graduation committee member (TU Delft - Interactive Intelligence)
git repository with the project files and results
https://github.com/jordyjordy/TDNN-OPGRU-Mandarin
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
This research extends earlier work that applied the TDNN-OPGRU network to Automatic Phoneme Recognition on Dutch speech by implementing and testing the network on Mandarin speech. The goal is to investigate the performance of the TDNN-OPGRU architecture when decoding phonemes in prepared and spontaneous Mandarin speech. The difference in Phoneme Error Rate (PER) between prepared and spontaneous speech is determined and, because Mandarin is a tonal language, the effect of tone on the PER is investigated. The results show that a substantial share of the PER comes from substitutions in which only the tone is determined incorrectly. However, tone does not appear to affect the difference in PER between spontaneous and prepared speech, since it accounts for a similar share of the substitutions in both types of speech. Including tone also increases the error rate of the TDNN-OPGRU architecture on base phonemes.
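As a rough illustration of the quantities discussed in the abstract, the sketch below computes a PER from a Levenshtein alignment and counts substitutions in which only the tone differs. It is not taken from the project repository: the function names `align`, `base` and `per_report` are hypothetical, and it assumes that tone is written as a trailing digit on a phoneme label (for example `i3`), which may differ from the notation actually used in this work.

```python
def align(ref, hyp):
    """Levenshtein-align two phoneme lists.

    Returns (substitution_pairs, deletions, insertions).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back through the table to recover one optimal alignment.
    subs, dels, ins = [], 0, 0
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] != hyp[j - 1]:
                subs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def base(phoneme):
    """Strip the tone marker (assumed here to be a trailing digit)."""
    return phoneme.rstrip("0123456789")


def per_report(ref, hyp):
    """PER = (S + D + I) / N, plus the number of tone-only substitutions."""
    subs, dels, ins = align(ref, hyp)
    tone_only = sum(1 for r, h in subs if base(r) == base(h))
    return {
        "per": (len(subs) + dels + ins) / len(ref),
        "substitutions": len(subs),
        "tone_only_substitutions": tone_only,
        "deletions": dels,
        "insertions": ins,
    }


if __name__ == "__main__":
    reference = ["n", "i3", "h", "ao3"]
    hypothesis = ["n", "i2", "h", "ao3"]
    print(per_report(reference, hypothesis))
```

Applying `base` to both sequences before calling `per_report` would give an error rate on base phonemes only, which is one way the tonal and toneless results could be compared.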