Evaluation of phoneme recognition through TDNN-OPGRU on Mandarin speech

Bachelor Thesis (2021)
Author(s)

J. van der Tang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S. Feng – Mentor (TU Delft - Multimedia Computing)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

C.M. Jonker – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Jordy van der Tang
Publication Year
2021
Language
English
Graduation Date
01-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Related content

Git repository with the project files and results

https://github.com/jordyjordy/TDNN-OPGRU-Mandarin
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research extends earlier work that applied the TDNN-OPGRU network to Automatic Phoneme Recognition on Dutch speech by implementing and testing the network on Mandarin speech. The goal is to investigate the performance of the TDNN-OPGRU architecture when decoding phonemes in prepared and spontaneous Mandarin speech. The difference in Phoneme Error Rate (PER) between prepared and spontaneous speech is determined and, because Mandarin is a tonal language, the effect of tone on the PER is investigated. The results show that a substantial share of the PER comes from substitutions in which only the tone is incorrectly determined. However, tone does not appear to affect the difference in PER between spontaneous and prepared speech, since it accounts for a similar share of the substitutions in both types of speech. Including tone also increases the error rate of the TDNN-OPGRU architecture on base phonemes.
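
To make the metric concrete: PER is conventionally the number of substitutions, deletions and insertions in an edit-distance alignment, divided by the number of reference phonemes. The sketch below is an illustration only, not the scoring code used in the thesis. It assumes a hypothetical label format in which tone is a trailing digit on the phoneme (for example `a4`), and the function names `strip_tone`, `align` and `per_with_tone_breakdown` are invented for this example. It counts how many substitutions differ only in tone.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, Optional[str], Optional[str]]


def strip_tone(phoneme: str) -> str:
    """Drop a trailing tone digit, e.g. 'a4' -> 'a' (assumed label format)."""
    return phoneme.rstrip("0123456789")


def align(ref: List[str], hyp: List[str]) -> List[Op]:
    """Levenshtein alignment returning (operation, ref phoneme, hyp phoneme)."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right cell to recover one optimal alignment.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def per_with_tone_breakdown(ref: List[str], hyp: List[str]) -> Dict[str, float]:
    """Return PER plus the number of substitutions that differ only in tone."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    ops = align(ref, hyp)
    subs = [(r, h) for op, r, h in ops if op == "sub"]
    tone_only = sum(1 for r, h in subs if strip_tone(r) == strip_tone(h))
    errors = sum(1 for op, _, _ in ops if op != "ok")
    return {
        "per": errors / len(ref),
        "substitutions": len(subs),
        "tone_only_substitutions": tone_only,
    }
```

Calling `per_with_tone_breakdown(["n", "i3", "h", "ao3"], ["n", "i2", "h", "ao3"])` aligns one substitution (`i3` recognised as `i2`), which is counted as tone-only because both labels reduce to `i` once the digit is stripped. Scoring the same pair after applying `strip_tone` to every label would give the base-phoneme error rate for that utterance.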

Files

Research_Paper_4_.pdf
(pdf | 0.605 Mb)
License info not available