Assessing the performance of the TDNN-BLSTM architecture for phoneme recognition of English speech

Bachelor Thesis (2021)
Author(s)

I.A. Klom (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

S. Feng – Mentor (TU Delft - Multimedia Computing)

C.M. Jonker – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Irene Klom
Publication Year
2021
Language
English
Graduation Date
01-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research studies the Projected Bidirectional Long Short-Term Memory Time Delay Neural Network (TDNN-BLSTM) model for English phoneme recognition. It contributes to the field of phoneme recognition by analyzing the performance of the TDNN-BLSTM model on the TIMIT corpus and the Buckeye corpus, which contain read speech and spontaneous speech, respectively. The TIMIT corpus serves as a benchmark for comparing architectures. The Buckeye corpus is used to better understand how the TDNN-BLSTM architecture performs on recorded informal conversations.
Initial parameter values are taken from the literature and then optimized.
With the optimized parameters, the model reaches a Phoneme Error Rate (PER) of 31.78% on read speech and 54.03% on spontaneous speech. Related work reports PERs of 14.9% for read speech and 23.4% for spontaneous speech.
This indicates that the TDNN-BLSTM architecture performs worse than other acoustic models on both read and spontaneous speech.
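
For readers unfamiliar with the metric, the sketch below shows how a PER is conventionally computed: the Levenshtein (edit) distance between the reference and recognized phoneme sequences, summed over all utterances and divided by the total number of reference phonemes. This is an illustration of the standard definition only. The abstract does not state which scoring tool or phoneme mapping the thesis used, and the function names and the example phoneme sequences are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions
    needed to turn the reference sequence into the hypothesis."""
    # prev[j] holds the distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER in percent: total edit errors / total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference contains no phonemes")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_ref


if __name__ == "__main__":
    # Hypothetical example: one deletion ("hh") and one substitution ("d" -> "t").
    reference = ["sh", "iy", "hh", "ae", "d"]
    recognized = ["sh", "iy", "ae", "t"]
    print(phoneme_error_rate([reference], [recognized]))  # 40.0
```

Because insertions count as errors while the denominator only counts reference phonemes, a PER can exceed 100% in principle; the figures quoted above are well below that.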
