Assessing the performance of the TDNN-BLSTM architecture for phoneme recognition of English speech

Bachelor Thesis (2021)

Author(s)

I.A. Klom (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

O.E. Scharenborg – Mentor (Multimedia Computing)

S. Feng – Mentor (Multimedia Computing)

C.M. Jonker – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Speech recognition, Phoneme Recognition, TDNN-BLSTM, Spontaneous speech, Read speech, TIMIT, Buckeye, Phoneme Error Rate, Acoustic Model

To reference this document use

https://resolver.tudelft.nl/uuid:ea5735dc-8384-4e6d-9e42-53b5beafd2f1

More Info

Publication Year

2021

Language

English

Graduation Date

01-07-2021

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research studies the Projected Bidirectional Long Short-Term Memory Time Delay Neural Network (TDNN-BLSTM) model for English phoneme recognition. It contributes to the field of phoneme recognition by analyzing the performance of the TDNN-BLSTM model on the TIMIT corpus and the Buckeye corpus, which contain read speech and spontaneous speech, respectively. The TIMIT corpus serves as a benchmark for comparing architectures. The Buckeye corpus is used to better understand how the TDNN-BLSTM architecture would perform on recorded informal conversations.
Parameter values are taken from the literature and then optimized.
Using the optimized parameters, the results show a Phoneme Error Rate (PER) of 31.78% for read speech and 54.03% for spontaneous speech. Related work reports PERs of 14.9% for read speech and 23.4% for spontaneous speech.
This indicates that the TDNN-BLSTM architecture performs worse than the acoustic models in related work on both read and spontaneous speech.
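
For readers unfamiliar with the metric: PER is conventionally computed as the edit (Levenshtein) distance between the recognized and the reference phoneme sequence, divided by the length of the reference. The Python sketch below illustrates that conventional definition only. It is not taken from the thesis, whose scoring setup is not described in this record, and the function names and example phoneme sequences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions
    needed to turn the reference sequence into the hypothesis."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis symbols.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER in percent: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one deleted phoneme ("hh") and one substitution ("d" -> "t").
reference = ["sh", "iy", "hh", "ae", "d"]
hypothesis = ["sh", "iy", "ae", "t"]
print(f"PER: {phoneme_error_rate(reference, hypothesis):.1f}%")  # PER: 40.0%
```

Because insertions count as errors while the denominator is the reference length, a PER computed this way can exceed 100% when the recognizer outputs many spurious phonemes.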

Files

Research_paper_Irene_Klom.pdf

(pdf | 0.332 MB)

License info not available