Assessing the performance of the TDNN-BLSTM architecture for phoneme recognition of English speech

Abstract

This research studies the Projected Bidirectional Long Short-Term Memory Time Delayed Neural Network (TDNN-BLSTM) model for English phoneme recognition. It contributes to the field of phoneme recognition by analyzing the performance of the TDNN-BLSTM model on two corpora: the TIMIT corpus, which contains read speech, and the Buckeye corpus, which contains spontaneous speech. The TIMIT corpus serves as a benchmark for comparing architectures. The Buckeye corpus is used to better understand how the TDNN-BLSTM architecture performs on recorded informal conversations.
Initial parameter values are taken from the literature and then optimized.
With the optimized parameters, the model reaches a Phoneme Error Rate (PER) of 31.78% on read speech and 54.03% on spontaneous speech. Related work reports PERs of 14.9% for read speech and 23.4% for spontaneous speech.
This indicates that the TDNN-BLSTM architecture does not perform as well as other acoustic models for either read or spontaneous speech.
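
For readers unfamiliar with the metric, the sketch below shows how a PER is conventionally computed: the minimum number of phoneme substitutions, deletions, and insertions needed to turn the recognized sequence into the reference sequence, divided by the number of reference phonemes. This is the standard definition, not necessarily the exact scoring procedure used in this study; the function names and the example phoneme sequences are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of labels)."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j phonemes of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # matching i reference phonemes against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phoneme missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra phoneme in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """PER in percent, pooled over a list of utterances."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_ref_phonemes = sum(len(r) for r in references)
    if total_ref_phonemes == 0:
        raise ValueError("references contain no phonemes")
    return 100.0 * total_errors / total_ref_phonemes


# Hypothetical example: one deletion ("hh") and one substitution ("d" -> "t").
reference = ["sh", "iy", "hh", "ae", "d"]
hypothesis = ["sh", "iy", "ae", "t"]
per = phoneme_error_rate([reference], [hypothesis])  # 2 errors / 5 reference phonemes
```

Because insertions count as errors while the denominator is the reference length, a PER can exceed 100% when the recognizer outputs many spurious phonemes.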