Evaluating the performance of TDNN-BLSTM on Mandarin read and spontaneous speech

Bachelor Thesis (2021)

Author(s)

M. Chiroşca (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Siyuan Feng – Mentor (TU Delft - Multimedia Computing)

Odette Scharenborg – Mentor (TU Delft - Multimedia Computing)

CM Jonker – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Speech recognition, Neural Networks, Phoneme Recognition

To reference this document use:

https://resolver.tudelft.nl/uuid:dd65b686-0acc-46de-a28f-456ac9aecf32

Publication Year

2021

Language

English

Graduation Date

01-07-2021

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A limitation of current automatic speech recognition (ASR) systems is their handling of so-called out-of-vocabulary (OOV) words, i.e. words that are absent from the recognizer's vocabulary. One way to overcome this limitation is to use automatic phoneme recognition (APR) systems. Previous research on Dutch APR systems identified the Time Delay Neural Network with Bidirectional Long Short-Term Memory (TDNN-BLSTM) as one of the best-performing state-of-the-art neural network (NN) architectures for phoneme recognition (PR). The goal of this research is to evaluate the performance of the TDNN-BLSTM architecture for phoneme recognition on Mandarin read and spontaneous speech, to analyze the differences in performance between the two speech styles, and to compare the results with previous research on Dutch PR.

To achieve this goal, four different NN models of the TDNN-BLSTM architecture were built and trained on Mandarin read and spontaneous speech. The test results of the NN models were used to calculate the phoneme error rate (PER), the decomposed PER, and the contribution of individual phonemes to the overall PER. Based on these findings, conclusions are formulated regarding the impact of different languages, speech styles, and architectural changes on the performance of the TDNN-BLSTM architecture.
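For readers unfamiliar with the metrics named above: PER is conventionally defined as PER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit-distance alignment of the recognized phoneme sequence against the reference, and N is the number of reference phonemes. The decomposed PER reports S/N, D/N and I/N separately. The sketch below is illustrative only and is not necessarily the scoring procedure used in the thesis. The function names `align_counts` and `corpus_per` are hypothetical, and the rule for attributing errors to individual phonemes is an assumption: substitutions and deletions are charged to the reference phoneme, and insertions are charged to the inserted phoneme.

```python
from collections import Counter


def align_counts(ref, hyp):
    """Levenshtein-align two phoneme sequences and count error types.

    Returns (subs, dels, ins, per_phone). per_phone maps a phoneme to the
    number of errors attributed to it (assumed rule: substitutions and
    deletions go to the reference phoneme, insertions to the inserted one).
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edit cost to turn ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion

    subs = dels = ins = 0
    per_phone = Counter()
    i, j = n, m
    # Backtrace one optimal alignment path and classify each step.
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                subs += 1
                per_phone[ref[i - 1]] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            per_phone[ref[i - 1]] += 1
            i -= 1
        else:
            ins += 1
            per_phone[hyp[j - 1]] += 1
            j -= 1
    return subs, dels, ins, per_phone


def corpus_per(pairs):
    """Compute PER, decomposed PER and per-phoneme contributions.

    pairs is an iterable of (reference, hypothesis) phoneme sequences.
    """
    totals = Counter()
    per_phone = Counter()
    n_ref = 0
    for ref, hyp in pairs:
        s, d, i, phones = align_counts(ref, hyp)
        totals.update({"sub": s, "del": d, "ins": i})
        per_phone.update(phones)
        n_ref += len(ref)
    if n_ref == 0:
        raise ValueError("PER is undefined without reference phonemes")
    per = (totals["sub"] + totals["del"] + totals["ins"]) / n_ref
    decomposed = {k: v / n_ref for k, v in totals.items()}
    # Each phoneme's errors divided by N; these shares sum to the overall PER.
    contrib = {p: c / n_ref for p, c in per_phone.items()}
    return per, decomposed, contrib
```

With the illustrative (hypothetical) phone labels `corpus_per([(["n", "i", "h", "ao"], ["n", "i", "x", "ao", "a"])])`, the alignment finds one substitution (h recognized as x) and one insertion (a) against four reference phonemes, giving a PER of 0.5, decomposed into 0.25 substitutions, 0.0 deletions and 0.25 insertions. Because every error is charged to exactly one phoneme, the per-phoneme contributions always add up to the overall PER.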

Files

Evaluating_the_performance_of_... (pdf | 0.337 MB)

License info not available