Evaluating the performance of TDNN-BLSTM on Mandarin read and spontaneous speech
M. Chiroşca (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Siyuan Feng – Mentor (TU Delft - Multimedia Computing)
Odette Scharenborg – Mentor (TU Delft - Multimedia Computing)
CM Jonker – Graduation committee member (TU Delft - Interactive Intelligence)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
A limitation of current automatic speech recognition (ASR) systems is their handling of so-called out-of-vocabulary words. A solution to overcome this limitation is to use automatic phoneme recognition (APR) systems. Previous research on Dutch APR systems identified the Time Delayed Bidirectional Long Short-Term Memory Neural Network (TDNN-BLSTM) as one of the best-performing state-of-the-art neural network (NN) architectures for phoneme recognition (PR). The goal of this research is to evaluate the performance of the TDNN-BLSTM architecture for PR on Mandarin read and spontaneous speech, to analyze the differences in performance between the two speech styles, and to compare the results with previous research on Dutch PR.
To achieve this goal, four NN models of the TDNN-BLSTM architecture were built and trained on Mandarin read and spontaneous speech. The test results of these models were used to calculate the phoneme error rate (PER), the decomposed PER, and the contribution of individual phonemes to the overall PER. Based on these findings, conclusions are formulated regarding the impact of different languages, speech styles, and architectural changes on the performance of the TDNN-BLSTM architecture.
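The PER and its decomposition can be illustrated with a minimal sketch. It assumes the common definition of PER as substitutions plus deletions plus insertions, divided by the number of reference phonemes, with the counts taken from a minimum-edit-distance alignment. The abstract does not state the exact scoring procedure used in the thesis, so the function names `align_counts` and `phoneme_error_rate` and the toy phoneme sequences are hypothetical.

```python
from typing import Dict, List, Sequence


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Dict[str, int]:
    """Count substitutions, deletions and insertions in one minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] is the edit distance between ref[:i] and hyp[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover the error types.
    counts = {"sub": 0, "del": 0, "ins": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] != hyp[j - 1]:
                counts["sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["del"] += 1  # reference phoneme missing from the hypothesis
            i -= 1
        else:
            counts["ins"] += 1  # extra phoneme in the hypothesis
            j -= 1
    return counts


def phoneme_error_rate(refs: List[Sequence[str]],
                       hyps: List[Sequence[str]]) -> Dict[str, float]:
    """Return the overall PER and its decomposed parts, all in percent."""
    totals = {"sub": 0, "del": 0, "ins": 0}
    n_ref = 0
    for ref, hyp in zip(refs, hyps):
        for kind, value in align_counts(ref, hyp).items():
            totals[kind] += value
        n_ref += len(ref)
    if n_ref == 0:
        raise ValueError("reference transcripts contain no phonemes")
    result = {kind: 100.0 * value / n_ref for kind, value in totals.items()}
    result["per"] = result["sub"] + result["del"] + result["ins"]
    return result


if __name__ == "__main__":
    # Toy example with made-up phoneme labels, not data from the thesis.
    reference = [["n", "i", "h", "a", "o"]]
    hypothesis = [["n", "i", "a", "o", "u"]]
    print(phoneme_error_rate(reference, hypothesis))
```

In the toy example, the hypothesis drops one reference phoneme and adds one extra, so the sketch reports one deletion and one insertion over five reference phonemes, which gives a PER of 40%. When several alignments have the same cost, the traceback prefers matches and substitutions over deletions and insertions; other scoring tools may break such ties differently, which changes the decomposition but not the overall PER.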