Training and testing the TDNN-OPGRU acoustic model on English read and spontaneous speech

Bachelor Thesis (2021)

Author(s)

G.D. Genkov (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Siyuan Feng – Mentor (TU Delft - Multimedia Computing)

O.E. Scharenborg – Graduation committee member (TU Delft - Multimedia Computing)

C.M. Jonker – Coach (TU Delft - Interactive Intelligence)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Phoneme Recognition, Spontaneous speech, Phoneme Error Rate, Acoustic Model, TDNN-OPGRU, English, Prepared speech

To reference this document use:

https://resolver.tudelft.nl/uuid:350beee7-6bca-41c8-823c-dffd584736eb

Publication Year

2021

Language

English

Graduation Date

01-07-2021

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic phoneme recognition (APR) is the process of recognizing phonemes (the basic units of speech sound) in a speech recording. It can be used in any application that requires fast and accurate transcription, e.g. in a courthouse. This research builds an APR model using the TDNN-OPGRU architecture and trains it on two datasets of recorded English speech: "TIMIT", in which prewritten sentences are read aloud (prepared/read speech), and "Buckeye", which consists of recorded interviews (spontaneous speech). The model's results are analyzed and compared to those of similar research. The main conclusion is that the results obtained do not exceed those of previous research and in some cases are considerably worse. The reasons for this are also discussed.
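Phoneme error rate (PER), listed among the keywords, is the usual way to score APR output. As a minimal sketch, assuming the conventional definition (the Levenshtein distance between the reference and recognized phoneme sequences, i.e. substitutions + deletions + insertions, divided by the number of reference phonemes), PER could be computed as below. The function name `phoneme_error_rate` and the example phoneme labels are illustrative assumptions, not taken from the thesis.

```python
from typing import Sequence


def phoneme_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Hypothetical helper: PER = (S + D + I) / N via Levenshtein distance.

    Assumes the standard edit-distance definition of PER; N is the
    number of phonemes in the reference transcription.
    """
    if not reference:
        raise ValueError("reference must contain at least one phoneme")
    n, m = len(reference), len(hypothesis)
    # prev[j] holds the minimum edits turning reference[:i-1] into hypothesis[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of an extra phoneme
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[m] / n


# Illustrative TIMIT-style labels for "she had": one deletion (hh) and one insertion (ax)
ref = "sh iy hh ae d".split()
hyp = "sh iy ae d ax".split()
print(phoneme_error_rate(ref, hyp))  # 2 edits / 5 reference phonemes
```

Because insertions are counted, PER is not bounded by 1: a recognizer that outputs many spurious phonemes can score above 100%.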

Files

RP_1_.pdf

(pdf | 0.332 Mb)

License info not available