Learning the Model Structure of Dynamic Bayesian Networks for Automated Speech Recognition

Master Thesis (2010)

Author(s)

G. Harahap

Contributor(s)

C.M. Jonker – Mentor

P. Wiggers – Mentor

H.G. Gross – Mentor

Keywords

Artificial Intelligence, Bayesian Network, Data mining, Model learning, Automated speech recognition, Language model, Search algorithm

To reference this document use:

https://resolver.tudelft.nl/uuid:7129f512-d59e-4644-b7cd-a6d1552a9106

Publication Year

2010

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Improving the performance of an Automated Speech Recognition (ASR) system requires incorporating more knowledge into its model. Information such as the context of the conversation and the characteristics of the speaker can make speech recognition more accurate. The challenge is how to incorporate this knowledge into the ASR model easily. This thesis addresses that challenge by using a Dynamic Bayesian Network (DBN) as the ASR model. A DBN makes the ASR model easier to extend, because new knowledge is represented as one or more new variables in the model. However, even with these variables available, designing an optimal model is not an easy task, especially when the number of variables is large. In this thesis, a mechanism is developed to learn the DBN model of an ASR system automatically. In essence, this mechanism consists of two components: a metric and a search algorithm. The metric is a quantitative measure of how good a model is, while the search algorithm defines the process of finding the best model. The thesis focuses on the part of the ASR model that governs the choice of words in a sentence and puts less emphasis on the acoustic part. For this purpose, a list of candidate metrics and search algorithms is presented, together with the implementation details of each. Each metric and search algorithm is tested on an artificial language and on real conversational language, and the results are used to discuss which metric and which search algorithm are suitable for learning the ASR model.
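The split into a metric and a search algorithm can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not name the metrics or search algorithms it evaluates, so the BIC score and the greedy forward search below are stand-ins chosen for brevity. All names (`bic_score`, `greedy_parent_search`, the variables `w`, `prev`, and `noise`) and the toy artificial language are hypothetical.

```python
import math
from collections import Counter


def bic_score(rows, target, parents):
    """Metric (assumed here: BIC) for P(target | parents), fitted by maximum likelihood."""
    n = len(rows)
    joint = Counter((tuple(r[p] for p in parents), r[target]) for r in rows)
    context = Counter(tuple(r[p] for p in parents) for r in rows)
    log_lik = sum(c * math.log(c / context[ctx]) for (ctx, _), c in joint.items())
    # Free parameters: (target values - 1) per combination of parent values.
    target_card = len({r[target] for r in rows})
    parent_configs = math.prod(len({r[p] for r in rows}) for p in parents)
    n_params = (target_card - 1) * parent_configs
    return log_lik - 0.5 * n_params * math.log(n)


def greedy_parent_search(rows, target, candidates, score=bic_score):
    """Search (assumed here: greedy forward selection) over parent sets of `target`."""
    parents = []
    best = score(rows, target, parents)
    improved = True
    while improved:
        improved = False
        best_cand = None
        for cand in candidates:
            if cand in parents:
                continue
            s = score(rows, target, parents + [cand])
            if s > best:
                best, best_cand, improved = s, cand, True
        if improved:
            parents.append(best_cand)
    return parents, best


# Toy artificial language (hypothetical): the word cycles a -> b -> c -> a,
# so the previous word fully determines the current one. `noise` is an
# unrelated variable that the search should leave out.
rows = [
    {"w": "abc"[i % 3], "prev": "abc"[(i - 1) % 3], "noise": i % 2}
    for i in range(1, 61)
]
parents, best = greedy_parent_search(rows, "w", ["noise", "prev"])
print(parents, round(best, 2))
```

On this toy data the search keeps `prev` as the only parent of `w`: adding `noise` raises the parameter penalty without improving the likelihood. Swapping in a different metric only requires passing another function as `score`, which mirrors the separation of metric and search algorithm described in the abstract.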

Files

Gherry_Harahap_-_Learning_the_... (pdf)

(pdf | 2.62 Mb)

License info not available