Learning the Model Structure of Dynamic Bayesian Networks for Automated Speech Recognition
G. Harahap
C.M. Jonker – Mentor
P. Wiggers – Mentor
H.G. Gross – Mentor
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Improving the performance of an Automated Speech Recognition (ASR) system requires incorporating more knowledge into its model. Information such as the context of the conversation and the characteristics of the speaker can make speech recognition more accurate. The challenge is how to incorporate this knowledge into the ASR model easily. The proposed answer is to use a Dynamic Bayesian Network (DBN) as the model of the ASR system. A DBN makes it easier to extend the ASR model with new knowledge, because that knowledge can be represented as one or more new variables in the model. However, even with these variables available, designing the optimal model is still not easy, especially when the number of variables is large. In this thesis, a mechanism is developed to learn the DBN model of an ASR system automatically. In essence, this mechanism can be decomposed into two components: a metric and a search algorithm. The metric is a quantitative measure of how good a model is, while the search algorithm defines the process of finding the model that scores best under that metric. The thesis focuses on the part of the ASR model that concerns the choice of words in a sentence and places less emphasis on the acoustic part of the model. For this purpose, a list of candidate metrics and search algorithms is presented, together with the implementation details of each. By testing each metric and search algorithm on an artificial language and on real conversational language, the thesis discusses which metric and which search algorithm are suitable for learning the model of an ASR system.
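To make the metric/search decomposition concrete, here is a minimal sketch of score-based structure learning. It assumes fully observed discrete data, uses the BIC score as the metric and greedy hill climbing (toggling one edge at a time) as the search algorithm. These are common textbook choices, not necessarily the ones the thesis selects. The two-slice DBN is encoded by adding previous-slice copies of variables (such as a hypothetical `prev` column) that may act as parents but never receive parents. All function and column names (`family_bic`, `is_acyclic`, `hill_climb`, `prev`, `word`, `noise`) are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product


def family_bic(data, child, parents):
    """BIC contribution of one node given its parent set (discrete, fully observed data)."""
    n = len(data)
    # Count (parent configuration, child value) pairs and parent configurations.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximum-likelihood log-likelihood of the child's conditional distribution.
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (number of parent configurations) * (child cardinality - 1).
    child_card = len({row[child] for row in data})
    parent_card = 1
    for p in parents:
        parent_card *= len({row[p] for row in data})
    return loglik - 0.5 * math.log(n) * parent_card * (child_card - 1)


def is_acyclic(nodes, parents):
    """Kahn's algorithm: True if the parent map describes a DAG."""
    indeg = {v: len(parents[v]) for v in nodes}
    children = {v: [] for v in nodes}
    for v in nodes:
        for p in parents[v]:
            children[p].append(v)
    stack = [v for v in nodes if indeg[v] == 0]
    visited = 0
    while stack:
        v = stack.pop()
        visited += 1
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                stack.append(c)
    return visited == len(nodes)


def hill_climb(data, nodes, targets, max_iter=100):
    """Greedy search: repeatedly apply the single edge toggle with the largest BIC gain.

    Only nodes in `targets` (the current time slice) may receive parents, so
    previous-slice columns stay roots, as in a two-slice DBN transition model.
    """
    parents = {v: set() for v in nodes}
    score = {v: family_bic(data, v, sorted(parents[v])) for v in nodes}
    for _ in range(max_iter):
        best_gain, best_move = 0.0, None
        for u, v in product(nodes, nodes):
            if u == v or v not in targets:
                continue
            candidate = parents[v] ^ {u}  # add u -> v if absent, remove it if present
            trial = dict(parents)
            trial[v] = candidate
            if not is_acyclic(nodes, trial):
                continue
            gain = family_bic(data, v, sorted(candidate)) - score[v]
            if gain > best_gain + 1e-12:
                best_gain, best_move = gain, (v, candidate)
        if best_move is None:  # local optimum: no toggle improves the score
            break
        v, candidate = best_move
        parents[v] = candidate
        score[v] = family_bic(data, v, sorted(candidate))
    return parents, sum(score.values())


# Toy "language" a b c a b c ...: the current word is determined by the previous one,
# while `noise` is independent of both.
rows = [{"prev": "abc"[i % 3], "word": "abc"[(i + 1) % 3], "noise": i % 2} for i in range(30)]
structure, total = hill_climb(rows, ["prev", "noise", "word"], targets={"word", "noise"})
print(structure)
```

Because BIC decomposes into a sum of per-node terms, each candidate move only requires rescoring the one family whose parent set changes. This decomposability is what makes local search practical as the number of variables grows.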