Testing The Performance Of Minimal Models Trained On Sparse Data
M.J. Pieters (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S.E. Verwer – Mentor (TU Delft - Algorithmics)
Simon Dieck – Mentor (TU Delft - Algorithmics)
Abstract
Deterministic Finite Automaton (DFA) learning is the problem of reconstructing a DFA from traces of its behaviour. When developing methods for this problem, randomly sampled data is often used to train and test the performance of models. The choice of sampling technique can produce data sets with unforeseen properties. The technique used in the STAMINA competition is such that the number of final states and the size of the alphabet were thought to potentially affect the test performance of the resulting models. This was tested experimentally by comparing the test performance of minimal models identified on traces sampled from differently constructed DFAs. It was found that, although a larger alphabet results in overall longer traces that vary more in length, test performance still suffered: a DFA with a larger alphabet will need more traces than an equivalent DFA with a smaller alphabet. Additionally, the number of final states had a significant effect on both the resulting test performance and the length of the sampled traces. An increased state count did not affect sampled word length, but did result in worse test performance.
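The experimental setup described above relies on sampling traces from randomly constructed DFAs. The following sketch illustrates one such construction in Python; it is a hypothetical, simplified sampler (uniform random transitions, a geometric stopping rule for walk length), not the exact procedure used in the STAMINA competition or in this thesis.

```python
import random

def random_dfa(n_states, alphabet, n_final, seed=0):
    # Hypothetical construction: each (state, symbol) pair transitions
    # to a uniformly chosen state; n_final states are marked accepting.
    rng = random.Random(seed)
    delta = {(q, a): rng.randrange(n_states)
             for q in range(n_states) for a in alphabet}
    finals = set(rng.sample(range(n_states), n_final))
    return delta, finals

def sample_trace(delta, finals, alphabet, stop_prob=0.2, seed=None):
    # Random-walk sampling: starting from state 0, emit symbols until
    # a geometric stopping rule fires; the trace is labelled positive
    # if the walk ends in a final state.
    rng = random.Random(seed)
    state, trace = 0, []
    while rng.random() >= stop_prob:
        a = rng.choice(alphabet)
        trace.append(a)
        state = delta[(state, a)]
    return trace, state in finals

alphabet = ["a", "b"]
delta, finals = random_dfa(n_states=5, alphabet=alphabet, n_final=2)
traces = [sample_trace(delta, finals, alphabet, seed=i) for i in range(100)]
```

Under a sampler of this shape, enlarging the alphabet or changing the number of final states directly alters the length and label distribution of the sampled traces, which is the mechanism the thesis investigates.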