Deterministic Finite Automata (DFA) learning is the problem of reconstructing a DFA from its traces. For the development of methods for this problem, randomly sampled data is often used to train and test the performance of models. The choice of sampling technique can result in da
...
Deterministic Finite Automata (DFA) learning is the problem of reconstructing a DFA from its traces. For the development of methods for this problem, randomly sampled data is often used to train and test the performance of models. The choice of sampling technique can result in data sets with unforseen properties. The technique used in the STAMINA competition is such that that the number of final states and the size of alphabet were thought to potentially effect the test performance of resultant models. This was tested experimentally, by comparing the test performances of minimal models identified on traces from differently constructed DFAs. It was found that, although an increase in alphabet size results in overall longer traces that vary more with length, test performance still struggled. This shows that a DFA with a larger alphabet will need more traces than an equivalent smaller DFA. Additionally, it was found that the number of final states had a significant effect in the resulting test performance, and had a significant effect on the length of sampled traces. It was found that increased node count did not have an effect on sampled word length, and resulted in worse test performance.