Ensemble Techniques for DFA Learning

DFA Ensembles without Suitability Metrics

Bachelor Thesis (2025)
Author(s)

G.T. Kontos (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.E. Verwer – Mentor (TU Delft - Algorithmics)

S. Dieck – Graduation committee member (TU Delft - Algorithmics)

N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2025
Language
English
Graduation Date
26-06-2025
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deterministic Finite Automata (DFAs) are interpretable classification models, typically learned through merging states of a large tree-like automaton, an Augmented Prefix Tree Acceptor (APTA), according to heuristic suitability metrics. This paper introduces an ensembling approach for DFAs that does not depend on such heuristics. Starting from the APTA, we construct diverse automata by applying randomized sequences of state merges, while avoiding repetition of merges whenever possible. We also propose a novel graph-connectivity-based metric for inter-model variety. Experimental results on the STAMINA competition datasets yield improved predictive performance compared to models learned using state-of-the-art heuristics on sparse datasets, as well as a tight connection between inter-model variety and performance.

Files

License info not available