Ensemble Techniques for DFA Learning
DFA Ensembles without Suitability Metrics
G.T. Kontos (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S.E. Verwer – Mentor (TU Delft - Algorithmics)
S. Dieck – Graduation committee member (TU Delft - Algorithmics)
N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Deterministic Finite Automata (DFAs) are interpretable classification models, typically learned through merging states of a large tree-like automaton, an Augmented Prefix Tree Acceptor (APTA), according to heuristic suitability metrics. This paper introduces an ensembling approach for DFAs that does not depend on such heuristics. Starting from the APTA, we construct diverse automata by applying randomized sequences of state merges, while avoiding repetition of merges whenever possible. We also propose a novel graph-connectivity-based metric for inter-model variety. Experimental results on the STAMINA competition datasets yield improved predictive performance compared to models learned using state-of-the-art heuristics on sparse datasets, as well as a tight connection between inter-model variety and performance.