Ensemble techniques for (P)DFA learning

Effect of changing the sequence orders on DFA ensembles learned via EDSM

Bachelor Thesis (2025)
Author(s)

W.M. Cupiał (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.E. Verwer – Mentor (TU Delft - Algorithmics)

Simon Dieck – Mentor (TU Delft - Algorithmics)

N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2025
Language
English
Graduation Date
26-06-2025
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Learning a Deterministic Finite Automaton (DFA) from a language sample is an essential problem in grammatical inference, with applications in various fields, such as modeling and analyzing software systems. In this work, we propose approaches to create an ensemble of DFAs learned with the Evidence Driven State Merging algorithm. To produce varying models from the given data, we introduce two algorithms for manipulating the sequence orders. Additionally, we propose a similarity metric that allows for reducing the ensemble size by discarding similar models. The proposed approaches were analyzed and empirically evaluated using the dataset used during the StaMinA competition. Experimental results demonstrate that the methods for obtaining ensembles of DFAs presented in this work provide a number of advantages over the single DFA learned from the classical prefix tree acceptor using EDSM.

Files

Research_paper.pdf
(pdf | 0.729 Mb)
License info not available