Adapting the EDSM Algorithm for Ensemble Learning

A Machine Learning Approach to DFA Inference

Bachelor Thesis (2025)
Author(s)

R. Dumitru (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.E. Verwer – Mentor (TU Delft - Algorithmics)

Simon Dieck – Mentor (TU Delft - Algorithmics)

N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2025
Language
English
Graduation Date
26-06-2025
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Learning Deterministic Finite Automata (DFA) from given input data has been a central task in the field of Grammatical Inference, and progress in this area is of great interest from both theoretical and practical points of view. To address this challenge, several algorithms have been proposed and evaluated using established benchmarks. One such competition-winning algorithm, Evidence Driven State Merging (EDSM), uses a heuristic to learn a DFA from given data. However, improvements leveraging ensemble techniques from machine learning have yet to be explored. In this paper, we investigate ways to adapt the EDSM algorithm to fit into the ensemble learning framework and analyze the performance of such obtained models when applied to unseen data. To this end, we compare the performance of the ensembles to that of a standard EDSM-learned model, evaluating both their output quality and the diversity within each ensemble. The results indicate significant improvements in scenarios where the data is sparse.

Files

RP_paper_final.pdf
(pdf | 2.61 Mb)
License info not available