Adapting the EDSM Algorithm for Ensemble Learning

None, None

Adapting the EDSM Algorithm for Ensemble Learning

A Machine Learning Approach to DFA Inference

Bachelor Thesis (2025)

Author(s)

R. Dumitru (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.E. Verwer – Mentor (TU Delft - Algorithmics)

Simon Dieck – Mentor (TU Delft - Algorithmics)

N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Machine Learning Heuristic Language Identification Finite State Automata

To reference this document use:

https://resolver.tudelft.nl/uuid:31c0142c-330d-4617-b59a-abd4bd0c3d6c

More Info

expand_more

Publication Year

2025

Language

English

Graduation Date

26-06-2025

Awarding Institution

Delft University of Technology

Project

['CSE3000 Research Project']

Programme

['Computer Science and Engineering']

Faculty

Electrical Engineering, Mathematics and Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Learning Deterministic Finite Automata (DFA) from given input data has been a central task in the field of Grammatical Inference, and progress in this area is of great interest from both theoretical and practical points of view. To address this challenge, several algorithms have been proposed and evaluated using established benchmarks. One such competition-winning algorithm, Evidence Driven State Merging (EDSM), uses a heuristic to learn a DFA from given data. However, improvements leveraging ensemble techniques from machine learning have yet to be explored. In this paper, we investigate ways to adapt the EDSM algorithm to fit into the ensemble learning framework and analyze the performance of such obtained models when applied to unseen data. To this end, we compare the performance of the ensembles to that of a standard EDSM-learned model, evaluating both their output quality and the diversity within each ensemble. The results indicate significant improvements in scenarios where the data is sparse.

Files

RP_paper_final.pdf

(pdf | 2.61 Mb)

License info not available