A theoretical analysis of optimal and heuristic methods for DFA learning
Bachelor’s Degree Thesis
H. Radu (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Sicco Verwer – Mentor (TU Delft - Algorithmics)
Simon Dieck – Mentor (TU Delft - Algorithmics)
Soham Chakraborty – Graduation committee member (TU Delft - Programming Languages)
Abstract
Deterministic finite automata (DFAs) are interpretable models used for classification and prediction tasks on sequence data. They often act as surrogate models for software systems. Many methods exist for DFA learning. Examples include optimal algorithms, such as SAT-based encodings, and heuristic methods, such as the EDSM algorithm within the BlueFringe framework. By definition, optimal algorithms guarantee a minimal DFA consistent with the training data, but this does not exclude the possibility of heuristics also finding the optimal solution. However, it is generally believed that optimal methods always require strictly less data than heuristics to learn such a minimal model. In our research, we provide mathematical proofs and counter-examples showing this belief to be false. We further demonstrate that, unless the setting is formally defined, there exist numerous languages and settings where heuristics outperform optimal methods on data-efficiency benchmarks. Finally, we prove that optimal methods and the BlueFringe framework are equal in terms of optimistic learning efficiency.
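As a minimal illustration of the classification task the abstract describes (a sketch, not code from the thesis), the following Python snippet runs a hypothetical DFA over input strings and labels each as accepted or rejected. The state names and the example language (binary strings with an even number of 1s) are chosen for illustration only:

```python
# Illustrative sketch of DFA-based sequence classification.
# The automaton below is hypothetical and not taken from the thesis.

def dfa_accepts(transitions, start, accepting, word):
    """Run a DFA over `word`; reject if a transition is missing."""
    state = start
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting

# Example DFA: binary strings containing an even number of 1s.
transitions = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

print(dfa_accepts(transitions, "even", {"even"}, "110"))   # two 1s -> True
print(dfa_accepts(transitions, "even", {"even"}, "1011"))  # three 1s -> False
```

A DFA learner's task is to recover such a transition table from a finite sample of labeled strings, ideally producing the smallest automaton consistent with the sample.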