Ensemble Techniques for PDFA Learning

Diversity-Driven Ensemble Learning with the Alergia Algorithm

Bachelor Thesis (2025)
Author(s)

B. Łytkowski (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.E. Verwer – Mentor (TU Delft - Algorithmics)

S. Dieck – Graduation committee member (TU Delft - Algorithmics)

N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
26-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Probabilistic Deterministic Finite Automata (PDFA) learning is a machine learning method used for tasks that require human-understandable models and more formal validation. In recent years, ensemble techniques have been applied widely to other machine learning models, such as decision trees. Following the success of these approaches, this paper integrates ensemble methods into Alergia, a well-known PDFA learning algorithm. We present a randomized variant of the Alergia algorithm and show how to build an ensemble from it. Such an ensemble can noticeably outperform a single Alergia model, as documented by a series of experiments. Next, we present a custom distance metric that measures the dissimilarity between a pair of Alergia models, and show how it can be used to build an Inter-Model Variety score that quantifies the overall diversity of a group of models. Lastly, we analyze several methods for selecting a well-performing, diverse ensemble from a large population of generated models.
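
The abstract does not spell out how ensemble predictions are combined or how the Inter-Model Variety score is computed, so the following is only a minimal illustrative sketch under stated assumptions, not the thesis's method. It assumes that each model can be treated as a function returning the probability of a symbol sequence, that the ensemble averages those probabilities (a common combination rule), and that group diversity is summarized as the mean of a user-supplied pairwise distance. The names `ensemble_probability` and `mean_pairwise_distance` are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Sequence

# Hypothetical model interface: maps a symbol sequence to its probability.
Model = Callable[[Sequence[str]], float]


def ensemble_probability(models: Sequence[Model], sequence: Sequence[str]) -> float:
    """Average the probabilities the member models assign to one sequence.

    Assumption: simple averaging; the thesis may combine models differently.
    Raises statistics.StatisticsError if `models` is empty.
    """
    return mean(m(sequence) for m in models)


def mean_pairwise_distance(
    models: Sequence[Model],
    distance: Callable[[Model, Model], float],
) -> float:
    """Average a pairwise dissimilarity over all unordered pairs of models.

    Assumption: a stand-in for a group diversity score; it is not the
    Inter-Model Variety definition from the thesis.
    """
    pairs = list(combinations(models, 2))
    if not pairs:
        return 0.0  # a group of fewer than two models has no pairwise diversity
    return mean(distance(a, b) for a, b in pairs)
```

With an actual PDFA library, the model callables would wrap learned automata, and `distance` would be replaced by whatever dissimilarity measure is being studied.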

Files

Research_paper_final.pdf
(pdf | 0.642 MB)
License info not available