Ensemble Techniques for PDFA Learning
Diversity-Driven Ensemble Learning with the Alergia Algorithm
B. Łytkowski (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S.E. Verwer – Mentor (TU Delft - Algorithmics)
S. Dieck – Graduation committee member (TU Delft - Algorithmics)
N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Probabilistic Deterministic Finite Automata (PDFA) learning is a machine learning method used for tasks that require human understandability and more formal validation. In recent years, there have been numerous applications of ensemble techniques to other machine learning models, such as decision trees. Following the success of these attempts, in this paper we aim to integrate ensemble methods into Alergia, a well-known algorithm for PDFA learning. We present a randomized variation of the Alergia algorithm and show how to build an ensemble out of it. A series of experiments shows that such an ensemble can noticeably outperform a single Alergia model. Next, we present a custom distance metric that measures the dissimilarity between a pair of Alergia models, and we show how it can be used to build an Inter-Model Variety score quantifying the overall diversity of a group of models. Lastly, we analyze several methods that aim to select a well-performing, diverse ensemble from a large population of generated models.