Probabilistic Deterministic Finite Automata (PDFA) learning is a machine learning method used for tasks requiring human understandability and more formal validation. In recent years, ensemble techniques have been applied widely to other machine learning models, such as decision trees. Following the success of these efforts, this paper integrates ensemble methods into Alergia, a well-known PDFA learning algorithm. We present a randomized variation of the Alergia algorithm and show how to build an ensemble from it. A series of experiments shows that such an ensemble can clearly outperform a single Alergia model. Next, we present a custom distance metric that measures the dissimilarity between a pair of Alergia models, and we show how it can be used to build an Inter-Model Variety score that quantifies the overall diversity of a group of models. Lastly, we analyze several methods for selecting a well-performing, diverse ensemble from a large population of generated models.