MIHEA in Action: A Reproduction Study and Evaluation on Learning Bayesian Networks from Data


Abstract

Mixed-integer optimization problems, which involve both discrete and continuous variables, present unique challenges in domains such as computer science, finance, logistics, and healthcare. Evolutionary Algorithms (EAs) have emerged as powerful optimization techniques for tackling such complex problems in either the discrete or the continuous domain. Model-based EAs, which integrate machine learning techniques, have further improved the efficiency and scalability of these algorithms. The first algorithm to combine discrete and continuous model-based EAs is the Mixed-Integer Hybrid Evolutionary Algorithm (MIHEA). Despite its potential, MIHEA remains relatively underused in research. This thesis addresses this gap by applying the algorithm to a novel context: structure learning of Bayesian Networks (BNs).

BNs offer a transparent framework for probabilistic reasoning, which makes them well suited to many applications. However, learning the structure of a BN from data is a challenging task, and datasets often contain continuous variables as well. Without an assumption of normality, these continuous nodes must be discretized, since BNs are designed for discrete data. The optimal discretization of each node depends on the structure of the BN, so discretization should be optimized simultaneously with structure learning. MIHEA holds promise for this challenge because it can optimize discrete and continuous variables together.

The investigation begins with a reproduction study. The published description of MIHEA's code is shown to be inconsistent with the reported experimental results, and the reproduction study yields a version of the code that fits those results more accurately.

Subsequently, MIHEA is applied to structure learning of BNs, where discrete variables represent the network structure and continuous variables encode the discretizations of continuous nodes. Several solution representations are explored. The experiments show that MIHEA performs similarly to or better than the state-of-the-art DBN-GOMEA approach on the task of recovering randomly generated BNs from data, at the cost of increased execution time.
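To make the idea of a mixed-integer solution concrete, the following is a minimal Python sketch of one hypothetical way to decode such a solution into a BN structure and a set of discretizations. It is not necessarily one of the representations studied in the thesis. The fixed node ordering (which keeps every decoded graph acyclic), the scaling of continuous genes in [0, 1] to each node's observed value range, and all function names (`decode_structure`, `decode_cut_points`, `discretize`, `decode_genome`) are assumptions made for illustration.

```python
import bisect
from typing import Dict, List, Sequence, Tuple


def decode_structure(bits: Sequence[int], n_nodes: int) -> Dict[int, List[int]]:
    """Discrete part: one bit per node pair (i, j) with i < j; bit = 1 means edge i -> j.

    Hypothetical simplification: a fixed ordering 0..n_nodes-1 is assumed,
    so every decoded graph is acyclic by construction.
    """
    expected = n_nodes * (n_nodes - 1) // 2
    if len(bits) != expected:
        raise ValueError(f"expected {expected} bits, got {len(bits)}")
    parents: Dict[int, List[int]] = {j: [] for j in range(n_nodes)}
    k = 0
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if bits[k]:
                parents[j].append(i)
            k += 1
    return parents


def decode_cut_points(genes: Sequence[float], lo: float, hi: float) -> List[float]:
    """Continuous part: map genes in [0, 1] to sorted cut points in [lo, hi]."""
    return sorted(lo + g * (hi - lo) for g in genes)


def discretize(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Assign each value the index of its bin, from 0 to len(cuts)."""
    return [bisect.bisect_right(cuts, v) for v in values]


def decode_genome(
    bits: Sequence[int],
    reals: Sequence[float],
    n_nodes: int,
    continuous_ranges: Dict[int, Tuple[float, float]],
    cuts_per_node: int,
) -> Tuple[Dict[int, List[int]], Dict[int, List[float]]]:
    """Decode a full mixed-integer genome into parent sets and per-node cut points."""
    if len(reals) != cuts_per_node * len(continuous_ranges):
        raise ValueError("number of continuous genes does not match the layout")
    parents = decode_structure(bits, n_nodes)
    cuts: Dict[int, List[float]] = {}
    offset = 0
    for node, (lo, hi) in continuous_ranges.items():
        cuts[node] = decode_cut_points(reals[offset:offset + cuts_per_node], lo, hi)
        offset += cuts_per_node
    return parents, cuts


if __name__ == "__main__":
    # 3 nodes; bits cover pairs (0,1), (0,2), (1,2), giving edges 0 -> 1 and 1 -> 2.
    # Node 2 is continuous with observed range [0, 10] and two cut points.
    parents, cuts = decode_genome([1, 0, 1], [0.75, 0.25], 3, {2: (0.0, 10.0)}, 2)
    print(parents)                                # {0: [], 1: [0], 2: [1]}
    print(cuts)                                   # {2: [2.5, 7.5]}
    print(discretize([1.0, 3.0, 9.0], cuts[2]))   # [0, 1, 2]
```

In a setup like this, a mixed-integer EA would vary `bits` and `reals` together, and each decoded candidate would then be scored on the discretized data with some BN scoring function (not shown). The fixed ordering trades coverage of the structure space for guaranteed acyclicity; representations without that restriction would need an explicit cycle check or repair step.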
These results demonstrate the potential of model-based mixed-integer EAs, particularly MIHEA, for BN structure learning from continuous data. The findings encourage further exploration and utilization of mixed-integer EAs in solving (real-world) problems involving BNs and continuous data.