TA

T. Alderliesten

info

Please Note

37 records found

Conference paper (2025) - Joe Harrison, Tanja Alderliesten, Peter A.N. Bosman
In Symbolic Regression (SR), achieving a proper balance between accuracy and interpretability remains a key challenge. The Genetic Programming variant of the Gene-pool Optimal Mixing Evolutionary Algorithm (GP-GOMEA) is of particular interest as it achieves state-of-the-art performance using a template that limits the size of expressions. A recently introduced expansion, modular GP-GOMEA, is capable of decomposing expressions using multiple subexpressions, further increasing chances of interpretability. However, modular GP-GOMEA may create larger expressions, increasing the need to balance size and accuracy. A multi-objective variant of GP-GOMEA exists, which can be used, for instance, to optimize for size and accuracy simultaneously, discovering their trade-off. However, even with enhancements that we propose in this paper to improve the performance of multi-objective modular GP-GOMEA, when optimizing for size and accuracy, the single-objective version in which a multi-objective archive is used only for logging, still consistently finds a better average hypervolume. We consequently analyze when a single-objective approach should be preferred. Additionally, we explore an objective that stimulates re-use in multi-objective modular GP-GOMEA. ...
Conference paper (2025) - Renzo Scholman, T. Alderliesten, Peter Bosman
The Gene-pool Optimal Mixing EA (GOMEA) family of EAs offers a specific means to exploit problem-specific knowledge through linkage learning, i.e., inter-variable dependency detection, expressed using subsets of variables, that should undergo joint variation. Such knowledge can be exploited if faster fitness evaluations are possible when only a few variables are changed in a solution, enabling large speed-ups. The recent-most version of Real-Valued GOMEA (RV-GOMEA) can learn a conditional linkage model during optimization using fitness-based linkage learning, enabling fine-grained dependency exploitation in learning and sampling a Gaussian distribution. However, while the most efficient Gaussian-based EAs, like NES and CMA-ES, employ incremental learning of the Gaussian distribution rather than performing full re-estimation every generation, the recent-most RV-GOMEA version does not employ such incremental learning. In this paper, we therefore study whether incremental distribution estimation can lead to efficiency enhancements of RV-GOMEA. We consider various benchmark problems with varying degrees of overlapping dependencies. We find that, compared to RV-GOMEA and VKD-CMA-ES, the required number of evaluations to reach high-quality solutions can be reduced by a factor of up to 1.5 if population sizes are tuned problem-specifically, while a reduction by a factor of 2–3 can be achieved with generic population-sizing guidelines. ...
Conference paper (2025) - Mafalda Malafaia, Thalea Schlender, Tanja Alderliesten, Peter A.N. Bosman
Real-world problems are often dependent on multiple data modalities, making multimodal fusion essential for leveraging diverse information sources. In high-stakes domains, such as in healthcare, understanding how each modality contributes to the prediction is critical to ensure trustworthy and interpretable AI models. We present MultiFIX, an interpretability-driven multimodal data fusion pipeline that explicitly engineers distinct features from different modalities and combines them to make the final prediction. Initially, only deep learning components are used to train a model from data. The black-box (deep learning) components are subsequently either explained using post-hoc methods such as Grad-CAM for images or fully replaced by interpretable blocks, namely symbolic expressions for tabular data, resulting in an explainable model. We study the use of MultiFIX using several training strategies for feature extraction and predictive modeling. Besides highlighting strengths and weaknesses of MultiFIX, experiments on a variety of synthetic datasets with varying degrees of interaction between modalities demonstrate that MultiFIX can generate multimodal models that can be used to accurately explain both the extracted features and their integration without compromising predictive performance. ...
Conference paper (2025) - Mafalda Malafaia, Thalea Schlender, Peter A.N. Bosman, Tanja Alderliesten
Medical applications often involve several data modalities, particularly medical images and clinical information, which can be combined to enhance the decision-making process by improving accuracy. Multimodal learning approaches can leverage all available data for increased robustness in the resulting models, consequently outperforming unimodal approaches. Furthermore, AI frameworks must be human-verifiable and interpretable to be deployed in real-world situations, considering legal and privacy aspects. Due to the opaque nature of Deep Learning (DL) methods, interpretability is often limited despite their state-of-the-art performance in many tasks. Genetic Programming (GP) can provide compact and interpretable symbolic expressions for tabular data but is less effective for image analysis. We introduce MultiFIX: a new interpretability-focused pipeline for multimodal learning that leverages the strengths of DL and GP to explicitly engineer features from different data types and combine them to make the final prediction. The MultiFIX pipeline comprises two stages: the training stage, where a DL (black-box) model is trained using different training procedures to extract relevant features from each modality; and the inference stage, where the resulting model is transformed to be interpretable. Image features are explained with attention maps by Grad-CAM, and inherently interpretable symbolic expressions evolved with GP fully replace the tabular feature engineering block, and the fusion of the extracted features to predict the target label. To show the application potential of the presented pipeline, we demonstrate MultiFIX with a Melanoma Risk Assessment dataset. Results show that MultiFIX outperforms unimodal models while offering explanations that can be straightforwardly analysed and are consistent with the expectations. ...
Journal article (2025) - Arkadiy Dushatskiy, Peter A.N. Bosman, Karel A. Hinnen, Jan Wiersma, Henrike Westerveld, Bradley R. Pieters, Tanja Alderliesten
Background and Purpose: Recently, we introduced a novel Deep Learning (DL) based (semi-)automatic method for medical image segmentation. Unlike classical DL segmentation methods, it produces multiple segmentation variants (reflecting the variation of manual segmentations) instead of just one. Potentially, with this approach, there is a higher chance that a clinician prefers one of automatically produced segmentation variants. This work focuses on evaluating this method on prostate segmentation in MRI scans used for brachytherapy and investigating its potential clinical usefulness. Materials and Methods: Three experienced radiation oncologists graded (per-slice and per-scan) segmentations produced by our method, reference segmentations (manually created and used for brachytherapy treatment planning) and segmentations produced by a classical DL method. The study was retrospective and the way the segmentation was generated (our method, classical DL method, or manually) was blinded for the clinicians. The grades reflect the amount of manual correction required. Additionally, the clinicians were asked to rank segmentations to evaluate which one is preferred for each scan. The study was performed on 13 prostate cancer patients. Results: Segmentations produced by our method are graded as requiring no manual correction in 292/576 (51 %) slices compared to 240/576 (42 %) slices in the case of the segmentations produced by a classical DL method. Furthermore, in fewer slices, 38 (6.6 %) vs. 48 (8.3 %), segmentations by our method were graded as unacceptable. Conclusion: Our study has demonstrated that deep-learning-based segmentation methods can produce high-quality segmentations. Our method was evaluated better than a classical DL method, indicating the potential for integration into clinical practice. ...
Review (2024) - Leah R.M. Dickhoff, Renzo J. Scholman, Danique L.J. Barten, Ellen M. Kerkhof, Jelmen J. Roorda, Laura A. Velema, Lukas J.A. Stalpers, Bradley R. Pieters, Peter A.N. Bosman, Tanja Alderliesten
PURPOSE: Without a clear definition of an optimal treatment plan, no optimization model can be perfect. Therefore, instead of automatically finding a single “optimal” plan, finding multiple, yet different near-optimal plans, can be an insightful approach to support radiation oncologists in finding the plan they are looking for. METHODS AND MATERIALS: BRIGHT is a flexible AI-based optimization method for brachytherapy treatment planning that has already been shown capable of finding high-quality plans that trade-off target volume coverage and healthy tissue sparing. We leverage the flexibility of BRIGHT to find plans with similar dose-volume criteria, yet different dose distributions. We further describe extensions that facilitate fast plan adaptation should planning aims need to be adjusted, and straightforwardly allow incorporating hospital-specific aims besides standard protocols. RESULTS: Results are obtained for prostate (n = 12) and cervix brachytherapy (n = 36). We demonstrate the possible differences in dose distribution for optimized plans with equal dose-volume criteria. We furthermore demonstrate that adding hospital-specific aims enables adhering to hospital-specific practice while still being able to automatically create cervix plans that more often satisfy the EMBRACE-II protocol than clinical practice. Finally, we illustrate the feasibility of fast plan adaptation. CONCLUSIONS: Methods such as BRIGHT enable new ways to construct high-quality treatment plans for brachytherapy while offering new insights by making explicit the options one has. In particular, it becomes possible to present to radiation oncologists a manageable set of alternative plans that, from an optimization perspective are equally good, yet differ in terms of coverage-sparing trade-offs and shape of the dose distribution. ...

B-Spline-based vs. Mesh-based Multi-Objective Deformable Image Registration

Conference paper (2024) - Georgios Andreadis, Joas I. Mulder, Anton Bouter, Peter A.N. Bosman, Tanja Alderliesten
The transformation model is an essential component of any deformable image registration approach. It provides a representation of physical deformations between images, thereby defining the range and realism of registrations that can be found. Two types of transformation models have emerged as popular choices: B-spline models and mesh models. Although both models have been investigated in detail, a direct comparison has not yet been made, since the models are optimized using very different optimization methods in practice. B-spline models are predominantly optimized using gradient-descent methods, while mesh models are typically optimized using finite-element method solvers or evolutionary algorithms. Multi-objective optimization methods, which aim to find a diverse set of high-quality trade-off registrations, are increasingly acknowledged to be important in deformable image registration. Since these methods search for a diverse set of registrations, they can provide a more complete picture of the capabilities of different transformation models, making them suitable for a comparison of models. In this work, we conduct the first direct comparison between B-spline and mesh transformation models, by optimizing both models with the same state-of-the-art multi-objective optimization method, the Multi-Objective Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (MO-RV-GOMEA). The combination with B-spline transformation models, moreover, is novel. We experimentally compare both models on two different registration problems that are both based on pelvic CT scans of cervical cancer patients, featuring large deformations. Our results, on three cervical cancer patients, indicate that the choice of transformation model can have a profound impact on the diversity and quality of achieved registration outcomes. ...
Conference paper (2023) - Alexander Chebykin, Arkadiy Dushatskiy, Tanja Alderliesten, Peter Bosman
In this work, we show that simultaneously training and mixing neural networks is a promising way to conduct Neural Architecture Search (NAS). For hyperparameter optimization, reusing the partially trained weights allows for efficient search, as was previously demonstrated by the Population Based Training (PBT) algorithm. We propose PBT-NAS, an adaptation of PBT to NAS where architectures are improved during training by replacing poorly-performing networks in a population with the result of mixing well-performing ones and inheriting the weights using the shrink-perturb technique. After PBT-NAS terminates, the created networks can be directly used without retraining. PBT-NAS is highly parallelizable and effective: on challenging tasks (image generation and reinforcement learning) PBT-NAS achieves superior performance compared to baselines (random search and mutation-based PBT). ...

What Works in Gradient-Based Coefficient Optimisation for Symbolic Regression'

Conference paper (2023) - Joe Harrison, Marco Virgolin, Tanja Alderliesten, Peter Bosman
The aim of Symbolic Regression (SR) is to discover interpretable expressions that accurately describe data. The accuracy of an expression depends on both its structure and coefficients. To keep the structure simple enough to be interpretable, effective coefficient optimisation becomes key. Gradient-based optimisation is clearly effective at training neural networks in Deep Learning (DL), which can essentially be viewed as large, over-parameterised expressions: in this paper, we study how gradient-based optimisation techniques as often used in DL transfer to SR. In particular, we first assess what techniques work well across random SR expressions, independent of any specific SR algorithm. We find that mini-batching and gradient-clipping can be helpful (similar to DL), while second-order optimisers outperform first-order ones (different from DL). Next, we consider whether including gradient-based optimisation in Genetic Programming (GP), a classic SR algorithm, is beneficial. On five real-world datasets, in a generation-based comparison, we find that second-order optimisation outperforms coefficient mutation (or no optimisation). However, in time-based comparisons, performance gaps shrink substantially because the computational expensiveness of second-order optimisation causes GP to perform fewer generations. The interplay of computational costs between the optimisation of structure and coefficients is thus a critical aspect to consider. ...
Conference paper (2023) - Arthur Guijt, Dirk Thierens, Tanja Alderliesten, Peter A.N. Bosman
In a parallel EA one can strictly adhere to the generational clock, and wait for all evaluations in a generation to be done. However, this idle time limits the throughput of the algorithm and wastes computational resources. Alternatively, an EA can be made asynchronous parallel. However, EAs using classic recombination and selection operators (GAs) are known to suffer from an evaluation time bias, which also influences the performance of the approach. Model-Based Evolutionary Algorithms (MBEAs) are more scalable than classic GAs by virtue of capturing the structure of a problem in a model. If this model is learned through linkage learning based on the population, the learned model may also capture biases. Thus, if an asynchronous parallel MBEA is also affected by an evaluation time bias, this could result in learned models to be less suited to solving the problem, reducing performance. Therefore, in this work, we study the impact and presence of evaluation time biases on MBEAs in an asynchronous parallelization setting, and compare this to the biases in GAs. We find that a modern MBEA, GOMEA, is unaffected by evaluation time biases, while the more classical MBEA, ECGA, is affected, much like GAs are. ...
Conference paper (2023) - Timo M. Deist, Monika Grewal, Frank J.W.M. Dankers, Tanja Alderliesten, Peter A.N. Bosman
Real-world problems are often multi-objective, with decision-makers unable to specify a priori which trade-off between the conflicting objectives is preferable. Intuitively, building machine learning solutions in such cases would entail providing multiple predictions that span and uniformly cover the Pareto front of all optimal trade-off solutions. We propose a novel approach for multi-objective training of neural networks to approximate the Pareto front during inference. In our approach, we train the neural networks multi-objectively using a dynamic loss function, wherein each network’s losses (corresponding to multiple objectives) are weighted by their hypervolume maximizing gradients. Experiments on different multi-objective problems show that our approach returns well-spread outputs across different trade-offs on the approximated Pareto front without requiring the trade-off vectors to be specified a priori. Further, results of comparisons with the state-of-the-art approaches highlight the added value of our proposed approach, especially in cases where the Pareto front is asymmetric. ...
Journal article (2023) - Monika Grewal, Jan Wiersma, Henrike Westerveld, Peter A.N. Bosman, Tanja Alderliesten
Purpose: Deformable image registration (DIR) can benefit from additional guidance using corresponding landmarks in the images. However, the benefits thereof are largely understudied, especially due to the lack of automatic landmark detection methods for three-dimensional (3D) medical images. Approach: We present a deep convolutional neural network (DCNN), called DCNN-Match, that learns to predict landmark correspondences in 3D images in a self-supervised manner. We trained DCNN-Match on pairs of computed tomography (CT) scans containing simulated deformations. We explored five variants of DCNN-Match that use different loss functions and assessed their effect on the spatial density of predicted landmarks and the associated matching errors. We also tested DCNN-Match variants in combination with the open-source registration software Elastix to assess the impact of predicted landmarks in providing additional guidance to DIR. Results: We tested our approach on lower abdominal CT scans from cervical cancer patients: 121 pairs containing simulated deformations and 11 pairs demonstrating clinical deformations. The results showed significant improvement in DIR performance when landmark correspondences predicted by DCNN-Match were used in the case of simulated (p = 0e0) as well as clinical deformations (p = 0.030). We also observed that the spatial density of the automatic landmarks with respect to the underlying deformation affect the extent of improvement in DIR. Finally, DCNN-Match was found to generalize to magnetic resonance imaging scans without requiring retraining, indicating easy applicability to other datasets. Conclusions: DCNN-match learns to predict landmark correspondences in 3D medical images in a self-supervised manner, which can improve DIR performance. ...
Conference paper (2022) - Alexander Chebykin, Tanja Alderliesten, Peter A.N. Bosman
To achieve excellent performance with modern neural networks, having the right network architecture is important. Neural Architecture Search (NAS) concerns the automatic discovery of task-specific network architectures. Modern NAS approaches leverage super-networks whose subnetworks encode candidate neural network architectures. These subnetworks can be trained simultaneously, removing the need to train each network from scratch, thereby increasing the efficiency of NAS. A recent method called Neural Architecture Transfer (NAT) further improves the efficiency of NAS for computer vision tasks by using a multi-objective evolutionary algorithm to find high-quality subnetworks of a supernetwork pretrained on ImageNet. Building upon NAT, we introduce ENCAS - - Evolutionary Neural Cascade Search. ENCAS can be used to search over multiple pretrained supernetworks to achieve a trade-off front of cascades of different neural network architectures, maximizing accuracy while minimizing FLOPs count. We test ENCAS on common computer vision benchmarks (CIFAR-10, CIFAR-100, ImageNet) and achieve Pareto dominance over previous state-of-the-art NAS models up to 1.5 GFLOPs. Additionally, applying ENCAS to a pool of 518 publicly available ImageNet classifiers leads to Pareto dominance in all computation regimes and to increasing the maximum accuracy from 88.6% to 89.0%, accompanied by an 18% decrease in computation effort from 362 to 296 GFLOPs. ...
Conference paper (2022) - Arkadiy Dushatskiy, Tanja Alderliesten, Peter A.N. Bosman
Neural Architecture Search (NAS) has recently become a topic of great interest. However, there is a potentially impactful issue within NAS that remains largely unrecognized: noise. Due to stochastic factors in neural network initialization, training, and the chosen train/validation dataset split, the performance evaluation of a neural network architecture, which is often based on a single learning run, is also stochastic. This may have a particularly large impact if a dataset is small. We therefore propose to reduce this noise by evaluating architectures based on average performance over multiple network training runs using different random seeds and cross-validation. We perform experiments for a combinatorial optimization formulation of NAS in which we vary noise reduction levels. We use the same computational budget for each noise level in terms of network training runs, i.e., we allow less architecture evaluations when averaging over more training runs. Multiple search algorithms are considered, including evolutionary algorithms which generally perform well for NAS. We use two publicly available datasets from the medical image segmentation domain where datasets are often limited and variability among samples is often high. Our results show that reducing noise in architecture evaluations enables finding better architectures by all considered search algorithms. ...
Conference paper (2022) - Arkadiy Dushatskiy, Gerry Lowe, Peter A.N. Bosman, Tanja Alderliesten
Deep learning algorithms have become the golden standard for segmentation of medical imaging data. In most works, the variability and heterogeneity of real clinical data is acknowledged to still be a problem. One way to automatically overcome this is to capture and exploit this variation explicitly. Here, we propose an approach that improves on our previous work in this area and explain how it potentially can improve clinical acceptance of (semi-)automatic segmentation methods. In contrast to a standard neural network that produces one segmentation, we propose to use a multi-path Unet network that produces multiple segmentation variants, presumably corresponding to the variations that reside in the dataset. Different paths of the network are trained on disjoint data subsets. Because a priori it may be unclear what variations exist in the data, the subsets should be automatically determined. This is achieved by searching for the best data partitioning with an evolutionary optimization algorithm. Because each network path can become more specialized when trained on a more homogeneous data subset, better segmentation quality can be achieved. In practical usage, various automatically produced segmentations can be presented to a medical expert, from which the preferred segmentation can be selected. In experiments with a real clinical dataset of CT scans with prostate segmentations, our approach provides an improvement of several percentage points in terms of Dice and surface Dice coefficients compared to when all network paths are trained on all training data. Noticeably, the largest improvement occurs in the upper part of the prostate that is known to be most prone to inter-observer segmentation variation. ...
Conference paper (2022) - Arthur Guijt, Dirk Thierens, Tanja Alderliesten, Peter A.N. Bosman
Model-Based Evolutionary Algorithms (MBEAs) can be highly scalable by virtue of linkage (or variable interaction) learning. This requires, however, that the linkage model can capture the exploitable structure of a problem. Usually, a single type of linkage structure is attempted to be captured using models such as a linkage tree. However, in practice, problems may exhibit multiple linkage structures. This is for instance the case in multi-objective optimization when the objectives have different linkage structures. This cannot be modelled sufficiently well when using linkage models that aim at capturing a single type of linkage structure, deteriorating the advantages brought by MBEAs. Therefore, here, we introduce linkage kernels, whereby a linkage structure is learned for each solution over its local neighborhood. We implement linkage kernels into the MBEA known as GOMEA that was previously found to be highly scalable when solving various problems. We further introduce a novel benchmark function called Best-of-Traps (BoT) that has an adjustable degree of different linkage structures. On both BoT and a worst-case scenario-based variant of the well-known MaxCut problem, we experimentally find a vast performance improvement of linkage-kernel GOMEA over GOMEA with a single linkage tree as well as the MBEA known as DSMGA-II. ...
Journal article (2022) - Kelly L.A. van Bindsbergen, Hinke van der Hoek, Marloes van Gorp, Mike E.U. Ligthart, Koen V. Hindriks, Mark A. Neerincx, Tanja Alderliesten, Peter A.N. Bosman, Johannes H.M. Merks, More authors...
Objectives: Children with cancer often experience sleep problems, which are associated with many negative physical and psychological health outcomes, as well as with a lower quality of life. Therefore, interventions are strongly required to improve sleep in this population. We evaluated interactive education with respect to sleep hygiene with a social robot at a pediatric oncology outpatient clinic regarding the feasibility, experiences, and preliminary effectiveness. Methods: Researchers approached children (8 to 12 years old) who were receiving anticancer treatment and who were visiting the outpatient clinic with their parents during the two-week study period. The researchers completed observation forms regarding feasibility, and parents completed the Children’s Sleep Hygiene Scale before and two weeks after the educational regimen. The experiences of children and parents were evaluated in semi-structured interviews. We analyzed open answers by labeling each answer with a topic reflecting the content and collapsed these topics into categories. We used descriptive statistics to describe the feasibility and experiences, and a dependent-samples t-test to evaluate the preliminary effectiveness. Results: Twenty-eight families participated (58% response rate) and all interactions with the robot were completed. The children and parents reported that they learned something new (75% and 50%, respectively), that they wanted to learn from the robot more often (83% and 75%, respectively), and that they applied the sleeping tips from the robot afterwards at home (54%). Regarding the preliminary effectiveness, children showed a statistically significant improvement in their sleep hygiene (p = 0.047, d = 0.39). Conclusions: Providing an educational regimen on sleep hygiene in a novel, interactive way by using a social robot at the outpatient clinic seemed feasible, and the children and parents mostly exhibited positive reactions. We found preliminary evidence that the sleep hygiene of children with cancer improved. ...
Conference paper (2022) - Dazhuang Liu, Marco Virgolin, Tanja Alderliesten, Peter A.N. Bosman
Genetic programming (GP) is one of the best approaches today to discover symbolic regression models. To find models that trade off accuracy and complexity, the non-dominated sorting genetic algorithm II (NSGA-II) is widely used. Unfortunately, it has been shown that NSGA-II can be inefficient: in early generations, low-complexity models over-replicate and take over most of the population. Consequently, studies have proposed different approaches to promote diversity. Here, we study the root of this problem, in order to design a superior approach. We find that the over-replication of low complexity-models is due to a lack of evolvability, i.e., the inability to produce offspring with improved accuracy. We therefore extend NSGA-II to track, over time, the evolvability of models of different levels of complexity. With this information, we limit how many models of each complexity level are allowed to survive the generation. We compare this new version of NSGA-II, evoNSGA-II, with the use of seven existing multi-objective GP approaches on ten widely-used data sets, and find that evoNSGA-II is equal or superior to using these approaches in almost all comparisons. Furthermore, our results confirm that evoNSGA-II behaves as intended: models that are more evolvable form the majority of the population. Code: https://github.com/dzhliu/evoNSGA-II ...
Conference paper (2022) - Martijn M.A. Bosma, Arkadiy Dushatskiy, Monika Grewal, Tanja Alderliesten, Peter A.N. Bosman
Deep Neural Networks (DNNs) have the potential for making various clinical procedures more time-efficient by automating medical image segmentation. Due to their strong, in some cases human-level, performance, they have become the standard approach in this field. The design of the best possible medical image segmentation DNNs, however, is task-specific. Neural Architecture Search (NAS), i.e., the automation of neural network design, has been shown to have the capability to outperform manually designed networks for various tasks. However, the existing NAS methods for medical image segmentation have explored a quite limited range of types of DNN architectures that can be discovered. In this work, we propose a novel NAS search space for medical image segmentation networks. This search space combines the strength of a generalised encoder-decoder structure, well known from U-Net, with network blocks that have proven to have a strong performance in image classification tasks. The search is performed by looking for the best topology of multiple cells simultaneously with the configuration of each cell within, allowing for interactions between topology and cell-level attributes. From experiments on two publicly available datasets, we find that the networks discovered by our proposed NAS method have better performance than well-known handcrafted segmentation networks, and outperform networks found with other NAS approaches that perform only topology search, and topology-level search followed by cell-level search. ...
Conference paper (2022) - Joe Harrison, Tanja Alderliesten, Peter A.N. Bosman
Genetic Programming (GP) can make an important contribution to explainable artificial intelligence because it can create symbolic expressions as machine learning models. Nevertheless, to be explainable, the expressions must not become too large. This may, however, limit their potential to be accurate. The re-use of subexpressions has the unique potential to mitigate this issue. The Genetic Programming Gene-pool Optimal Mixing Evolutionary Algorithm (GP-GOMEA) is a recent model-based GP approach that has been found particularly capable of evolving small expressions. However, its tree representation offers no explicit mechanisms to re-use subexpressions. By contrast, the graph representation in Cartesian GP (CGP) is natively capable of re-use. For this reason, we introduce CGP-GOMEA, a variant of GP-GOMEA that uses graphs instead of trees. We experimentally compare various configurations of CGP-GOMEA with GP-GOMEA and find that CGP-GOMEA performs on par with GP-GOMEA on three common datasets. Moreover, CGP-GOMEA is found to produce models that re-use subexpressions more often than GP-GOMEA uses duplicate subexpressions. This indicates that CGP-GOMEA has unique added potential, allowing to find even smaller expressions than GP-GOMEA with similar accuracy. ...