C. Witteveen | TU Delft Repository

Improving Model-Based Genetic Programming for Symbolic Regression of Small Expressions

Journal article (2021) - M. Virgolin, T. Alderliesten, C. Witteveen, P. A.N. Bosman

The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is a model-based EA framework that has been shown to perform well in several domains, including Genetic Programming (GP). Differently from traditional EAs where variation acts blindly, GOMEA learns a model of interdependencies within the genotype, that is, the linkage, to estimate what patterns to propagate. In this article, we study the role of Linkage Learning (LL) performed by GOMEA in Symbolic Regression (SR). We show that the non-uniformity in the distribution of the genotype in GP populations negatively biases LL, and propose a method to correct for this. We also propose approaches to improve LL when ephemeral random constants are used. Furthermore, we adapt a scheme of interleaving runs to alleviate the burden of tuning the population size, a crucial parameter for LL, to SR. We run experiments on 10 real-world datasets, enforcing a strict limitation on solution size, to enable interpretability. We find that the new LL method outperforms the standard one, and that GOMEA outperforms both traditional and semantic GP. We also find that the small solutions evolved by GOMEA are competitive with tuned decision trees, making GOMEA a promising new approach to SR. ...

On a comparison of Newton–Raphson solvers for power flow problems

Journal article (2019) - Baljinnyam Sereeter, Cornelis Vuik, Cees Witteveen

A general framework is given for applying the Newton–Raphson method to solve power flow problems, using power and current-mismatch functions in polar, Cartesian coordinates and complex form. These two mismatch functions and three coordinates, result in six possible ways to apply the Newton–Raphson method for the solution of power flow problems. We present a theoretical framework to analyze these variants for load (PQ)buses and generator (PV)buses. Furthermore, we compare newly developed versions in this paper with existing variants of the Newton power flow method. The convergence behavior of all methods is investigated by numerical experiments on transmission and distribution networks. We conclude that variants using the polar current-mismatch and Cartesian current-mismatch functions that are developed in this paper, performed the best result for both distribution and transmission networks. ...

Linear power flow method improved with numerical analysis techniques applied to a very large network

Journal article (2019) - Baljaa Sereeter, Werner van Westering, Cornelis Vuik, Cees Witteveen

In this paper, we propose a fast linear power flow method using a constant impedance load model to simulate both the entire Low Voltage (LV) and Medium Voltage (MV) networks in a single simulation. Accuracy and efficiency of this linear approach are validated by comparing it with the Newton power flow algorithm and a commercial network design tool Vision on various distribution networks including real network data. Results show that our method can be as accurate as classical Nonlinear Power Flow (NPF) methods using a constant power load model and additionally, it is much faster than NPF computations. In our research, it is shown that voltage problems can be identified more efficiently when MV and LV are integrally evaluated. Moreover, Numerical Analysis (NA) techniques are applied to the Large Linear Power Flow (LLPF) problem with 27 million nonzeros in order to improve the computation time by studying the properties of the linear system. Finally, the original computation times of LLPF problems with real and complex components are reduced by 2.8 times and 5.7 times, respectively. ...

Optimal power flow formulations and their impacts on the performance of solution methods

Conference paper (2019) - Baljinnyam Sereeter, Cornelis Vuik, Cornelis Witteveen, Peter Palensky

In this paper, we study four equivalent mathematical formulations of the Optimal Power Flow (OPF) problem and their impacts on the performance of solution methods. We show how four mathematical formulations of the OPF problem can be obtained by rewriting equality constraints given as the power flow problem into four equivalent mathematical equations using power balance or current balance equations in polar or Cartesian coordinates while keeping the same physical formulation. All four mathematical formulations are implemented in Matpower. In order to identify the formulation that results in the best convergence characteristics for the solution method, we apply MIPS, KNITRO, and FMINCON on various test cases using three different initial conditions. We compare all four formulations in terms of impact factors on the solution method such a number of nonzero elements in the Jacobian and Hessian matrices, a number of iterations and computational time on each iteration. The numerical results show that the performance of the OPF solution method is not only dependent upon the choice of the solution method itself, but also upon the exact mathematical formulation used to specify the OPF problem. ...

Large-scale parallelization of partial evaluations in evolutionary algorithms for real-world problems

Conference paper (2018) - Anton Bouter, Tanja Alderliesten, Arjan Bel, Cees Witteveen, Peter Bosman

The importance and potential of Gray-Box Optimization (GBO) with evolutionary algorithms is becoming increasingly clear lately both for benchmark and real-world problems. We consider the GBO setting where partial evaluations are possible, meaning that sub-functions of the evaluation function are known and can be exploited to improve optimization efficiency. In this paper, we show that the efficiency of GBO can be greatly improved through large-scale parallelism, exploiting the fact that each evaluation function requires the calculation of a number of independent sub-functions. This is especially interesting for real-world problems where often the majority of the computational effort is spent on the evaluation function. Moreover, we show how the best parallelization technique largely depends on factors including the number of sub-functions and their required computation time, revealing that for different parts of the optimization the best parallelization technique should be selected based on these factors. As an illustration, we show how large-scale parallelization can be applied to optimization of high-dose-rate brachytherapy treatment plans for prostate cancer. We find that use of a modern Graphics Processing Unit (GPU) was the most efficient parallelization technique in all realistic scenarios, leading to substantial speed-ups up to a factor of 73. ...

Symbolic regression and feature construction with GP-GOMEA applied to radiotherapy dose reconstruction of childhood cancer survivors

Conference paper (2018) - Marco Virgolin, Tanja Alderliesten, Arjan Bel, Cees Witteveen, Peter A.N. Bosman

The recently introduced Gene-pool Optimal Mixing Evolutionary Algorithm for Genetic Programming (GP-GOMEA) has been shown to find much smaller solutions of equally high quality compared to other state-of-the-art GP approaches. This is an interesting aspect as small solutions better enable human interpretation. In this paper, an adaptation of GP-GOMEA to tackle real-world symbolic regression is proposed, in order to find small yet accurate mathematical expressions, and with an application to a problem of clinical interest. For radiotherapy dose reconstruction, a model is sought that captures anatomical patient similarity. This problem is particularly interesting because while features are patient-specific, the variable to regress is a distance, and is defined over patient pairs. We show that on benchmark problems as well as on the application, GP-GOMEA outperforms variants of standard GP. To find even more accurate models, we further consider an evolutionary meta learning approach, where GP-GOMEA is used to construct small, yet effective features for a different machine learning algorithm. Experimental results show how this approach significantly improves the performance of linear regression, support vector machines, and random forest, while providing meaningful and interpretable features. ...

On the feasibility of automatically selecting similar patients in highly individualized radiotherapy dose reconstruction for historic data of pediatric cancer survivors

Journal article (2018) - Marco Virgolin, Irma W.E.M. van Dijk, Jan Wiersma, Cécile M. Ronckers, Cees Witteveen, Arjan Bel, Peter A.N. Bosman

Purpose: The aim of this study is to establish the first step toward a novel and highly individualized three-dimensional (3D) dose distribution reconstruction method, based on CT scans and organ delineations of recently treated patients. Specifically, the feasibility of automatically selecting the CT scan of a recently treated childhood cancer patient who is similar to a given historically treated child who suffered from Wilms’ tumor is assessed. Methods: A cohort of 37 recently treated children between 2- and 6-yr old are considered. Five potential notions of ground-truth similarity are proposed, each focusing on different anatomical aspects. These notions are automatically computed from CT scans of the abdomen and 3D organ delineations (liver, spleen, spinal cord, external body contour). The first is based on deformable image registration, the second on the Dice similarity coefficient, the third on the Hausdorff distance, the fourth on pairwise organ distances, and the last is computed by means of the overlap volume histogram. The relationship between typically available features of historically treated patients and the proposed ground-truth notions of similarity is studied by adopting state-of-the-art machine learning techniques, including random forest. Also, the feasibility of automatically selecting the most similar patient is assessed by comparing ground-truth rankings of similarity with predicted rankings. Results: Similarities (mainly) based on the external abdomen shape and on the pairwise organ distances are highly correlated (Pearson r_p ≥ 0.70) and are successfully modeled with random forests based on historically recorded features (pseudo-R² ≥ 0.69). In contrast, similarities based on the shape of internal organs cannot be modeled. For the similarities that random forest can reliably model, an estimation of feature relevance indicates that abdominal diameters and weight are the most important. Experiments on automatically selecting similar patients lead to coarse, yet quite robust results: the most similar patient is retrieved only 22% of the times, however, the error in worst-case scenarios is limited, with the fourth most similar patient being retrieved. Conclusions: Results demonstrate that automatically selecting similar patients is feasible when focusing on the shape of the external abdomen and on the position of internal organs. Moreover, whereas the common practice in phantom-based dose reconstruction is to select a representative phantom using age, height, and weight as discriminant factors for any treatment scenario, our analysis on abdominal tumor treatment for children shows that the most relevant features are weight and the anterior–posterior and left–right abdominal diameters. ...

Purpose: The aim of this study is to establish the first step toward a novel and highly individualized three-dimensional (3D) dose distribution reconstruction method, based on CT scans and organ delineations of recently treated patients. Specifically, the feasibility of automatically selecting the CT scan of a recently treated childhood cancer patient who is similar to a given historically treated child who suffered from Wilms’ tumor is assessed. Methods: A cohort of 37 recently treated children between 2- and 6-yr old are considered. Five potential notions of ground-truth similarity are proposed, each focusing on different anatomical aspects. These notions are automatically computed from CT scans of the abdomen and 3D organ delineations (liver, spleen, spinal cord, external body contour). The first is based on deformable image registration, the second on the Dice similarity coefficient, the third on the Hausdorff distance, the fourth on pairwise organ distances, and the last is computed by means of the overlap volume histogram. The relationship between typically available features of historically treated patients and the proposed ground-truth notions of similarity is studied by adopting state-of-the-art machine learning techniques, including random forest. Also, the feasibility of automatically selecting the most similar patient is assessed by comparing ground-truth rankings of similarity with predicted rankings. Results: Similarities (mainly) based on the external abdomen shape and on the pairwise organ distances are highly correlated (Pearson r_p ≥ 0.70) and are successfully modeled with random forests based on historically recorded features (pseudo-R² ≥ 0.69). In contrast, similarities based on the shape of internal organs cannot be modeled. For the similarities that random forest can reliably model, an estimation of feature relevance indicates that abdominal diameters and weight are the most important. Experiments on automatically selecting similar patients lead to coarse, yet quite robust results: the most similar patient is retrieved only 22% of the times, however, the error in worst-case scenarios is limited, with the fourth most similar patient being retrieved. Conclusions: Results demonstrate that automatically selecting similar patients is feasible when focusing on the shape of the external abdomen and on the position of internal organs. Moreover, whereas the common practice in phantom-based dose reconstruction is to select a representative phantom using age, height, and weight as discriminant factors for any treatment scenario, our analysis on abdominal tumor treatment for children shows that the most relevant features are weight and the anterior–posterior and left–right abdominal diameters.

Efficient, effective, and insightful tackling of the high-dose-rate brachytherapy treatment planning problem for prostate cancer using evolutionary multi-objective Optimization Algorithms

Conference paper (2017) - Ngoc Hoang Luong, Anton Bouter, Marjolein C. Van Der Meer, Yury Niatsetski, Cees Witteveen, Arjan Bel, Tanja Alderliesten, Peter A.N. Bosman

We address the problemof high-dose-rate brachytherapy treatment planning for prostate cancer. The problem involves determining a treatment plan consisting of the so-called dwell times that a radiation source resides at different positions inside the patient such that the prostate volume and the seminal vesicles are covered by the prescribed radiation dose level as much as possiblewhile the organs at risk, e.g., bladder, rectum, and urethra, are irradiated as little as possible. The problem is highly constrained, following clinical requirements for radiation dose distributionwhile the planning process for treatment planners to design a clinically-Acceptable treatment plan is strictly time-limited. In this paper, we propose that the problem can be formulated as a bi-objective optimization problem that intuitively describes trade-offs between target volumes to be radiated and organs to be spared. We solve this problem with the recently-introduced Multi-Objective Real-Valued Genepool Optimal Mixing Evolutionary Algorithm (MO-RV-GOMEA), which is a promising MOEA that is able to effectively exploit dependencies between problem variables to tackle complicated problems in the continuous domain. MO-RV-GOMEA also has the capability to perform partial evaluations if problem structures allow local variations in existing solutions to be efficiently computed, substantially accelerating the overall optimization performance. Experiments on real medical data and comparison with state-of-Theart MOEAs confirm our claims. ...

The multi-objective real-valued gene-pool optimal mixing evolutionary algorithm

Conference paper (2017) - Anton Bouter, Ngoc Hoang Luong, Cees Witteveen, Tanja Alderliesten, Peter Bosman

The recently introduced Multi-Objective Gene-pool Optimal Mixing Evolutionary Algorithm (MO-GOMEA) exhibits excellent scalability in solving a wide range of challenging discrete multi-objective optimization problems. In this paper, we address scalability issues in solving multi-objective optimization problems with continuous variables by introducing the Multi-Objective Real-Valued GOMEA (MO-RV-GOMEA), which combines MO-GOMEA with aspects of the multi-objective estimation-of-distribution algorithm known as MAMaLGaM. MO-RV-GOMEA exploits linkage structure in optimization problems by performing distribution estimation, adaptation, and sampling as well as solution mixing based on an explicitly-defined linkage model. Such a linkage model can be defined a priori when some problem-specific knowledge is available, or it can be learned from the population. The scalability of MO-RV-GOMEA using different linkage models is compared to the state-of-the-art multi-objective evolutionary algorithms NSGA-II and MAMaLGaM on a wide range of benchmark problems. MO-RV-GOMEA is found to retain the excellent scalability of MO-GOMEA through the successful exploitation of linkage structure, scaling substantially better than NSGA-II and MAMaLGaM. This scalability is even further improved when partial evaluations are possible, achieving strongly sub-linear scalability in terms of the number of evaluations. ...

Exploiting linkage information in real-valued optimization with the real-valued gene-pool optimal mixing evolutionary algorithm

Conference paper (2017) - Anton Bouter, Cees Witteveen, Tanja Alderliesten, Peter Bosman

The recently introduced Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) has been shown to be among the state-of-the-art for solving discrete optimization problems. Key to the success of GOMEA is its ability to efficiently exploit the linkage structure of a problem. Here, we introduce the Real-Valued GOMEA (RV-GOMEA), which incorporates several aspects of the real-valued EDA known as AMaLGaM into GOMEA in order to make GOMEA well-suited for real-valued optimization. The key strength of GOMEA to competently exploit linkage structure is effectively preserved in RV-GOMEA, enabling excellent performance on problems that exhibit a linkage structure that is to some degree decomposable. Moreover, the main variation operator of GOMEA enables substantial improvements in performance if the problem allows for partial evaluations, which may be very well possible in many real-world applications. Comparisons of performance with state-of-the-art algorithms such as CMA-ES and AMaLGaM on a set of well-known benchmark problems show that RV-GOMEA achieves comparable, excellent scalability in case of black-box optimization. Moreover, RV-GOMEA achieves unprecedented scalability on problems that allow for partial evaluations, reaching near-optimal solutions for problems with up to millions of real-valued variables within one hour on a normal desktop computer. ...

Newton Power Flow Methods for Unbalanced Three-Phase Distribution Networks

Journal article (2017) - Baljinnyam Sereeter, Kees Vuik, Cees Witteveen

Two mismatch functions (power or current) and three coordinates (polar, Cartesian andcomplex form) result in six versions of the Newton–Raphson method for the solution of powerflow problems. In this paper, five new versions of the Newton power flow method developed forsingle-phase problems in our previous paper are extended to three-phase power flow problems.Mathematical models of the load, load connection, transformer, and distributed generation (DG)are presented. A three-phase power flow formulation is described for both power and currentmismatch functions. Extended versions of the Newton power flow method are compared with thebackward-forward sweep-based algorithm. Furthermore, the convergence behavior for differentloading conditions,R/Xratios, and load models, is investigated by numerical experiments onbalanced and unbalanced distribution networks. On the basis of these experiments, we conclude thattwo versions using the current mismatch function in polar and Cartesian coordinates perform thebest for both balanced and unbalanced distribution networks. ...

On a comparison of Newton-Raphson solvers for power flow problems

Report (2017) - B. Sereeter, C. Vuik, C. Witteveen

A general framework is given for applying the Newton-Raphson method to solve power flow problems, using power and current-mismatch functions in polar, Cartesian coordinates and complex form. These two mismatch functions and three coordinates, result in six versions of the Newton-Raphson method for the solution of power flow problems. We present a theoretical framework to compare these variants for PQ-buses and PV-buses. Furthermore, the convergence behavior is investigated by numerical experiments. This enables us to compare new versions with existing versions of the Newton power flow methods. We conclude that the polar current-mismatch and Cartesian current-mismatch versions that are developed in this paper, performed the best result for both distribution and transmission networks. ...

Scalable genetic programming by gene-pool optimal mixing and input-space entropy-based building block learning

Other (2017) - Marco Virgolin, Tanja Alderliesten, Cees Witteveen, P.A.N. Bosman

The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is a recently introduced model-based EA that has been shown to be capable of outperforming state-of-the-art alternative EAs in terms of scalability when solving discrete optimization problems. One of the key aspects of GOMEA's success is a variation operator that is designed to extensively exploit linkage models by effectively combining partial solutions. Here, we bring the strengths of GOMEA to Genetic Programming (GP), introducing GP-GOMEA. Under the hypothesis of having little problem-specific knowledge, and in an effort to design easy-to-use EAs, GP-GOMEA requires no parameter specification. On a set of well-known benchmark problems we find that GP-GOMEA outperforms standard GP while being on par with more recently introduced, state-of-the-art EAs. We furthermore introduce Input-space Entropy-based Building-block Learning (IEBL), a novel approach to identifying and encapsulating relevant building blocks (subroutines) into new terminals and functions. On problems with an inherent degree of modularity, IEBL can contribute to compact solution representations, providing a large potential for knock-on effects in performance. On the difficult, but highly modular Even Parity problem, GP-GOMEA+IEBL obtains excellent scalability, solving the 14-bit instance in less than 1 hour. ...

Stochastic task networks trading performance for stability

Conference paper (2017) - Kiriakos Simon Mountakis, Tomas Klos, Cees Witteveen

This paper concerns networks of precedence constraints between tasks with random durations, known as stochastic task networks, often used to model uncertainty in real-world applications. In some applications, we must associate tasks with reliable start-times from which realized start-times will (most likely) not deviate too far. We examine a dispatching strategy according to which a task starts as early as precedence constraints allow, but not earlier than its corresponding planned release-time. As these release-times are spread farther apart on the time-axis, the randomness of realized start-times diminishes (i.e. stability increases). Effectively, task start-times becomes less sensitive to the outcome durations of their network predecessors. With increasing stability, however, performance deteriorates (e.g. expected makespan increases). Assuming a sample of the durations is given, we define an LP for finding release-times that minimize the performance penalty of reaching a desired level of stability. The resulting LP is costly to solve, so, targeting a specific part of the solution-space, we define an associated Simple Temporal Problem (STP) and show how optimal release-times can be constructed from its earliest-start-time solution. Exploiting the special structure of this STP, we present our main result, a dynamic programming algorithm that finds optimal release-times with considerable efficiency gains. ...

Newton Power Flow Methods for Unbalanced Three-Phase Distribution Networks

Report (2017) - B. Sereeter, C. Vuik, C. Witteveen

Two mismatch functions (power or current) and three coordinates (polar, Cartesian and complex form) result in six versions of the Newton–Raphson method for the solution of power flow problems. In this paper, five new versions of the Newton power flow method developed for single-phase problems in our previous paper are extended to three-phase power flow problems. Mathematical models of the load, load connection, transformer, and dis- tributed generation (DG) are presented. A three-phase power flow formulation is described for both power and current mismatch functions. Extended versions of the Newton power flow method are compared with the backward-forward sweep-based algorithm. Furthermore, the convergence behavior for different loading conditions, R/X ratios, and load models, is investigated by numerical experiments on balanced and unbalanced distribution networks. On the basis of these experiments, we conclude that two versions using the current mis match function in polar and Cartesian coordinates perform the best for both balanced and unbalanced distribution networks. ...

Learning to Associate Distances with Historical Patient Data to Enable Fine-grained Studying of Late Adverse Effects of Paediatric Radiotherapy: Data, Methodology, and First Results

Conference paper (2016) - Marco Virgolin, Irma Dijk van, Jan Wiersma, Cecile Ronckers, Cees Witteveen, Coen Rasch, Arjan Bel, Tanja Alderliesten, Peter Bosman

Optimising Flexibility for Simple Temporal Networks

Conference paper (2016) - Cees Witteveen

We generalise a recently proposed concurrent flexibility metric to overcome some of its shortcomings. We show that these shortcomings can be removed if one selects an optimal subset of variables for which the concurrent flexibility is determined. The flexibility of the remaining variables does not play a role in the determination of the flexibility of the system. We present a preliminary experimental evaluation of the improvement in concurrent flexibility that can be obtained by comparing some (approximation) algorithms. Their performance on several benchmark sets is evaluated. As a result, in some cases the concurrent flexibility of an STN can be enhanced by 20-50%. ...

Temporal Flexibility Revisited: Maximizing Flexibility by Computing Bipartite Matchings

Conference paper (2015) - Simon Mountakis, Tomas Klos, Cees Witteveen

We discuss two flexibility metrics for Simple Temporal Networks (STNs): the so-called naive flexibility metric based on the difference between earliest and latest starting times of temporal variables, and a recently proposed concurrent flexibility metric. We establish an interesting connection between the computation of these flexibility metrics and properties of the minimal distance matrix DS of an STN S: the concurrent flexibility metric can be computed by finding a minimum weight matching of a weighted bipartite graph completely specified by DS, while the naive flexibility metric corresponds to computing a maximum weight matching in the same graph. From a practical point of view this correspondence offers an advantage: instead of using an O(n5) LP-based approach, reducing the problem to a matching problem we derive an O(n3) algorithm for computing the concurrent flexibility metric. ...

Context-aware reconfiguration of large-scale surveillance systems

Argumentative approach

Journal article (2015) - Peter Novák, Cees Witteveen

The Metis research project aims at supporting maritime safety and security by facilitating continuous monitoring of vessels in national coastal waters and prevention of phenomena, such as vessel collisions, environmental hazard, or detection of malicious intents, such as smuggling. Surveillance systems such as Metis typically comprise a number of heterogeneous information sources and information aggregators. Among the main problems of their deployment lies their scalability with respect to a potentially large number of monitored entities. One of the solutions to the problem is continuous and timely adaptation and reconfiguration of the system according to the changing environment it operates in. At any given timepoint, the system should use only a minimal set of information sources and aggregators needed to facilitate effective and early detection of indicators of interest. Here, we describe the Metis system prototype and introduce a theoretical framework for modelling scalable information-aggregation systems. We model information-aggregation systems as networks of inter-dependent reasoning agents, each representing a mechanism for justification/refutation of a conclusion derived by the agent. The proposed continuous reconfiguration algorithm relies on standard results from abstract argumentation and corresponds to computation of a grounded extension of the argumentation framework associated with the system. Finally, we demonstrate the flexibility of the presented framework by extending the proposed algorithm to adapt to context-dependent changes in information sources availability, as well as shifts in system's focus according to its context. ...

Reconfiguration of large-scale surveillance systems

Conference paper (2013) - P Novak, C Witteveen