Jv

J.G.M. van der Linden

info

Please Note

10 records found

Algorithms and Applications

In recent years, artificial intelligence has increasingly permeated society, including applications in high-stakes domains that may have a significant and even harmful impact on people. For these domains, it is essential to have comprehensible and reliable models. However, many of the most commonly used models, such as neural networks, are black-box models, i.e., inherently opaque and thus hard to trust.

Therefore, this dissertation explores decision trees as a human-interpretable alternative. Specifically, we focus on optimal decision trees, since the traditional greedy heuristics often learn decision trees that are too large to interpret easily, whereas optimal decision trees provably optimize the objective of a learning problem (e.g., accuracy) given a constraint on model size. The compactness and transparency of these optimal models contribute to their interpretability, while in many cases they achieve accuracy comparable to that of much more complex black-box models. In addition, with many optimal learning methods it is easier to specify a learning task other than the default of maximizing classification accuracy.

However, since learning optimal decision trees is an NP-hard problem, scalability is a major concern. State-of-the-art model-based approaches, such as mixed-integer programming and Boolean satisfiability, often do not scale beyond a few hundred samples and shallow depth limits. On the other hand, search-based approaches, such as dynamic programming, scale much better by exploiting the separable tree structure using specialized algorithms to solve subtrees as independent (and repeated) subproblems. But these methods are mostly limited to classification, while their scalability depends on a coarse binarization of the numerical data.

Therefore, in this thesis, we study optimal decision trees across four challenges: (i) efficiently optimizing learning tasks other than classification, (ii) improving our understanding of the differences between optimal and traditional greedy decision trees, (iii) efficiently handling data with numerical attributes; and (iv) efficiently finding the whole Rashomon set: the set of all (near-)optimal decision trees.

In the first challenge we address improving scalability for optimizing decision trees for a variety of learning tasks other than classification. We build on existing dynamic programming approaches for their scalability, and because this scalability depends on the property that subproblems are separable, we study and formalize the precise necessary and sufficient conditions for separability. These conditions turn out to be broader than previously assumed, and we use these new insights to improve the scalability of optimal decision trees for a variety of learning tasks, such as classification under fairness constraints, survival analysis, regression, prescriptive policy learning, and cost-sensitive classification. We also show how different optimization techniques, such as branch-and-bound and a specialized subroutine for depth-two trees, can be adapted and applied across several of these tasks. Together, these techniques improve scalability by one or more orders of magnitude over the state of the art for multiple learning tasks.

The second challenge is to improve our understanding of the differences between optimal and greedy heuristic methods for learning decision trees because previous comparisons were severely limited by lack of scalability. We empirically study the effect of the loss function and hyperparameter tuning of optimal decision trees and conclude that the heuristic and optimal approach are inherently different, for example, concave loss functions, which are required for many heuristics, do not work well for optimal methods. Previous comparisons also resulted in divergent and contradicting claims about optimal methods, and therefore, we empirically test six such claims from the literature. We confirm the most important one: when measuring both size (as a proxy of interpretability) and accuracy, optimal decision trees clearly dominate trees learned by greedy heuristics. We also refute several claims, including the claim that optimal methods are more prone to overfitting, and the claim that the differences between the two methods diminish with more data. For both claims, we find evidence suggesting the opposite.

The third challenge is to improve scalability when the data contains numerical attributes, without falling back to a coarse binarization as most dynamic programming approaches do. Therefore, we develop new algorithmic techniques that exploit the properties of numerical attributes. We show how reasoning about subproblem similarity makes it possible to prove that large parts of the search space can be pruned, while the ordering among numeric values makes it possible to compute aspects of the problem incrementally. Together, these techniques result in scalability improvements of one to two orders of magnitude compared to the state of the art.

For the fourth challenge, we go beyond learning a single optimal model. Instead, we focus on the set of all decision trees within a small percentage of the optimal solution, the so-called Rashomon set. The Rashomon set can be useful for data analysis, for example in determining which attributes have the greatest impact on predictions, or in finding better counterfactual explanations. Our contribution is a more efficient, anytime, sorted enumeration of all decision trees in the Rashomon set using lazy evaluation, as well as adapting and applying existing algorithmic techniques for optimal decision trees to compute the Rashomon set. Here too, our method improves scalability by one or more orders of magnitude compared to the previous best approach.

In summary, this dissertation improves scalability by several orders of magnitude, broadens the range of possible applications, and deepens both the theoretical and empirical understanding of optimal decision trees and decision tree Rashomon sets. Therefore, we can now recommend the use of optimal decision trees as human interpretable models for a large variety of real-world data science problems. ...
Computing an optimal classification tree that provably maximizes training performance within a given size limit, is NP-hard, and in practice, most state-of-the-art methods do not scale beyond computing optimal trees of depth three. Therefore, most methods rely on a coarse binarization of continuous features to maintain scalability. We propose a novel algorithm that optimizes trees directly on the continuous feature data using dynamic programming with branch-and-bound. We develop new pruning techniques that eliminate many sub-optimal splits in the search when similar to previously computed splits and we provide an efficient subroutine for computing optimal depth-two trees. Our experiments demonstrate that these techniques improve runtime by one or more orders of magnitude over state-of-the-art optimal methods and improve test accuracy by 5% over greedy heuristics.
...

A Dynamic Programming Approach

Survival analysis studies and predicts the time of death, or other singular unrepeated events, based on historical data, while the true time of death for some instances is unknown. Survival trees enable the discovery of complex nonlinear relations in a compact human comprehensible model, by recursively splitting the population and predicting a distinct survival distribution in each leaf node. We use dynamic programming to provide the first survival tree method with optimality guarantees, enabling the assessment of the optimality gap of heuristics. We improve the scalability of our method through a special algorithm for computing trees up to depth two. The experiments show that our method’s run time even outperforms some heuristics for realistic cases while obtaining similar out-of-sample performance with the state-of-the-art. ...
Global optimization of decision trees has shown to be promising in terms of accuracy, size, and consequently human comprehensibility. However, many of the methods used rely on general-purpose solvers for which scalability remains an issue. Dynamic programming methods have been shown to scale much better because they exploit the tree structure by solving subtrees as independent subproblems. However, this only works when an objective can be optimized separately for subtrees. We explore this relationship in detail and show necessary and sufficient conditions for such separability and generalize previous dynamic programming approaches into a framework that can optimize any combination of separable objectives and constraints. Experiments on five application domains show the general applicability of this framework, while outperforming the scalability of general-purpose solvers by a large margin. ...

A Dynamic Programming Approach

Interpretable and fair machine learning models are required for many applications, such as credit assessment and in criminal justice. Decision trees offer this interpretability, especially when they are small. Optimal decision trees are of particular interest because they offer the best performance possible for a given size. However, state-of-the-art algorithms for fair and optimal decision trees have scalability issues, often requiring several hours to find such trees even for small datasets. Previous research has shown that dynamic programming (DP) performs well for optimizing decision trees because it can exploit the tree structure. However, adding a global fairness constraint to a DP approach is not straightforward, because the global constraint violates the condition that subproblems should be independent. We show how such a constraint can be incorporated by introducing upper and lower bounds on final fairness values for partial solutions of subproblems, which enables early comparison and pruning. Our results show that our model can find fair and optimal trees several orders of magnitude faster than previous methods, and now also for larger datasets that were previously beyond reach. Moreover, we show that with this substantial improvement our method can find the full Pareto front in the trade-off between accuracy and fairness. ...
Due to increasing numbers of intermittent and distributed generators in power systems, there is an increasing need for demand responses to maintain the balance between electricity generation and use at all times. For example, the electrification of transportation significantly adds to the amount of flexible electricity demand. Several methods have been developed to schedule such flexible energy consumption. However, an objective way of comparing these methods is lacking, especially when decisions are made based on incomplete information which is repeatedly updated. This paper presents a new benchmarking framework designed to bridge this gap. Surveys that classify flexibility planning algorithms were an input to define this benchmarking standard. The benchmarking framework can be used for different objectives and under diverse conditions faced by electricity production stakeholders interested in flexibility scheduling algorithms. Our contribution was implemented in a software toolbox providing a simulation environment that captures the evolution of look-ahead information, which enables comparing online planning and scheduling algorithms. This toolbox includes seven planning algorithms. This paper includes two case studies measuring the performances of these algorithms under uncertain market conditions. These case studies illustrate the importance of online decision making, the influence of data quality on the performance of the algorithms, the benefit of using robust and stochastic programming approaches, and the necessity of trustworthy benchmarking. ...
Journal article (2021) - Natalia Romero, Koos van der Linden, German Morales-Espania, Mathijs de Weerdt
The power system is undergoing a significant change as it adapts to the intermittency and uncertainty from renewable generation. Flexibility from loads such as electric vehicles (EVs) can serve as reserves to sustain the supply-demand balance in the grid. Some reserve markets have rules for participation that are computationally challenging for aggregators of such flexible loads: they are asked to bid both volume and price, and on top of this there is a minimum-volume requirement, a constraint currently under discussion both in the US and European markets. Several state-of-the-art methods to find a bidding strategy for the demand scheduling of large fleets of flexible loads in the day-ahead and reserve market are adapted to deal with such a shared constraint, and are compared based on costs, unscheduled demand, and running time. The experimental analysis shows that although such a shared constraint significantly affects scalability, some of the proposed adaptations can deal with this without much loss in quality. This comparison also shows the importance of including good uncertainty models for dealing with the risk of not meeting the users’ demands, and that it is possible to find an optimal single price per time unit for scheduling a fleet of EVs. ...
Conference paper (2018) - Mathijs de Weerdt, Michael Albert, Vincent Conitzer, Koos van der Linden
The problem of optimally scheduling the charging demand of electric vehicles within the constraints of the electricity infrastructure is called the charge scheduling problem. The models of the charging speed, horizon, and charging demand determine the computational complexity of the charge scheduling problem. We show that for about 20 variants the problem is either in P or weakly NP-hard and dynamic programs exist to compute optimal solutions. About 10 other variants of the problem are strongly NP-hard, presenting a potentially significant obstacle to their use in practical situations of scale. An experimental study establishes up to what parameter values the dynamic programs can determine optimal solutions in a couple of minutes.
...
Conference paper (2018) - Mathijs de Weerdt, Michael Albert, Vincent Conitzer, Koos van der Linden
The problem of optimally scheduling the charging demand of electric vehicles within the constraints of the electricity infrastructure is called the charge scheduling problem. The models of the charging speed, horizon, and charging demand determine the computational complexity of the charge scheduling problem. For about 20 variants the problem is either in P or weakly NP-hard and dynamic programs exist to compute optimal solutions. About 10 other variants of the problem are strongly NP-hard, presenting a potentially significant obstacle to their use in practical situations of scale. ...
Conference paper (2018) - Koos van der Linden, Mathijs de Weerdt, German Morales-Espana
In power systems, demand and supply always have to be balanced. This is becoming more challenging due to the sustained penetration of renewable energy sources. Because of the increasing amount of electrical vehicles (EVs), and the high capacity and flexibility of their charging process, EVs are a good candidate for providing balancing services to electric systems. We propose a stochastic optimization method for an EV aggregator that models the uncertainty of the imbalance price, the reserve prices and the probability of acceptance and deployment of reserves. The model results in an optimal charging and discharging strategy considering day-ahead purchase, imbalance trading and reserve bids. Unlike previous studies, the reserve bids consists of both a quantity and an optimal price. Experimental evaluation shows that the proposed stochastic optimization method results in lower costs than deterministic and quantity-only bid solutions. ...